Hi Community! We use Flyte with the Databricks (G...
# ask-the-community
r
Hi Community! We use Flyte with the Databricks (Go) plugin, running dynamic workflows. In some cases, failed dynamic nodes get stuck in propeller. Even though the nodes are marked as FAILED/ABORTED in the Flyte UI, propeller continues to poll the Databricks job status for these "stale" nodes until propeller is restarted. Are you aware of this issue? cc @anantharaman janakiraman @GF @Aarthi Vellingiri
k
Cc @Kevin Su, this looks like a bug, right?
k
Which version of propeller are you using?
g
propeller version is v1.10.6
k
@GF / @Robert Ambrus it seems this was a bug that was fixed prior to 1.10.6: https://github.com/flyteorg/flyte/commit/da77476ecbb8b183612405e9f3e3f7117d67d2a8 So I am confused.
k
@GF / @Robert Ambrus / @anantharaman janakiraman
1. The resource manager implementations are available here.
2. You can implement an in-memory or Redis resource manager that uses secure connections - here
3. This is the logic for checking the currently set quotas
4. This is where the quotas are configured
5. If you have Redis configured, this should work

To create the configuration:
1. You will have to set the type of resource manager at the propeller config level
2. Once you have that, for the Databricks plugin, you will need to create the expected resource quotas.
3. ResourceQuotas is simply a map of namespace: quota value (int)
4. ResourceNamespace is just a string
5. This explains how the resource manager works
6. The default namespace is already configured
7. Currently it defaults to 100 at the project level and 50 per namespace

I think you should be able to simply set the defaults to a low value, and it should work if you have Redis configured. Let me figure out whether this recommendation by @Haytham Abuelfutuh is indeed accurate, or whether we should set the defaults.
Not sure if the resourcequota is used. cc @Samhita Alla and @Kevin Su this might be useful to test with Agent as well, and then document
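To make the quota idea above concrete, here is a minimal sketch of namespace-scoped slot accounting. It is a toy illustration only and not Flyte's resource manager or its API: the class `InMemoryQuotaManager`, its methods, and the token names are all hypothetical. Only the default limits (100 at the project level, 50 per namespace) and the "namespace: quota value (int)" map shape are taken from the message above.

```python
from collections import defaultdict


class InMemoryQuotaManager:
    """Toy quota tracker (hypothetical; NOT Flyte's resource manager API)."""

    def __init__(self, project_limit=100, namespace_limit=50, namespace_quotas=None):
        self.project_limit = project_limit      # cap across all namespaces
        self.namespace_limit = namespace_limit  # default cap per namespace
        # Optional per-namespace overrides: {namespace (str): quota (int)}
        self.namespace_quotas = dict(namespace_quotas or {})
        self._held = defaultdict(set)           # namespace -> tokens holding a slot

    def _total_held(self):
        return sum(len(tokens) for tokens in self._held.values())

    def allocate(self, namespace, token):
        """Return True if `token` holds a slot in `namespace` after this call."""
        if token in self._held[namespace]:
            return True  # idempotent: repeated requests do not consume extra slots
        quota = self.namespace_quotas.get(namespace, self.namespace_limit)
        if len(self._held[namespace]) >= quota:
            return False  # namespace quota exhausted
        if self._total_held() >= self.project_limit:
            return False  # project-level quota exhausted
        self._held[namespace].add(token)
        return True

    def release(self, namespace, token):
        """Free the slot held by `token`, if any."""
        self._held[namespace].discard(token)


# Usage: a deliberately low namespace limit throttles concurrent jobs.
manager = InMemoryQuotaManager(project_limit=3, namespace_limit=2)
results = [manager.allocate("databricks", f"job-{i}") for i in range(3)]
manager.release("databricks", "job-0")
retry = manager.allocate("databricks", "job-2")
```

In this sketch the third request is rejected until an earlier job releases its slot, which is the throttling effect that setting low defaults would aim for. Whether the real plugin consults these quotas is exactly the open question raised above.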
r
@Ketan (kumare3) Thanks for the detailed instructions, let us review.