GF
02/24/2024, 11:41 PM
webapi.ResourceQuotas https://docs.flyte.org/en/latest/deployment/configuration/generated/flytepropeller_config.html#resourcequotas-webapi-resourcequotas? What happens when this limit is hit? Is this an enforcing limit if configured in FlytePropeller for a FlytePlugin?
David Espejo (he/him)
02/27/2024, 4:59 PM
Dan Rammer (hamersaw)
02/27/2024, 6:20 PM
Kevin Su
02/27/2024, 6:55 PM
GF
02/29/2024, 10:51 PM
GF
02/29/2024, 10:53 PM
Kevin Su
03/25/2024, 5:26 PM
Kevin Su
03/25/2024, 5:26 PM
Haytham Abuelfutuh
kubectl port-forward -n flyte deploy/flytepropeller 10254
Then go to the browser and visit: http://localhost:10254/config
You should find "resourceQuotas" with the values you specified. Mind sending a screenshot of that?
Robert Ambrus
03/26/2024, 6:00 AM
Robert Ambrus
03/26/2024, 12:44 PM
databricks:
  enabled: true
  upload_entrypoint: true
  plugin_config:
    plugins:
      databricks:
        databricksInstance: <!--- set to our DBX instance --->
        # this is the entrypoint.py for flyte on databricks
        entrypointFile: <!--- set to our DBX entrypoint file location --->
        # this file is mounted by vault agent injector at /vault/secrets
        databricksTokenKey: <!--- set to our DBX token key --->
        webApi:
          caching:
            maxSystemFailures: 5
            resyncInterval: 60s #default value is 30s!
            size: 500000
            workers: 10
          readRateLimiter:
            burst: 20 #default value is 100!
            qps: 10
          resourceMeta: null
          resourceQuotas:
            default: 10 #default value is 1000!
          writeRateLimiter:
            burst: 20 #default value is 100!
            qps: 10
Robert Ambrus
03/26/2024, 12:50 PM
readRateLimiter settings are not applied
• Flyte is ignoring the resourceQuotas / default value: we set it to 10, yet 25 Spark tasks are running simultaneously (that's the limit set by flyteadmin / maxParallelism)
Robert Ambrus
03/26/2024, 12:56 PM
Robert Ambrus
03/26/2024, 1:04 PM
webApi config for Flyte Databricks plugin. We came across the Flyte ResourceManager page; we currently have this Propeller config:
propeller:
  resourcemanager:
    type: noop
@Kevin Su Can you please clarify whether we need to set up a ResourceManager to apply the settings in the webApi config?
Robert Ambrus
03/26/2024, 1:24 PM
Robert Ambrus
03/26/2024, 1:24 PM
anantharaman janakiraman
03/26/2024, 8:13 PM
anantharaman janakiraman
03/26/2024, 8:15 PM
Ketan (kumare3)
anantharaman janakiraman
03/26/2024, 11:42 PM
anantharaman janakiraman
03/26/2024, 11:43 PM
Ketan (kumare3)
anantharaman janakiraman
03/26/2024, 11:52 PM
Haytham Abuelfutuh
propeller:
  resourcemanager:
    type: redis
    redis:
      hostPaths:
        - <redis replica 1>...
      hostKey: <password>
      maxRetries: 3
Robert Ambrus
03/27/2024, 4:44 PM
webApi conf?
Robert Ambrus
03/27/2024, 4:56 PM
Haytham Abuelfutuh
Robert Ambrus
03/27/2024, 5:03 PM
anantharaman janakiraman
03/27/2024, 5:05 PM
anantharaman janakiraman
03/27/2024, 5:05 PM
Haytham Abuelfutuh
Ketan (kumare3)
anantharaman janakiraman
03/29/2024, 4:48 AM
Robert Ambrus
04/02/2024, 3:54 PM
Robert Ambrus
04/02/2024, 3:55 PM
OBSERVATIONS
1. databricks / resourceQuotas is applied successfully
   ◦ 10 tasks in RUNNING state - launched (resourceQuotas)
   ◦ 15 tasks in RUNNING state - queued (max_parallelism - resourceQuotas)
   ◦ all the remaining tasks in UNKNOWN state
   ◦ 10 launched tasks succeeded
   ◦ 10 more tasks moved to RUNNING state - queued
   ◦ ❗ unfortunately, the workflow is stuck in this phase; it seems that once a task enters the queued phase, it can no longer move to the launched phase
2. ❗ databricks / webApi / readRateLimiter is not applied
   ◦ the logs still show more than a hundred requests per second being sent to the downstream API, even though we set QPS = 10 and BURST = 20
So it seems that the Redis - Flyte integration has been done successfully, but we still face functional issues.
Although both issues are important, the second one is critical. Can we focus on that one?
QUESTIONS
• Is the webApi / readRateLimiter config supposed to be applied by the Redis ResourceManager?
• Do we need any other configurations besides the ones we already shared?
Robert Ambrus
04/02/2024, 3:59 PM
resourcemanager:
  resourceMaxQuota: 1000
  redis:
    hostKey: *****
    hostPaths:
      - *****
    maxRetries: 3
  type: redis
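To make the launched / queued behaviour in the observations easier to reason about, here is a rough in-memory sketch of a quota pool. It only illustrates the general allocate / release pattern. QuotaPool, Allocate, Release and Launch are hypothetical names, and nothing here is taken from Flyte's Redis resource manager.

```go
package main

import "fmt"

// QuotaPool is a hypothetical in-memory stand-in for a resource manager.
type QuotaPool struct {
	limit int
	held  map[string]bool // tokens that currently hold a slot
}

func NewQuotaPool(limit int) *QuotaPool {
	return &QuotaPool{limit: limit, held: make(map[string]bool)}
}

// Allocate grants a slot if the token already holds one or capacity remains.
func (p *QuotaPool) Allocate(token string) bool {
	if p.held[token] {
		return true
	}
	if len(p.held) >= p.limit {
		return false
	}
	p.held[token] = true
	return true
}

// Release frees the slot held by token; releasing an unknown token is a no-op.
func (p *QuotaPool) Release(token string) {
	delete(p.held, token)
}

// Launch tries to allocate a slot for every task and splits them by outcome.
func Launch(p *QuotaPool, tasks []string) (launched, queued []string) {
	for _, t := range tasks {
		if p.Allocate(t) {
			launched = append(launched, t)
		} else {
			queued = append(queued, t)
		}
	}
	return launched, queued
}

func main() {
	tasks := make([]string, 25) // 25 tasks, as allowed by max parallelism
	for i := range tasks {
		tasks[i] = fmt.Sprintf("task-%d", i)
	}
	pool := NewQuotaPool(10) // resourceQuotas default: 10

	launched, queued := Launch(pool, tasks)
	fmt.Println(len(launched), len(queued)) // 10 15

	// If finished tasks never release their slot, nothing queued can start.
	retried, _ := Launch(pool, queued)
	fmt.Println(len(retried)) // 0

	// Once the finished tasks release their slots, queued tasks are admitted.
	for _, t := range launched {
		pool.Release(t)
	}
	launched, queued = Launch(pool, queued)
	fmt.Println(len(launched), len(queued)) // 10 5
}
```

In this toy model, queued tasks that never start after the first batch succeeds would point at slots not being released. Whether that is what happens in the real setup is not established here.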
GF
04/02/2024, 9:13 PM
Robert Ambrus
04/03/2024, 1:12 PM
Ketan (kumare3)
Robert Ambrus
04/03/2024, 2:59 PM
Ketan (kumare3)
Ketan (kumare3)
Robert Ambrus
04/03/2024, 3:02 PM
Ketan (kumare3)
Robert Ambrus
04/03/2024, 3:12 PM
Ketan (kumare3)
Robert Ambrus
04/03/2024, 3:36 PM
webApi / readRateLimiter config? That's the most burning issue for us. It seems these configs are ignored for the Databricks plugin. Are they supposed to be applied by the Flyte ResourceManager?
Robert Ambrus
04/03/2024, 4:08 PM
webApi / readRateLimiter config. If you could clarify which component (e.g. Flyte plugin, Redis) is responsible for applying this config, that would be very helpful.
Ketan (kumare3)
Ketan (kumare3)
GF
04/03/2024, 4:35 PM
anantharaman janakiraman
04/03/2024, 7:31 PM
anantharaman janakiraman
04/03/2024, 7:32 PM
Ketan (kumare3)
Ketan (kumare3)
Ketan (kumare3)
anantharaman janakiraman
04/04/2024, 2:54 AM
anantharaman janakiraman
04/04/2024, 2:55 AM
Ketan (kumare3)
Ketan (kumare3)
anantharaman janakiraman
04/04/2024, 4:27 AM
anantharaman janakiraman
04/04/2024, 4:27 AM
anantharaman janakiraman
04/04/2024, 4:51 AM
Robert Ambrus
04/04/2024, 12:48 PM
Robert Ambrus
04/04/2024, 12:50 PM
anantharaman janakiraman
04/04/2024, 1:00 PM
Ketan (kumare3)
Haytham Abuelfutuh
Haytham Abuelfutuh
webApi:
  caching:
    maxSystemFailures: 5
    resyncInterval: 60s #default value is 30s!
    size: 500000
    workers: 10
  readRateLimiter:
    burst: 20 #default value is 100!
    qps: 10
  resourceMeta: null
  resourceQuotas:
    default: 1 #default value is 1000!
  writeRateLimiter:
    burst: 20 #default value is 100!
    qps: 10
This is my relevant config... I set default to 1 just to make sure I run out of quota quickly...
Trying to see if there is an issue with freeing up tokens that might cause this...
In the meantime, do you mind enabling INFO logs on propeller and looking for the following lines:
Start building a resource manager
to the Redis Qubole set
Too many allocations
@anantharaman janakiraman @Robert Ambrus
anantharaman janakiraman
04/04/2024, 5:39 PM
anantharaman janakiraman
04/04/2024, 5:41 PM
anantharaman janakiraman
04/04/2024, 5:42 PM
Haytham Abuelfutuh
resourceQuotas:
  default: 1 #default value is 1000!
I didn't set any resource constraints
anantharaman janakiraman
04/04/2024, 6:38 PM
anantharaman janakiraman
04/04/2024, 6:39 PM
anantharaman janakiraman
04/04/2024, 6:39 PM
anantharaman janakiraman
04/04/2024, 7:08 PM
Haytham Abuelfutuh
anantharaman janakiraman
04/04/2024, 7:39 PM
GF
04/04/2024, 10:58 PM
GF
04/04/2024, 11:16 PM
{"json":{"routine":"databricks-worker-1","src":"plugin.go:164"},"level":"debug","msg":"Get databricks job response%!(EXTRA string=resp, *http.Response=\u0026{429 Too Many Requests 429 HTTP/2.0 2 0 map[Date:[Wed, 03 Apr 2024 07:37:26 GMT] Retry-After:[1] Server:[databricks] ... [Maximum rate of 100 requests per SECOND has been exceeded. Please reduce the rate of requests and try again after 1 second(s)]] {} 0 [] false false map[] ...})","ts":"2024-04-03T07:37:26Z"}
anantharaman janakiraman
04/04/2024, 11:49 PM
anantharaman janakiraman
04/04/2024, 11:51 PM
Robert Ambrus
04/05/2024, 1:35 PM
autoRefreshCache, err := cache.NewAutoRefreshCache(name, q.SyncResource,
    workqueue.DefaultControllerRateLimiter(), cfg.ResyncInterval.Duration, cfg.Workers, cfg.Size,
    scope.NewSubScope("cache"))
• The ResyncInterval, Workers and Size configs are respected (this works in our setup as well), but I can't see the webapi ratelimiter configs being used anywhere: workqueue.DefaultControllerRateLimiter() uses hard-coded values (qps: 10, burst: 100)
Can you please confirm that my understanding is correct and that the ratelimiter configs should be applied here?
Ketan (kumare3)
Ketan (kumare3)
Ketan (kumare3)
Ketan (kumare3)
resource quota working?
anantharaman janakiraman
04/05/2024, 4:38 PM
anantharaman janakiraman
04/05/2024, 4:38 PM
Ketan (kumare3)
Ketan (kumare3)
Ketan (kumare3)
Ketan (kumare3)
anantharaman janakiraman
04/05/2024, 5:06 PM
Haytham Abuelfutuh
Haytham Abuelfutuh
Haytham Abuelfutuh
Attempting to finalize resource
There is also a metric .resource_release_failed that tracks failures to release resources. Can you check for that too?
anantharaman janakiraman
04/05/2024, 6:13 PM
Ketan (kumare3)
GF
04/08/2024, 4:45 PM
Ketan (kumare3)
Kevin Su
04/08/2024, 5:18 PM
Robert Ambrus
04/09/2024, 11:46 AM