# flyte-support
b
I’ve encountered an issue in the Flyte UI when running workflows with large numbers of `map_task` members. In a recent test with a `map_task` with 1,000 executions, I see errors loading the workflow details in the Flyte UI. The workflow itself succeeded, with a 100% success rate across the `map_task` members, but I am unable to load the workflow UI. When navigating to the workflow details I see a React error message after a few seconds of loading:
> RangeError: Array buffer allocation failed
(screenshot #1)
And an error in the console (screenshot #2). I see no network request failures, and no errors logged in `flyteadmin`, `flytepropeller`, or `flyteconsole`, so I believe this to be a front-end issue. My team would eventually like to scale our workload up one or two orders of magnitude (10K–100K tasks), and we are concerned that this tool (at least the UI) won’t be able to handle that scale. Are there guidelines about maximum sizes for `map_task` executions? Or are there workarounds I should investigate to scale further (e.g. separating executions into smaller batches in subworkflows)?
Also to note, this Flyte installation uses the `flyte-core` Helm charts at version `1.15.3`.
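(For context, a minimal sketch of the shape of workflow described above, not the poster’s actual code: a single `map_task` fanned out over 1,000 items. The task and workflow names are hypothetical and the per-item work is a placeholder.)

```python
from typing import List

from flytekit import map_task, task, workflow


@task
def process_item(x: int) -> int:
    # Placeholder for the real per-item work.
    return x * 2


@workflow
def fan_out_wf() -> List[int]:
    # 1,000 map members, roughly the scale at which the UI error appeared.
    return map_task(process_item)(x=list(range(1000)))
```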
f
Flyte v1 map tasks cannot scale to 10k+ members; we recommend <5k.
This is restricted by the size limit on objects in etcd (the workflow state is stored as a CRD in etcd).
The Union engine for Flyte v2 can scale to around 30k today, and the goal is 100k soon.
b
Do you have any guidance on how to navigate these issues in Flyte v1? I’ve tried batching `map_task`s into subworkflows, but even though the tasks themselves succeed, the UI errors out with these Array buffer allocation failures.
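(For reference, a minimal sketch of the batching workaround being discussed, assuming the chunking happens inside a `@dynamic` workflow where the input list is materialized and plain Python slicing works. All names are hypothetical; this is not the poster’s code.)

```python
from typing import List

from flytekit import dynamic, map_task, task, workflow


@task
def process_item(x: int) -> int:
    # Placeholder for the real per-item work.
    return x * 2


@workflow
def batch_subwf(batch: List[int]) -> List[int]:
    # Each subworkflow maps over one batch only, keeping the fan-out small.
    return map_task(process_item)(x=batch)


@dynamic
def run_in_batches(items: List[int], batch_size: int = 250) -> None:
    # Inside a @dynamic the inputs are real Python values, so ordinary
    # slicing can split the work across smaller subworkflow executions.
    for i in range(0, len(items), batch_size):
        batch_subwf(batch=items[i:i + batch_size])
```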
f
That’s odd.
It should work.
Is it the gRPC limit? Or a network issue?
Are you using the single binary deployment?
b
I'm using flyte-core. I think it's just a frontend issue - I didn't see any error logs when looking at the k8s services, or any network request errors in the Chrome dev console.
f
We have not seen these errors.
And there are folks running much larger workloads here.
b
We ended up figuring out what was wrong: we were misusing a dynamic workflow. A list of config objects was being built inside the dynamic body itself (instead of inside a task). When that data was passed into the task, odd behavior occurred because the data had not been written out to remote storage, as it would have been if it were returned from a task. We have not investigated the problem further, but we are no longer experiencing it after moving the config construction into a task.
👍 1
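(For readers hitting the same thing, a minimal sketch of the fix described above, with hypothetical names: build the list of config objects in a `@task`, so the list is serialized to remote storage as a task output, instead of constructing it inline in the `@dynamic` body and handing raw Python objects to the mapped task.)

```python
from dataclasses import dataclass
from typing import List

from flytekit import dynamic, map_task, task


@dataclass
class Config:
    # Recent flytekit versions serialize plain dataclasses; fields here are
    # illustrative only.
    name: str
    size: int


@task
def build_configs(n: int) -> List[Config]:
    # Because this runs as a task, the returned list is written to remote
    # storage and downstream nodes consume it by reference.
    return [Config(name=f"cfg-{i}", size=i) for i in range(n)]


@task
def run_one(cfg: Config) -> str:
    return cfg.name


@dynamic
def run_all(n: int) -> List[str]:
    configs = build_configs(n=n)            # promise backed by stored output
    return map_task(run_one)(cfg=configs)   # fan out over the stored list
```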