Hi All, I’m trying to run a large workflow on our GKE cluster Flyte #announcements

rough-rose-81585

04/01/2022, 3:10 PM

Hi All, I’m trying to run a large workflow on our GKE cluster with many thousands of parallel tasks. Is there a recommended Flyte config to make sure Flyte is maximizing our compute resources? For example, helm chart settings that might allow more tasks to be scheduled in parallel. More information about my current situation in thread.

rough-rose-81585

04/01/2022, 3:13 PM

Currently our GCP account can support about 2500-3000 tasks in parallel, based on the CPU/Mem requirements of these Flyte tasks. Initially the workflow started and FlytePropeller scheduled and ran about 500 tasks in parallel. I figured it would ramp up over time as new nodes were provisioned. However, after about 30 minutes the number of concurrent pods dropped to around 150, and it’s been that way ever since (over 60 minutes). We have about 40 nodes running currently and they are only using about 25% of CPU/Mem, so I’m not sure why FlytePropeller isn’t scheduling more tasks. I also checked the CPU/Mem quota in the namespace and it’s well under the limit as well.

rough-rose-81585

04/01/2022, 3:13 PM

cc: @gifted-raincoat-59712

freezing-airport-6809

04/01/2022, 3:20 PM

Cc @hallowed-mouse-14616 / @high-park-82026

freezing-airport-6809

04/01/2022, 3:20 PM

Having a diagram of the fan out will be interesting to see

rough-rose-81585

04/01/2022, 3:24 PM

How can I share that with you?

rough-rose-81585

04/01/2022, 3:25 PM

the Flyte Console graph for the workflow is not very interesting

freezing-airport-6809

04/01/2022, 3:25 PM

Is it like just a fan out of 2500/5000?

freezing-airport-6809

04/01/2022, 3:25 PM

If so why not use map task?

rough-rose-81585

04/01/2022, 3:26 PM

it’s 100 input files, each of which generates 1280 tasks, so 128,000 total tasks
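For scale, here is a back-of-the-envelope sketch of the numbers quoted in this thread: the total task count, and how many sequential batches of pods would be needed at the observed concurrency (about 150) versus the upper end of the stated cluster capacity (3000). It assumes every task takes roughly the same time, which is a simplification for illustration only, and the function name `waves_needed` is hypothetical.

```python
import math


def waves_needed(total_tasks: int, concurrent_tasks: int) -> int:
    """Number of sequential batches if `concurrent_tasks` tasks run at a time."""
    return math.ceil(total_tasks / concurrent_tasks)


input_files = 100
tasks_per_file = 1280
total_tasks = input_files * tasks_per_file  # 128,000 tasks in one execution

observed = waves_needed(total_tasks, 150)    # concurrency seen after ~30 minutes
capacity = waves_needed(total_tasks, 3000)   # upper end of the stated capacity

print(total_tasks, observed, capacity)
```

Under that equal-duration assumption, the gap between the two batch counts is roughly the slowdown caused by running at 150 concurrent pods instead of 3000.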

freezing-airport-6809

04/01/2022, 3:26 PM

Ohh, in a fan out on one graph?

freezing-airport-6809

04/01/2022, 3:27 PM

I think there are a few things over here- max parallelism, number of workers, k8s config etc

rough-rose-81585

04/01/2022, 3:27 PM

it’s in one workflow execution currently so i assume one graph?

rough-rose-81585

04/01/2022, 3:28 PM

I can launch as one workflow execution per file if that would help, but I’d like to understand the config options available to our deployment like you said max parallelism, etc.

rough-rose-81585

04/01/2022, 3:32 PM

Currently we only have 1 FlytePropeller pod with 1 CPU and 2GB mem, and it doesn’t seem to be maxing out on either. Propeller also has 50 workers configured.

freezing-airport-6809

04/01/2022, 3:49 PM

Hey Nicolas do you have a couple minutes to get on Avicii

freezing-airport-6809

04/01/2022, 3:49 PM

A VC

freezing-airport-6809

04/01/2022, 3:50 PM

One graph running 128,000 tasks, I don’t think it is supported

rough-rose-81585

04/01/2022, 4:20 PM

Hi Ketan, yes I can meet up, and I’m happy to reconfigure to stay within limits. It’s a very parallel workload, and I’m probably not setting everything up optimally.

freezing-airport-6809

04/01/2022, 4:21 PM

ok let me dm you. ideally you should just use launch-plans

👍 1

rough-rose-81585

04/01/2022, 8:39 PM

@freezing-airport-6809 @hallowed-mouse-14616 splitting to a separate launchplan per input file worked. All of our available 3000 vCPUs are churning! Thanks for your help!
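The fix described here, one execution per input file instead of a single 128,000-task graph, can be sketched roughly as below. This is a minimal illustration, not the setup actually used in the thread: `launch_per_file`, `fake_submit`, and the file names are all hypothetical, and `submit` stands in for whatever call starts a launch plan execution in a real deployment (that wiring is not shown in the thread and is left out here).

```python
from typing import Callable, Dict, Iterable


def launch_per_file(
    input_files: Iterable[str],
    submit: Callable[[str], str],
) -> Dict[str, str]:
    """Start one execution per input file and return {file: execution id}."""
    executions: Dict[str, str] = {}
    for path in input_files:
        # Each file gets its own execution, so each graph holds only
        # that file's tasks rather than the whole fan-out.
        executions[path] = submit(path)
    return executions


def fake_submit(path: str) -> str:
    """Stand-in for a real submission call (assumption for this sketch)."""
    return f"exec-for-{path}"


files = [f"input_{i:03d}.dat" for i in range(100)]  # hypothetical file names
executions = launch_per_file(files, fake_submit)
print(len(executions))
```

With the numbers from this thread, each of the 100 executions would carry 1280 tasks instead of one execution carrying all 128,000.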

freezing-airport-6809

04/01/2022, 10:02 PM

😄

freezing-airport-6809

04/01/2022, 10:02 PM

❤️
