Hey everyone, looking into making my workflows mor...
# flyte-support
g
Hey everyone, looking into making my workflows more reliable by limiting how many executions can run concurrently. I've been looking in the docs but haven't found a configuration that manages this. I'm OK with any parallelism within workflow executions. My main goal is to handle peaks of incoming data (which trigger new workflow executions) better, and to prevent rising costs and jobs failing due to timeouts and limited resources or the cluster failing to scale. Looking at https://www.union.ai/docs/v1/flyte/deployment/flyte-configuration/performance/#1-workers-the-workqueue-and-the-evaluation-loop and https://www.union.ai/docs/v1/flyte/deployment/configuration-reference/scheduler-config/#queue-configcompositequeueconfig
https://github.com/flyteorg/flyte/pull/5659/files Seems like there was an RFC for this, with designs etc., but has it been implemented?
Ah, I see that currently the default and only behaviour when the limit is hit is that the workflow execution creation fails. So I would need retries on whatever triggers/starts workflows to make sure they all get scheduled.
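A rough sketch of what those retries on the triggering side could look like, in plain Python. Everything here is an assumption for illustration: `ConcurrencyLimitReached` is a hypothetical exception that your own launch wrapper would raise when execution creation is rejected, and `launch` stands in for whatever call actually starts the workflow. It is not a Flyte API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConcurrencyLimitReached(Exception):
    """Hypothetical error: raised by your own launch wrapper when the
    execution could not be created because the concurrency limit was hit."""


def launch_with_retry(
    launch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `launch` until it succeeds, backing off exponentially between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return launch()
        except ConcurrencyLimitReached:
            if attempt == max_attempts:
                # Out of attempts: surface the rejection to the caller.
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so a burst of triggers does not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable only so the backoff can be exercised without real waiting. With a peak of incoming data, the jitter matters more than the exact delays, since otherwise all rejected triggers would retry at the same moment.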
w
We set up Kueue to work with Flyte so that each team gets its own ClusterQueue. When the ClusterQueue gets busy, Kueue acts as a queue for the incoming workloads. https://kueue.sigs.k8s.io/docs/
g
@worried-airplane-87065 Awesome. I was also looking into that earlier. Do you queue per task type? Per workflow execution? Does Flyte set the status to running? If so, how do you handle timeouts?
w
We have a queue per team with their own limits. We just inject "kueue.x-k8s.io/queue-name: MY_TEAM-local-queue" for workflow invocations. We see "Queued" when the workload is pending and then "Running" once it's running. I haven't looked into timeout behaviors yet.
👍 2
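For anyone copying the label approach above, a minimal sketch of building that label in Python. The `<team>-local-queue` naming follows the message above. The helper name `kueue_labels` is made up for illustration, and how the resulting mapping gets attached to a workflow invocation depends on your launch path, which is not shown here. The validation reflects the Kubernetes label value rules (at most 63 characters, alphanumeric at both ends).

```python
import re
from typing import Dict, Optional

# Label key that Kueue reads to pick the LocalQueue for a workload.
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"

# Kubernetes label value syntax: at most 63 chars, alphanumeric at both
# ends, with '-', '_' and '.' allowed in between.
_LABEL_VALUE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?$")


def kueue_labels(team: str, existing: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `existing` labels plus the team's Kueue queue label.

    Hypothetical helper: the '<team>-local-queue' convention is taken from
    this thread, not from Kueue itself.
    """
    queue_name = f"{team}-local-queue"
    if not _LABEL_VALUE.match(queue_name):
        raise ValueError(f"not a valid Kubernetes label value: {queue_name!r}")
    labels = dict(existing or {})  # copy so the caller's dict is not mutated
    labels[KUEUE_QUEUE_LABEL] = queue_name
    return labels
```

Keeping the queue name derived from the team name in one place makes it harder for a launch script to send work to a queue that does not exist.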
e
Concurrency control was added recently, and I think Flyte only supports the SKIP option for now. Not quite sure about the behavior: whether the task will be ignored, or stay in the Not Started state where it can be restarted.
g
It does indeed only support Skip. I'll test it out and have a look if it fits my needs
👍 1