New RunLLM response generated:
# Question
Hello! I am investigating some odd behavior in Flyte propeller that seems to happen at high scale and I'm looking for clarification about what is going on. It seems that when we have a large number of concurrent tasks and the propeller queue fills up it gets into a weird state where it doesn't seem to process anything. I have attached some screenshots below.
As pressure on propeller increases, you can see the free worker count decrease until it hits 0. At around 23:00 the free workers jump back up near the maximum and the queue depth apparently hits 0, but workflow acceptance begins climbing steadily, and at this point nothing is processing.
# Answer
It sounds like you are experiencing a resource saturation issue with FlytePropeller. When the free worker count hits zero, it indicates that all available workers are occupied, which can lead to increased latency and processing delays. This is often due to the number of concurrent workflows exceeding the capacity of the configured workers.
According to the Flyte documentation, increasing the number of workers can help alleviate this issue. The number of workers should be balanced against the available CPU cores to avoid excessive context switching. Additionally, monitoring the `round_latency` metric can provide insight into performance bottlenecks. If the queue depth is zero but workflows are not processing, it might indicate a problem with the workflow evaluation loop or a potential bug in the system.
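As a rough starting point for raising worker capacity, the worker count lives in the FlytePropeller configuration. The snippet below is a minimal sketch, assuming a standard ConfigMap-style propeller config; the specific values shown are illustrative placeholders, not tuned recommendations for your cluster:

```yaml
# FlytePropeller configuration sketch (illustrative values, not a recommendation)
propeller:
  # Number of worker goroutines pulling workflows off the queue.
  # Balance against available CPU cores to avoid excessive context switching.
  workers: 100
  # How often a workflow is re-enqueued for evaluation even without new events;
  # lowering this increases responsiveness at the cost of more rounds.
  workflow-reeval-duration: 30s
```

If you adjust `workers`, it is worth watching free worker count and `round_latency` together after the change to confirm the bottleneck actually moved.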
Would you like more detailed steps on how to address this issue or further troubleshooting steps?
Sources:
- Flyte Documentation on Performance
- Slack Link: https://flyte.slack.com/archives/CP2HDHKE1/p1732131434.815739
Hint: Mention @runllm in the thread for followups.