Olivier Sevin
04/12/2024, 8:03 PM
Glime
04/12/2024, 8:03 PM
Olivier Sevin
04/12/2024, 8:05 PM
Glime
04/12/2024, 8:05 PM
backoffLimit and restartPolicy in the job configuration. However, specific configurations related to backoff mechanisms for Flyte tasks are not directly mentioned in the provided context.
If you are experiencing high rates of pod failure and retries, you might want to look into:
1. Adjusting the backoffLimit and restartPolicy settings in your Kubernetes job configurations.
2. Ensuring that your resource quotas and limits are appropriately configured to handle the load.
3. Investigating any underlying issues that might be causing the high failure rate of pods, such as network instability, insufficient resources, or configuration errors.
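To make the first suggestion concrete, here is a minimal sketch of where backoffLimit and restartPolicy sit in a plain Kubernetes batch/v1 Job manifest. It is not Flyte-specific and does not show how Flyte configures its own task pods; the helper name build_job_manifest, the job name, and the container image are hypothetical and chosen only for illustration.

```python
import json


def build_job_manifest(name, image, backoff_limit=3, restart_policy="Never"):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    Hypothetical helper for illustration only; it is not part of Flyte
    or of any Kubernetes client library.
    """
    # Job pod templates only accept these two restart policies.
    if restart_policy not in ("Never", "OnFailure"):
        raise ValueError("restartPolicy for a Job must be 'Never' or 'OnFailure'")
    if backoff_limit < 0:
        raise ValueError("backoffLimit must be non-negative")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries before the Job is marked as failed.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # Whether a failed container is restarted in place
                    # ("OnFailure") or left failed ("Never").
                    "restartPolicy": restart_policy,
                    "containers": [{"name": name, "image": image}],
                }
            },
        },
    }


if __name__ == "__main__":
    # Assumed example values; replace with your own job name and image.
    manifest = build_job_manifest("example-job", "busybox:1.36", backoff_limit=2)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.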
For more detailed guidance and to implement a solution tailored to Flyte, I recommend checking the Flyte documentation or reaching out on the Flyte community channels for specific advice related to backoff configurations in Flyte tasks.
Sources
https://discuss.flyte.org/t/12043119/we-are-regularly-hitting-the-issue-mentioned-in-this-resolve/
https://discuss.flyte.org/t/10974209/hey-all-i-m-back-with-a-quick-question-i-have-a-situation-wh/
David Espejo (he/him)
04/12/2024, 9:11 PM
David Espejo (he/him)
04/12/2024, 9:26 PM
tasks section in your Helm values file.
What I don't see is a way to enable/disable it. Seems like the backoff handler is enabled by default? cc @Dan Rammer (hamersaw)
Olivier Sevin
04/12/2024, 9:38 PM
Olivier Sevin
04/15/2024, 11:15 PM
Matthew Corley
04/23/2024, 12:08 AM
David Espejo (he/him)
04/23/2024, 5:42 PM