Hi! I'm evaluating Flyte for our workflow orchestration needs | Flyte #flyte-support

quick-scooter-86395

03/27/2023, 4:16 PM

Hi! I'm evaluating Flyte for our workflow orchestration needs. I'm curious what will happen when WorkflowExecutor tries to update the task status in FlyteAdmin but FlyteAdmin is unavailable for a prolonged period (say, 1 hour) for some reason. Will WorkflowExecutor eventually report the correct task status to FlyteAdmin when it comes back up? Or will FlyteAdmin be left in an inconsistent state (say, Running when the task actually failed or succeeded while FlyteAdmin was down)?

freezing-airport-6809

03/27/2023, 4:19 PM

@quick-scooter-86395 great question. This is configurable. Ideally we expect FlyteAdmin to be highly available. But if FlyteAdmin is down, the outage is tolerated for "x" minutes, where "x" is configurable. FlytePropeller keeps local state in etcd per K8s cluster, and as long as the cluster is up, that state exists. But after retrying for "x" minutes, FlyteAdmin will be inconsistent.

quick-scooter-86395

03/27/2023, 4:29 PM

Thanks for the quick response!! What is the upper bound for "x"? Could you point me to the code where this is performed? I'm still learning my way around Go, and I was able to find the logic (I think) that handles reporting failures, but I am still unable to find where retries are performed.

freezing-airport-6809

03/27/2023, 6:46 PM

@quick-scooter-86395 you can simply set this value to be very high - https://docs.flyte.org/en/latest/deployment/configuration/generated/scheduler_config.html#max-workflow-retries-int

freezing-airport-6809

03/27/2023, 6:46 PM

and you can use queues - https://docs.flyte.org/en/latest/deployment/configuration/generated/scheduler_config.html#queue-config-compositequeueconfig
