# ask-the-community
Hi! I'm evaluating Flyte for our workflow orchestration needs. I'm curious what happens when WorkflowExecutor tries to report a task status to FlyteAdmin but FlyteAdmin is unavailable for a prolonged duration (say 1 hour) for some reason. Will the WorkflowExecutor eventually report the correct task status to FlyteAdmin when it comes back up? Or will FlyteAdmin be left with an inconsistent state (say Running, when the task actually failed or succeeded while FlyteAdmin was down)?
@Srinivas Venkattaramanujam great question. This is configurable. Ideally we expect FlyteAdmin to be highly available. But if FlyteAdmin is down, that is tolerable for "x" minutes, where "x" is configurable. FlytePropeller keeps local state in etcd per K8s cluster, and that state exists as long as the cluster is up. But once retries have been exhausted after "x" minutes, FlyteAdmin will be inconsistent.
Thanks for the quick response! What is the upper bound for "x"? Could you point me to the code where this is performed? I'm still learning my way around, and I was able to find the logic (I think) that handles reporting failures, but I am still unable to find where retries are performed.
@Srinivas Venkattaramanujam you can simply set this value to be very high - https://docs.flyte.org/en/latest/deployment/configuration/generated/scheduler_config.html#max-workflow-retries-int
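To make the behavior discussed in this thread concrete, here is a minimal Python sketch of bounded retries against an unavailable service. It is not Flyte's code: `FakeAdmin`, `AdminUnavailable`, `report_with_retries`, and `max_retries` are hypothetical names invented for illustration, and a real system would wait between attempts. It only shows why the remote view stays stale once retries run out, and why a very high retry limit closes that gap.

```python
from dataclasses import dataclass, field


class AdminUnavailable(Exception):
    """Raised by the hypothetical admin client when the service is unreachable."""


@dataclass
class FakeAdmin:
    """Stand-in for a remote status store (hypothetical, not FlyteAdmin's API)."""
    up: bool = False
    recorded: dict = field(default_factory=dict)

    def record(self, task_id, phase):
        if not self.up:
            raise AdminUnavailable(task_id)
        self.recorded[task_id] = phase


def report_with_retries(admin, task_id, phase, max_retries):
    """Try to report a phase at most max_retries times; return True on success."""
    for _ in range(max_retries):
        try:
            admin.record(task_id, phase)
            return True
        except AdminUnavailable:
            continue  # a real system would back off before the next attempt
    return False


# The remote store last heard that the task was running, then went down.
admin = FakeAdmin(up=False, recorded={"task-1": "RUNNING"})

# The executor's local state (the source of truth) knows the task finished.
local_state = {"task-1": "SUCCEEDED"}

# Retries are exhausted while the store is down, so the remote view stays stale.
delivered = report_with_retries(admin, "task-1", local_state["task-1"], max_retries=3)
stale_phase = admin.recorded["task-1"]

# If the retry budget outlasts the outage, a later attempt succeeds.
admin.up = True
delivered_later = report_with_retries(admin, "task-1", local_state["task-1"], max_retries=3)
```

In this sketch the local state is never lost. Only the remote copy lags, and it catches up only if an attempt is made after the service is back.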