# announcements
Hi, on my GCP Flyte deployment, Flyte deletes dynamic tasks from k8s as soon as they complete, whether they pass or fail. Regular Python tasks are deleted after the GC interval, which is 30h on our cluster. This makes it very hard to debug dynamic tasks when something goes wrong, because the pod is no longer in k8s. Is this something I can stop Flyte from doing via a config option?
On my cluster the following script spawns 10 tasks as part of a dynamic workflow, and all 10 are deleted from k8s as soon as they complete.
@Dan Rammer (hamersaw) any ideas?
Hi @Nicholas LoFaso, sorry for the late response. When you refer to regular Python tasks, do you mean executing a single task, or executing a non-dynamic workflow containing only Python tasks?
Hi @Dan Rammer (hamersaw), no worries. I'm referring to the latter case: running a @workflow that contains Python tasks. Those are not cleaned up immediately (which matches my expectation).
We've stopped using it, but I noticed the same behavior with those as the tasks as well, if that helps troubleshooting.
I can also send my helm values file to show you our setup
@Nicholas LoFaso thanks for your patience on this. It looks like there is a `delete-resource-on-finalize` option that is set in the plugins configmap for flytepropeller. It controls, as you may guess, whether or not k8s resources are deleted when Flyte attempts to finalize the task. In your above configuration this may look like:
```yaml
k8s:
  default-env-vars:
    - FLYTE_STATSD_HOST: "flytestatsd.datadog.svc.cluster.local"
  delete-resource-on-finalize: false
  create-container-error-grace-period: 8m0s
```
In my testing, with this set, dynamic task pods were not deleted on completion or failure. Let me know if this works for you and/or if you run into any more issues.
I should note that this will not work for map tasks, as they do not flow through the propeller plugin manager. We are actively working on a number of map task improvements, like individual subtask retries, better flyteconsole support (viewing individual subtask phases / retries), improved parallelism, etc., that should be available in the next release.
Hi @Dan Rammer (hamersaw), thanks for getting back to me. I will try running with `delete-resource-on-finalize: false` and hopefully that will resolve the issue.
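For anyone setting this via a Helm values file, a sketch of where the flag would sit, assuming the flyte-core chart's layout (the exact key paths are an assumption; verify against your chart's values.yaml):

```yaml
# Hedged sketch: flyte-core Helm values fragment rendering into
# flytepropeller's plugins configmap.
configmap:
  k8s:
    plugins:
      k8s:
        # Keep pods around after task finalization for debugging.
        delete-resource-on-finalize: false
```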