clean-agent-63333
11/24/2023, 6:04 PM
With the diabetes.py file, I was finally able to get it working once I raised the memory request for each task. The example sets 200MiB for each task; I raised it to 1G and everything worked. The big issue was that I didn't know the pods were dying from OOM (out-of-memory) errors until I looked at the Rancher dashboard (shown below). The errors in the Flyte console (also shown below) were not helpful. Is there an issue with my setup, or does Flyte not surface OOM errors in the console? Thanks in advance!
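For a sense of the sizes involved: Flyte task memory requests (for example flytekit's Resources(mem=...)) are passed to Kubernetes as memory quantities, where "Mi"/"Gi" are powers of two and "M"/"G" are powers of ten. The minimal sketch below uses a hypothetical helper, memory_bytes, that handles only simple integer quantities. It assumes the example's "200MiB" corresponds to the quantity "200Mi". Under that assumption, going from 200Mi to 1G is roughly a 4.8x increase, and 1G is slightly smaller than 1Gi.

```python
import re

# Binary (power-of-two) and decimal (power-of-ten) suffixes used in
# Kubernetes memory quantities, e.g. "200Mi" vs "1G".
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def memory_bytes(quantity: str) -> int:
    """Convert a simple Kubernetes memory quantity string to bytes.

    Hypothetical helper: only an integer plus an optional suffix is
    supported, not the full quantity grammar (no decimals or exponents).
    """
    match = re.fullmatch(r"(\d+)([KMGT]i|[kMGT])?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    if suffix is None:
        return int(number)
    factor = _BINARY[suffix] if suffix in _BINARY else _DECIMAL[suffix]
    return int(number) * factor


old = memory_bytes("200Mi")  # 209,715,200 bytes
new = memory_bytes("1G")     # 1,000,000,000 bytes
print(f"{new / old:.2f}x")   # prints 4.77x
```

Note that "1Gi" (1,073,741,824 bytes) would be a slightly larger request than "1G".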
freezing-airport-6809
clean-agent-63333
11/24/2023, 9:27 PM
[1/1] currentAttempt done. Last Error: USER::
clean-agent-63333
11/24/2023, 9:28 PM
OOMKilled
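Kubernetes records this termination reason in the pod's status, under status.containerStatuses[].state.terminated.reason, or under lastState if the container has since restarted. As a diagnostic sketch that is not part of Flyte, the hypothetical helper below scans pod JSON, shaped like the output of kubectl get pod <name> -o json, for containers terminated with reason OOMKilled. It only works while the pod object still exists. The sample pod and its container name "primary" are made up for illustration.

```python
import json


def oom_killed_containers(pod: dict) -> list:
    """Return names of containers whose current or last termination
    reason is OOMKilled. `pod` is a dict shaped like kubectl's pod JSON."""
    names = []
    for status in (pod.get("status") or {}).get("containerStatuses") or []:
        # Check the current state first, then the previous one.
        for key in ("state", "lastState"):
            terminated = (status.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break
    return names


# Made-up sample pod status for illustration only.
example = json.loads("""
{"status": {"containerStatuses": [
  {"name": "primary",
   "state": {"terminated": {"reason": "OOMKilled"}},
   "lastState": {}}
]}}
""")
print(oom_killed_containers(example))  # prints ['primary']
```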
flat-area-42876
11/28/2023, 12:39 AM
@high-park-82026 OOMKilled? One thing I would ask you to double-check in the FlytePropeller configuration (https://docs.flyte.org/en/latest/deployment/configuration/generated/flytepropeller_config.html): set inject-finalizer: true.
This forces k8s not to delete the pod until Flyte has had a chance to review its termination cause. Depending on your k8s config, k8s might otherwise try to delete the pod early (e.g. if it scales down as soon as the pod is done, which can happen if you are running Karpenter or an aggressive autoscaler).
clean-agent-63333
11/28/2023, 1:52 AM