Ryan Russon
11/24/2023, 6:04 PM
diabetes.py file I was finally able to get it working once I raised the mem request for each task. The example has a 200MiB setting for each task and I raised it to 1G and got everything to work. The big issue was I didn't know the pods were dying due to an OOM issue with the resources until looking at the Rancher dashboard (shown below). The errors in the Flyte console (also shown below) were not helpful. Is there an issue with my setup or does Flyte not surface OOM errors up to the console? Thanks in advance!
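One way to confirm an OOM kill without a dashboard is to look at the pod's container statuses directly. The following is only a rough sketch, not part of Flyte: it assumes the output of `kubectl get pod <pod> -o json` has been loaded into a dict, and the helper name `oom_killed_containers` and the sample manifest are hypothetical. It relies on the standard Kubernetes pod status fields `state.terminated.reason` and `lastState.terminated.reason`.

```python
import json
from typing import List


def oom_killed_containers(pod: dict) -> List[str]:
    """Return names of containers whose current or previous termination reason is OOMKilled."""
    killed = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # A container that restarted after the kill reports it under lastState.
        for key in ("state", "lastState"):
            terminated = (status.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                killed.append(status["name"])
                break
    return killed


# Hypothetical, trimmed-down pod manifest used purely for illustration.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "primary",
       "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
    ]
  }
}
""")

print(oom_killed_containers(sample_pod))
```

If the list is non-empty, raising the task's memory request (as described above) is the likely fix.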
Ketan (kumare3)
Ryan Russon
11/24/2023, 9:27 PM
[1/1] currentAttempt done. Last Error: USER::
Ryan Russon
11/24/2023, 9:28 PM
OOMkilled
Paul Dittamo
11/28/2023, 12:39 AM
Haytham Abuelfutuh
OOMKilled? One thing I would ask you to double-check:
https://docs.flyte.org/en/latest/deployment/configuration/generated/flytepropeller_config.html
Set inject-finalizer: true
This forces k8s not to delete the pod until Flyte has had a chance to review its termination cause. Depending on your k8s config, the pod might otherwise be deleted too early (e.g. if the cluster scales down as soon as the pod is done, which can happen if you are running Karpenter or an aggressive autoscaler).
Ryan Russon
11/28/2023, 1:52 AM
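To check whether the inject-finalizer suggestion above has taken effect, one option is to look at a task pod's metadata. This is a minimal sketch under assumptions: the pod JSON comes from `kubectl get pod <pod> -o json`, the helper name `deletion_held_by_finalizer` is hypothetical, and the finalizer string in the sample is a placeholder, not the name Flyte actually uses. It relies only on the standard Kubernetes fields `metadata.finalizers` and `metadata.deletionTimestamp`.

```python
def deletion_held_by_finalizer(pod: dict) -> bool:
    """True if deletion was requested but the pod is still kept alive by a finalizer."""
    metadata = pod.get("metadata", {})
    has_finalizers = bool(metadata.get("finalizers"))
    deletion_requested = metadata.get("deletionTimestamp") is not None
    return has_finalizers and deletion_requested


# Hypothetical pod metadata; the finalizer name is a placeholder.
sample_pod = {
    "metadata": {
        "name": "example-task-pod",
        "finalizers": ["example.com/placeholder-finalizer"],
        "deletionTimestamp": "2023-11-28T01:00:00Z",
    }
}

print(deletion_held_by_finalizer(sample_pod))
```

A pod with no `finalizers` entry at all suggests the setting has not been applied to that pod.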