# ask-the-community
r
Hey all! So I have been kicking the tires a bit with Flyte on a local instance (deployed to k3s on macOS) and used this example to try things out. When putting together my `diabetes.py` file, I was finally able to get it working once I raised the `mem` request for each task. The example sets `200MiB` for each task; I raised it to `1G` and everything worked. The big issue was that I didn't know the pods were dying from an OOM issue until I looked at the Rancher dashboard (shown below). The errors in the Flyte console (also shown below) were not helpful. Is there an issue with my setup, or does Flyte not surface OOM errors to the console? Thanks in advance!
k
It should surface the errors. What error did you see?
Cc @Haytham Abuelfutuh / @Paul Dittamo: I have seen this a couple of times recently. What happened?
r
Hey @Ketan (kumare3)! The error is very vague (just what's in the screenshots): `[1/1] currentAttempt done. Last Error: USER::` One run got further and had more of an error, but the ultimate reason was still `OOMKilled`.
p
Hi @Ryan Russon thank you for pointing this out. I will look into this later this evening/early tomorrow
h
Hey @Ryan Russon, where do you see the `OOMKilled`? One thing I would ask you to double check: https://docs.flyte.org/en/latest/deployment/configuration/generated/flytepropeller_config.html and set `inject-finalizer: true`. This forces k8s not to delete the pod until Flyte has had a chance to review its termination cause. Depending on your k8s config, k8s may otherwise delete the pod too early (e.g. if it tries to scale down as soon as the pod is done, say if you are running Karpenter or an aggressive autoscaler).
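For anyone following along, that setting lives under the k8s plugin section of the flytepropeller config. A minimal fragment might look like the sketch below; the exact nesting can differ depending on whether you manage the config via the Helm chart values or the configmap directly, so treat this as an assumption to verify against the linked docs:

```yaml
# flytepropeller configuration fragment (sketch; exact nesting may
# differ depending on how your deployment manages this configmap)
plugins:
  k8s:
    inject-finalizer: true
```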
r
The OOM is from the pod events (first screenshot). I will give it a try with the finalizer injection.