Hey all So I have been kicking the tires a bit with Flyte wi Flyte #flyte-support


clean-agent-63333

11/24/2023, 6:04 PM

Hey all! So I have been kicking the tires a bit with Flyte on a local instance (deployed to k3s on macOS) and used this example to try things out. When putting together my diabetes.py file, I was finally able to get it working once I raised the mem request for each task. The example has a 200MiB setting for each task; I raised it to 1G and got everything to work. The big issue was that I didn't know the pods were dying from OOM until I looked at the Rancher dashboard (shown below). The errors in the Flyte console (also shown below) were not helpful. Is there an issue with my setup, or does Flyte not surface OOM errors up to the console? Thanks in advance!
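To put the two memory settings side by side, here is a minimal Python sketch that converts Kubernetes-style memory quantities to bytes. It assumes the example's 200MiB corresponds to the Kubernetes quantity 200Mi and that the raised value is the decimal quantity 1G. The to_bytes helper is hypothetical and only handles the common suffixes (no milli or exponent forms).

```python
# Minimal sketch (hypothetical helper): convert Kubernetes-style memory
# quantities such as "200Mi" or "1G" to a number of bytes.
# Assumption: only the common suffixes below are handled; milli ("m"),
# exponent forms, and E/P/Ei/Pi are not.
from decimal import Decimal

# Binary (power-of-two) suffixes, e.g. "200Mi".
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
# Decimal (power-of-ten) suffixes, e.g. "1G".
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def to_bytes(quantity: str) -> int:
    """Return the number of bytes a memory quantity denotes."""
    q = quantity.strip()
    # Binary suffixes end in "i", so checking them first never confuses
    # "Mi" with "M".
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return int(Decimal(q[: -len(suffix)]) * factor)
    for suffix, factor in _DECIMAL.items():
        if q.endswith(suffix):
            return int(Decimal(q[: -len(suffix)]) * factor)
    return int(Decimal(q))  # a bare number is already in bytes


if __name__ == "__main__":
    old, new = to_bytes("200Mi"), to_bytes("1G")
    print(old, new, round(new / old, 2))  # prints: 209715200 1000000000 4.77
```

So the raise was roughly a 4.8x increase per task. Note that 1G and 1Gi are not the same size in Kubernetes: 1Gi is 1,073,741,824 bytes.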

freezing-airport-6809

11/24/2023, 9:09 PM

It should surface the errors, what error did you see?

freezing-airport-6809

11/24/2023, 9:10 PM

Cc @high-park-82026 / @flat-area-42876: I have seen this a couple of times recently. What happened?

clean-agent-63333

11/24/2023, 9:27 PM

Hey @freezing-airport-6809! The error is very vague (just what's in the screenshots):

[1/1] currentAttempt done. Last Error: USER::

clean-agent-63333

11/24/2023, 9:28 PM

One run got further and produced a more detailed error, but the ultimate reason was still OOMKilled.

flat-area-42876

11/28/2023, 12:39 AM

Hi @clean-agent-63333, thank you for pointing this out. I will look into this later this evening or early tomorrow.

high-park-82026

11/28/2023, 1:18 AM

Hey @clean-agent-63333, where do you see the OOMKilled? One thing I would ask you to double-check in the FlytePropeller configuration (https://docs.flyte.org/en/latest/deployment/configuration/generated/flytepropeller_config.html) is setting

inject-finalizer: true

This prevents k8s from deleting the pod until Flyte has had a chance to review its termination cause. Depending on your k8s config, the cluster might otherwise delete the pod first (e.g. if it scales down as soon as the pod is done, as can happen with Karpenter or an aggressive autoscaler).
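Kubernetes records why a container stopped in the pod's container statuses (state.terminated.reason, or lastState.terminated.reason after a restart), and OOMKilled is the reason reported for an out-of-memory kill; once the pod is deleted, that record is gone. As a rough way to check this by hand, here is a minimal Python sketch that scans the JSON produced by kubectl get pods -o json for OOMKilled containers. The function name oom_killed_containers is hypothetical, and the sketch only inspects regular containers, not init containers.

```python
# Minimal sketch (hypothetical helper): list containers that Kubernetes
# reports as OOMKilled. Input is the parsed JSON of `kubectl get pods -o json`
# (a PodList with "items") or of a single Pod.
from typing import Any, Dict, List, Tuple


def oom_killed_containers(doc: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (pod name, container name) pairs whose current or last
    termination reason is OOMKilled."""
    pods = doc.get("items", [doc])  # a PodList has "items"; a single Pod does not
    hits: List[Tuple[str, str]] = []
    for pod in pods:
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        # Assumption: only regular containers; initContainerStatuses are ignored.
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # "state" is the current state; "lastState" holds the previous
            # termination if the container was restarted.
            for key in ("state", "lastState"):
                terminated = cs.get(key, {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    hits.append((pod_name, cs.get("name", "<unknown>")))
                    break  # report each container at most once
    return hits
```

For example, save the output of kubectl get pods -n <namespace> -o json (the namespace is a placeholder), load it with json.load, and pass the resulting dict to oom_killed_containers.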


clean-agent-63333

11/28/2023, 1:52 AM

The OOM is from the pod events (first screenshot). I will give it a try with the finalizer injection.
