# ask-the-community
k
Hi, since updating the flyte-binary chart from 1.6.2 to 1.8.1, a lot of my tasks are killed with the following message:
code:"Interrupted" message:"The node was low on resource: ephemeral-storage. Container alxkcdhndgnqd6kzbq95-n0-0-n23-0-40 was using 20205640Ki, which exceeds its request of 0
With the previous version, the pods could use all the storage available on the node. Did the allocation of ephemeral storage change between these versions? Thanks for the help.
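For reference, this is roughly how I could pin an explicit request on a task instead of relying on the default. Just a sketch, assuming a recent flytekit where Resources has an ephemeral_storage field; the sizes are only examples:
```python
from flytekit import Resources, task


# Sketch: explicitly request ephemeral storage so the pod's request is not 0.
# The 25Gi/50Gi values are placeholders, not from the thread.
@task(
    requests=Resources(ephemeral_storage="25Gi"),
    limits=Resources(ephemeral_storage="50Gi"),
)
def heavy_io_task() -> str:
    # placeholder body; the real task writes ~20Gi of scratch data to local disk
    return "ok"
```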
k
This has to do with the k8s version as well. Can you tell us whether you changed the k8s version? If not, it seems the ephemeral storage request is being set to 0 instead of being left unset. Cc @jeev / @Dan Rammer (hamersaw)
j
afaik, this only happens when the node is under disk pressure. looks like the task is using ~20Gi. how much storage does the node have, and are there other workloads running on the same node?
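e.g. something like this should show what the node can allocate and whether it's reporting disk pressure. A sketch using the official kubernetes Python client; "my-node" is a placeholder:
```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (use load_incluster_config()
# when running inside the cluster).
config.load_kube_config()
v1 = client.CoreV1Api()

node = v1.read_node("my-node")

# Allocatable ephemeral storage on the node.
print("allocatable ephemeral-storage:",
      node.status.allocatable.get("ephemeral-storage"))

# Whether the node is currently under disk pressure.
for cond in node.status.conditions:
    if cond.type == "DiskPressure":
        print("DiskPressure:", cond.status, cond.message)
```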
k
Thanks for the replies. It's possible these errors are related to disk pressure; I was confused because in the previous version the error message in that case was "DiskPressure". The old "DiskPressure" error also still appears. Is it possible that the error message differs between a map_task and a python task, but the origin is the same for both: killed by Kubernetes due to disk pressure?
j
@Klemens Kasseroller: in the previous version, disk pressure didn't cause eviction?
k
It did; only the error message was different.
j
What was the error?
Ah “DiskPressure” vs “Interrupted”?
I’m not sure if Flyte wraps the underlying k8s error, but it def looks to me like they’re both caused by eviction due to disk pressure.
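fwiw, if it does come down to plain disk pressure, an explicit request should help for map tasks too. Rough sketch only; with_overrides accepting a Resources object here is per flytekit's API, but double-check against your version, and the names/sizes are placeholders:
```python
from typing import List

from flytekit import Resources, map_task, task, workflow


@task
def process(x: int) -> int:
    # placeholder for a task that writes large scratch files
    return x * 2


@workflow
def wf(xs: List[int]) -> List[int]:
    # Apply the same explicit ephemeral-storage request to every mapped pod.
    return map_task(process)(x=xs).with_overrides(
        requests=Resources(ephemeral_storage="25Gi")
    )
```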