Hi, since updating the flyte-binary chart from 1.6.2 to 1.8.1, a lot of my tasks are killed with the following message:
code:"Interrupted" message:"The node was low on resource: ephemeral-storage. Container alxkcdhndgnqd6kzbq95-n0-0-n23-0-40 was using 20205640Ki, which exceeds its request of 0
With the previous version, the pods could use all the storage available on the node. Did allocation of ephemeral storage change between these versions?
Thanks for the help.
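For anyone triaging many such failures, here is a minimal sketch that pulls the fields out of the eviction message quoted above. The pattern is inferred only from that one message, and the function name and field labels are hypothetical, not part of Flyte or Kubernetes.

```python
import re
from typing import Optional

# Pattern inferred from the single eviction message quoted in this thread.
# The group names (resource, container, usage, request) are our own labels.
EVICTION_RE = re.compile(
    r"The node was low on resource: (?P<resource>[\w-]+)\. "
    r"Container (?P<container>\S+) was using (?P<usage>[^,\s]+), "
    r"which exceeds its request of (?P<request>\w+)"
)


def parse_eviction_message(message: str) -> Optional[dict]:
    """Return the eviction fields as a dict, or None if the text does not match."""
    match = EVICTION_RE.search(message)
    return match.groupdict() if match else None


message = (
    "The node was low on resource: ephemeral-storage. "
    "Container alxkcdhndgnqd6kzbq95-n0-0-n23-0-40 was using 20205640Ki, "
    "which exceeds its request of 0"
)
fields = parse_eviction_message(message)
print(fields)
```

A request of `0` in the parsed output means the container declared no ephemeral-storage request, so any usage at all exceeds it once the node starts reclaiming space.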
freezing-airport-6809
08/07/2023, 2:31 PM
This depends on the k8s version as well. Did you also change the k8s version? If not, it seems the ephemeral storage request is being set to 0 instead of being left unset. Cc @freezing-boots-56761 / @hallowed-mouse-14616
freezing-boots-56761
08/07/2023, 2:34 PM
afaik, this only happens when the node is under disk pressure. looks like the task is using ~20Gi. how much storage does the node have, and are there other workloads running on the same node?
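As a quick check of the "~20Gi" estimate, here is a small sketch that converts the reported `20205640Ki` into GiB. The helper name is hypothetical and it handles only the binary suffixes (Ki, Mi, Gi, Ti), not the full Kubernetes quantity syntax.

```python
# Binary suffixes used by Kubernetes resource quantities.
# This is a deliberately small subset; decimal suffixes such as "M" or "G" are not handled.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '20205640Ki' to bytes (hypothetical helper)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a bare integer is taken to mean bytes


usage_bytes = quantity_to_bytes("20205640Ki")
usage_gib = usage_bytes / 2**30
print(f"{usage_gib:.2f} GiB")  # prints "19.27 GiB"
```

So the container was using a little over 19 GiB of ephemeral storage when it was evicted.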
white-teacher-47376
08/08/2023, 8:27 AM
Thanks for the replies. It is possible that these errors are related to disk pressure. I was confused because in the previous version the error message in that case was "DiskPressure". The old DiskPressure error also still appears. Is it possible that the error message differs between a map_task and a Python task, but the cause is the same for both: the pod is killed by Kubernetes due to disk pressure?
freezing-boots-56761
08/08/2023, 11:08 AM
@white-teacher-47376: in the previous version, did disk pressure not cause eviction?
white-teacher-47376
08/08/2023, 1:25 PM
It did; only the error message was different.
freezing-boots-56761
08/08/2023, 1:59 PM
What was the error?
freezing-boots-56761
08/08/2023, 1:59 PM
Ah “DiskPressure” vs “Interrupted”?
freezing-boots-56761
08/08/2023, 2:01 PM
I’m not sure if Flyte wraps the underlying k8s error, but it def looks to me like they’re both caused by eviction due to disk pressure.