# flyte-support
c
We're attempting to debug a Flyte workflow on EKS where our pods complain that there isn't enough ephemeral storage. Is it possible to get the pod spec that is issued to Kubernetes? The details behind the "Task Details" link look close to what the pod would issue, but I wondered whether that's everything.
f
You can add more ephemeral storage.
Add it in the task's resources.
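At the Kubernetes level, that setting ends up as an `ephemeral-storage` entry under the container's `resources.requests` and `resources.limits`. The sketch below shows that fragment on a plain pod-spec dict. The helper name `with_ephemeral_storage`, the container name, the image and the `30Gi` value are all hypothetical. In flytekit the equivalent is usually declared on the task, for example `@task(requests=Resources(ephemeral_storage="30Gi"))`, assuming your flytekit version's `Resources` exposes that field.

```python
import copy
from typing import Any, Dict, Optional


def with_ephemeral_storage(
    container: Dict[str, Any], request: str, limit: Optional[str] = None
) -> Dict[str, Any]:
    """Return a copy of a pod-spec container with an ephemeral-storage request and limit.

    If no limit is given, the limit is set equal to the request.
    """
    updated = copy.deepcopy(container)  # leave the caller's dict untouched
    resources = updated.setdefault("resources", {})
    resources.setdefault("requests", {})["ephemeral-storage"] = request
    resources.setdefault("limits", {})["ephemeral-storage"] = limit or request
    return updated


# Hypothetical container; "30Gi" is an example value, not a recommendation.
container = {
    "name": "primary",
    "image": "example/pytorch-cuda:latest",
    "resources": {"requests": {"cpu": "2"}},
}
print(with_ephemeral_storage(container, "30Gi")["resources"])
```

Setting the limit equal to the request by default keeps the example simple; a lower request with a higher limit is also valid Kubernetes, but makes eviction behaviour harder to reason about.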
c
We've done that now, and that's gotten us past one hurdle. I suppose the basic iteration loop is: if something fails on Kubernetes, knowing the pod spec means I can reproduce the pod invocation manually, which is helpful for debugging. Re-running the workflow also works, but having a pod spec makes it easier to hand off to one of my DevOps engineers.
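One way to turn a captured pod into something that can be re-submitted by hand is to strip the fields the API server fills in itself. The sketch below is a minimal, assumption-laden version: the function name `to_reproducible_pod` and the field list are illustrative, not exhaustive for every cluster, and nothing here is Flyte-specific tooling.

```python
import copy
import json
from typing import Any, Dict

# Metadata the API server populates itself; a stale value such as
# resourceVersion makes re-creating the object fail.
SERVER_SET_METADATA = (
    "uid",
    "resourceVersion",
    "creationTimestamp",
    "generation",
    "managedFields",
    "selfLink",
)


def to_reproducible_pod(pod: Dict[str, Any], new_name: str) -> Dict[str, Any]:
    """Turn a captured pod object into a manifest that can be submitted again."""
    clean = copy.deepcopy(pod)
    clean.pop("status", None)  # runtime state, not part of a create request

    meta = clean.setdefault("metadata", {})
    for field in SERVER_SET_METADATA:
        meta.pop(field, None)
    # Drop the owner link so the copy is not garbage-collected along with
    # the original owner object.
    meta.pop("ownerReferences", None)
    meta["name"] = new_name

    # Let the scheduler choose a node again instead of pinning the original one.
    clean.get("spec", {}).pop("nodeName", None)
    return clean


# Hypothetical captured pod, trimmed for brevity.
captured = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "abc-n0-0", "uid": "1234", "resourceVersion": "42"},
    "spec": {"nodeName": "node-1", "containers": [{"name": "primary"}]},
    "status": {"phase": "Failed"},
}
print(json.dumps(to_reproducible_pod(captured, "abc-debug-copy"), indent=2))
```

The input can come from `kubectl get pod <pod-name> -n <namespace> -o json` while the pod still exists, and the cleaned output can be saved and passed to `kubectl create -f`.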
f
Yes, this is definitely something that's coming.
Sadly, in Flyte v1 the storage is etcd, which is limited to 1.5 MB per object, so storing things like full pod specs is hard.
c
Fair enough. We've since become more familiar with intercepting the pod scheduling, so we've been able to work out roughly how it works, but part of the game is catching the pod before the resource gets deleted 🙂 The basic issue we were having was trying to launch a PyTorch-with-CUDA image of about 8 GB on ephemeral storage that was too small (~20 GB).
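When comparing sizes like these, note that Kubernetes quantities distinguish decimal suffixes (`G` = 10^9 bytes) from binary ones (`Gi` = 2^30 bytes). The hypothetical helper below handles only plain decimal numbers with those suffixes (no exponent or milli forms) and is arithmetic only: whether image layers actually count against a pod's ephemeral-storage request depends on how the node's kubelet and container runtime are set up.

```python
import re
from decimal import Decimal

# Decimal and binary suffixes accepted by Kubernetes resource quantities.
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}
_QUANTITY = re.compile(r"^([0-9]+(?:\.[0-9]+)?)([A-Za-z]*)$")


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '20Gi' or '8G' to bytes (truncating fractions)."""
    match = _QUANTITY.match(quantity.strip())
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(Decimal(number) * _SUFFIXES[suffix])


image = quantity_to_bytes("8G")
storage = quantity_to_bytes("20Gi")
print(f"image is {image / storage:.0%} of the ephemeral-storage request")
```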