# flyte-support
n
Hi all, I'm using Flyte in GCP with the GCSFuse CSI driver. The driver injects a sidecar container, which waits for all other containers to terminate before terminating itself. The problem is that Flyte Copilot does the same thing: it waits for all other containers to terminate. The two deadlock, each waiting for the other to exit. My workaround was to explicitly send a SIGTERM to the GCSFuse sidecar, which terminated it (for some reason this only worked about 90% of the time). But after upgrading my GKE cluster to 1.29, the GCSFuse sidecar is injected as an init container with `restartPolicy: Always`, which means my workaround doesn't work anymore. Does anyone know how I can fix this easily? Our pipelines are now stuck because of this, and it's impossible to downgrade the GKE cluster. Thanks!
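To make the deadlock concrete, here is a toy model (illustrative only; the container names are hypothetical). Each container lists the containers it waits on before it exits, and a container exits once everything it waits on has exited. When both sidecars wait on "every other container", neither ever exits; letting Copilot ignore the GCSFuse sidecar breaks the cycle.

```python
# Toy model of the sidecar termination deadlock. Container names are
# hypothetical; real pods are driven by the kubelet and FlytePropeller.
from typing import Dict, Set


def simulate_exits(waits_on: Dict[str, Set[str]]) -> Set[str]:
    """Return the set of containers that eventually exit."""
    exited: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for name, deps in waits_on.items():
            # A container exits once everything it waits on has exited.
            if name not in exited and deps <= exited:
                exited.add(name)
                changed = True
    return exited


containers = {"main", "copilot", "gcsfuse"}

# Both sidecars wait for *every* other container, including each other.
deadlocked = {
    "main": set(),
    "copilot": containers - {"copilot"},
    "gcsfuse": containers - {"gcsfuse"},
}

# Copilot ignores the GCSFuse sidecar and waits only for the main container.
fixed = dict(deadlocked, copilot={"main"})

print(sorted(simulate_exits(deadlocked)))  # ['main']
print(sorted(simulate_exits(fixed)))       # ['copilot', 'gcsfuse', 'main']
```

Only the main container finishes in the first case; the two sidecars stay up forever, which matches the stuck pods described above.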
f
Do you need to use copilot?
I mean raw containers?
Can you use flytekit?
n
Yes, we use raw containers; flytekit isn't good enough for us.
f
Can you explain more please
n
We use many different Docker images that we had before moving to Flyte. I had to change them to work with Flyte, but our Flyte workflow is built mostly of `ContainerTask`s.
f
Ohh, so flytekit is not the problem; raw container tasks help with the migration.
The thing is, we implemented raw container tasks before Kubernetes had the capability of running sidecar containers, so FlytePropeller actually makes the decision of when to kill a pod, and it waits for all other containers to exit. We could change that, but that is a backend change.
I do believe that supporting all types of sidecar containers, while still exiting gracefully, should be part of Flyte. Let's file an issue, but I do not think we can work on this ASAP.
You can contribute, but we would need a written proposal of how you will identify sidecar containers and ensure correctness.
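As a rough illustration of the backend change being discussed, the sketch below decides whether a pod's work is done from its status without waiting on sidecars. It is a hypothetical Python sketch over Pod JSON (as returned by `kubectl get pod <name> -o json`), not FlytePropeller's actual Go logic. It relies on one Kubernetes fact: native sidecars (init containers with `restartPolicy: Always`) report under `status.initContainerStatuses`, not `status.containerStatuses`. The `ignore` parameter and the container names are assumptions for illustration.

```python
# Hypothetical sketch, not FlytePropeller's real logic: decide whether a pod's
# work is finished from its Pod JSON status, without waiting on sidecars.
from typing import Any, Dict, Iterable, Optional


def task_finished(pod: Dict[str, Any], ignore: Iterable[str] = ()) -> Optional[bool]:
    """Return None while any tracked container has not terminated, True when
    all tracked containers exited with code 0, False if any exited non-zero.

    Only status.containerStatuses is inspected. Native sidecars (init containers
    with restartPolicy: Always) report under status.initContainerStatuses, so
    they never block completion. Sidecars that still run as regular containers
    can be excluded by name through `ignore` (a hypothetical knob).
    """
    skip = set(ignore)
    tracked = [
        s for s in pod.get("status", {}).get("containerStatuses", [])
        if s["name"] not in skip
    ]
    if not tracked:
        return None  # nothing reported yet
    codes = []
    for s in tracked:
        terminated = s.get("state", {}).get("terminated")
        if terminated is None:
            return None  # still waiting or running
        codes.append(terminated.get("exitCode", 1))
    return all(code == 0 for code in codes)


# Container names below are illustrative.
# GKE 1.29 style: the GCSFuse sidecar is a native sidecar (an init container).
native_pod = {"status": {
    "initContainerStatuses": [
        {"name": "gke-gcsfuse-sidecar", "state": {"running": {}}}],
    "containerStatuses": [
        {"name": "main", "state": {"terminated": {"exitCode": 0}}},
        {"name": "copilot", "state": {"terminated": {"exitCode": 0}}}],
}}

# Older style: the GCSFuse sidecar runs as a regular container.
legacy_pod = {"status": {"containerStatuses": [
    {"name": "main", "state": {"terminated": {"exitCode": 0}}},
    {"name": "gke-gcsfuse-sidecar", "state": {"running": {}}}],
}}

print(task_finished(native_pod))  # True
print(task_finished(legacy_pod))  # None
print(task_finished(legacy_pod, ignore={"gke-gcsfuse-sidecar"}))  # True
```

The hard part a real proposal would still need to address is correctness: for example, a sidecar like Copilot that uploads outputs must not be ignored, because it has to finish before the task can succeed.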
n
You're saying it's not related to Copilot? I just tried creating my own Copilot image that specifically ignores the GCSFuse sidecar, which seems to work, and everything terminates (although now I get an error regarding the container task outputs, which might be unrelated).
f
This is done by the engine
Copilot cannot exit, right?
n
I did end up fixing it by building a new Copilot image that ignores the GCSFuse sidecar in its watcher, although of course this isn't maintainable, and I'd rather have a fix coming from you guys.
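For readers attempting a similar patch, here is a hypothetical sketch of a "wait until every other process has exited" watcher with an ignore list. It is not Copilot's actual watcher code. It assumes Linux and a pod with `shareProcessNamespace: true`, so `/proc` lists the processes of every container in the pod. The pattern `"gcsfuse"` is an assumption; check the real sidecar's command line before relying on it. `/pause` is the pod's infrastructure process, which is PID 1 when the process namespace is shared.

```python
# Hypothetical sketch of a "wait for all other processes" watcher with an
# ignore list; not Copilot's real implementation. Assumes Linux and a pod with
# shareProcessNamespace: true, so /proc shows every container's processes.
import os
import time
from typing import Callable, Iterable, List, Sequence, Tuple

Proc = Tuple[int, str]  # (pid, command line)


def list_processes() -> List[Proc]:
    """Return (pid, cmdline) for every visible process except this one."""
    own = os.getpid()
    procs: List[Proc] = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == own:
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        cmd = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        procs.append((int(entry), cmd))
    return procs


def blocking_processes(procs: Sequence[Proc], ignore: Sequence[str]) -> List[Proc]:
    """Processes that should still block exit. Empty command lines (zombies)
    and command lines containing any ignore pattern are skipped."""
    return [(pid, cmd) for pid, cmd in procs
            if cmd and not any(pattern in cmd for pattern in ignore)]


def wait_for_others(ignore: Iterable[str], poll_seconds: float = 1.0,
                    lister: Callable[[], List[Proc]] = list_processes) -> None:
    """Poll until only ignored processes (or none) remain."""
    patterns = list(ignore)
    while blocking_processes(lister(), patterns):
        time.sleep(poll_seconds)
```

In a patched sidecar this would be called as `wait_for_others(ignore=["/pause", "gcsfuse"])` before uploading outputs. The `lister` parameter exists only so the polling loop can be exercised without a real shared process namespace. Matching on command-line substrings is fragile, which is one reason a backend fix that understands native sidecars would be preferable.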