Using flyte with GCS. It stores a lot of my output...
# ask-the-community
Using Flyte with GCS. It stores a lot of outputs between steps that I no longer need once the run is over. Is there a way to configure retention for outputs?
On GCP I have it write to a bucket that has a lifecycle policy. Would that be possible with your setup?
I considered this option, but that would tie the lifecycle of the registered tasks and workflows to that of the outputs. From what I checked, GCS lets you create lifecycle rules that match a suffix or a prefix. Flyte uploads into the bucket like this:
flytesnacks/ (project name)
GCS doesn’t seem to support “match everything but this prefix”, so I’d have to give it all those 0a, 22, 2i folder names if I don’t want it to apply the lifecycle to the registered workflows and tasks. That’s why I was hoping there was a way to clean up from Flyte’s side. The GCP installation guide says to give the Flyte server permissions on GCP, so it should be able to delete objects.
Just found that I can configure where the outputs are stored using a setting in flyte-core’s helm values.yaml. Now that it’s set to "gs://{{ .Values.userSettings.bucketName }}/outputs/", I ran some workflows, and all these output folders are created under that prefix, so I can apply separate lifecycle rules for them now based on prefix. Thanks!
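For anyone following along, a prefix-scoped lifecycle rule could look roughly like the sketch below. It only builds the JSON document that GCS accepts as a lifecycle configuration (a `Delete` action with `age` and `matchesPrefix` conditions). The helper names, the `lifecycle.json` file name, and the 3-day age are assumptions for illustration, not something taken from the Flyte docs.

```python
import json


def build_lifecycle_config(prefix: str, age_days: int) -> dict:
    """Build a GCS lifecycle config that deletes objects under `prefix`
    once they are at least `age_days` days old.

    Hypothetical helper: only the JSON shape is the GCS format.
    """
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {
                    "age": age_days,            # days since object creation
                    "matchesPrefix": [prefix],  # object name prefix, no bucket name
                },
            }
        ]
    }


def write_lifecycle_file(path: str, prefix: str = "outputs/", age_days: int = 3) -> dict:
    """Write the config to `path` as JSON and return it."""
    config = build_lifecycle_config(prefix, age_days)
    with open(path, "w") as fh:
        json.dump(config, fh, indent=2)
    return config


if __name__ == "__main__":
    # Assumed defaults: the "outputs/" prefix and an illustrative 3-day retention.
    print(json.dumps(write_lifecycle_file("lifecycle.json"), indent=2))
```

The resulting file can then be applied with something like `gsutil lifecycle set lifecycle.json gs://<your-bucket>`. Objects outside the `outputs/` prefix, such as the registered tasks and workflows, are not matched by this rule.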
Awesome! The way I usually do this is to have all FlyteFiles write to a lifecycled bucket by default. If I want to keep artifacts, I explicitly pass in a remote path (a non-lifecycled bucket) when returning the FlyteFile!
Three days have passed, and the lifecycle rules on the bucket deleted my outputs as expected. However, the task was still skipped because Flyte expects it to be cached, and the task that receives the inputs failed due to a 401 error from GCS. I bumped the cache version and it works again now. I’m curious to hear how you deal with cached outputs being lifecycled in your case. Or is this not an issue with your specific use case? Thanks
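One possible workaround, sketched here as an assumption rather than a Flyte feature: derive the cache version from a time window no longer than the bucket's retention period, so the version changes before any cached output can be deleted. The helper `rotating_cache_version` is hypothetical. As far as I understand, Flyte records the cache version when a task is registered, so this only helps if the tasks are re-registered (for example from CI) at least once per window.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def rotating_cache_version(base: str, retention_days: int, now: Optional[float] = None) -> str:
    """Return a cache version string that changes every `retention_days` days.

    Windows are aligned to the Unix epoch. An output written inside a window
    is at most `retention_days` old when that window ends, so an age-based
    lifecycle rule with the same number of days has not deleted it yet.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    timestamp = time.time() if now is None else now
    window = int(timestamp // (retention_days * SECONDS_PER_DAY))
    return f"{base}-w{window}"


if __name__ == "__main__":
    # The string would be passed as the task's cache version at registration time.
    print(rotating_cache_version("v1", retention_days=3))
```

The trade-off is that every window boundary invalidates all cached results at once, including ones written shortly before the boundary, so some work gets recomputed earlier than strictly necessary.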