Using Flyte with GCS (Flyte #flyte-support)

kind-kite-58745

06/14/2023, 6:35 PM

Using Flyte with GCS. It stores a lot of my outputs between steps that I no longer need once the run is over. Is there a way to configure retention for outputs?

rich-garden-69988

06/14/2023, 9:06 PM

On GCP I have it write to a bucket that is lifecycled. Would that be possible with your setup?

kind-kite-58745

06/15/2023, 8:09 AM

I considered this option, but that would tie the lifecycle of the registered tasks and workflows to that of the outputs. From what I checked, GCS lets you create lifecycle rules that match a suffix or a prefix. Flyte uploads into the bucket like this:

0a/
22/
2i/
3h/
flytesnacks/ (project name)
kq/
kt/
metadata/
...

GCS doesn’t seem to support “match everything except this prefix”, so I’d have to list all of those 0a, 22, 2i folder names if I don’t want the lifecycle to apply to the registered workflows and tasks. That’s why I was hoping there was a way to clean up from Flyte’s side. The GCP installation guide says to give the Flyte server storage.objects.delete permissions on GCP, so it should be able to delete objects.
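To make the limitation concrete, here is a small Python sketch of the delete rule you would have to maintain without a Flyte-side cleanup. It assumes, purely for illustration, that flytesnacks/ and metadata/ are the prefixes to keep and that a 3-day age is wanted. The rule uses the {"rule": [...]} JSON shape that gsutil lifecycle set accepts, with the age and matchesPrefix conditions.

```python
import json

# Top-level prefixes shown in the bucket listing above (truncated there with "...").
listing = ["0a/", "22/", "2i/", "3h/", "flytesnacks/", "kq/", "kt/", "metadata/"]
# Assumption for illustration: these prefixes hold what must survive.
protected = {"flytesnacks/", "metadata/"}


def delete_rule(prefixes, age_days):
    """Build one GCS lifecycle rule that deletes objects under `prefixes`
    once they are `age_days` days old."""
    return {
        "action": {"type": "Delete"},
        "condition": {"age": age_days, "matchesPrefix": sorted(prefixes)},
    }


# There is no "everything except this prefix" condition, so every
# output shard directory has to be enumerated explicitly.
output_prefixes = [p for p in listing if p not in protected]
config = {"rule": [delete_rule(output_prefixes, age_days=3)]}
print(json.dumps(config, indent=2))
```

Because the short shard directories are generated as runs happen, any directory created after the list was built is not covered, so the enumerated rule goes stale.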

kind-kite-58745

06/15/2023, 8:40 AM

Just found that I can configure where the outputs are stored using configmap.core.propeller.rawoutput-prefix in flyte-core’s helm values.yaml. Now that it’s set to "gs://{{ .Values.userSettings.bucketName }}/outputs/", I ran some workflows, and all these output folders are created under outputs/, so I can now apply separate lifecycle rules to them based on the prefix. Thanks!
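With every intermediate output under a single prefix, one rule is enough and the rest of the bucket is left alone. A minimal sketch, assuming a hypothetical 3-day retention and a placeholder bucket name YOUR_BUCKET: it writes the lifecycle file, which can then be applied with gsutil lifecycle set.

```python
import json
import pathlib

AGE_DAYS = 3  # hypothetical retention period; choose what suits your runs

# One rule: delete anything under outputs/ (the rawoutput-prefix location)
# once it is AGE_DAYS old. Objects outside outputs/ are not matched.
lifecycle = {
    "rule": [
        {
            "action": {"type": "Delete"},
            "condition": {"age": AGE_DAYS, "matchesPrefix": ["outputs/"]},
        }
    ]
}

path = pathlib.Path("lifecycle.json")
path.write_text(json.dumps(lifecycle, indent=2))
# Apply it to the bucket, for example:
#   gsutil lifecycle set lifecycle.json gs://YOUR_BUCKET
```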

rich-garden-69988

06/15/2023, 2:51 PM

Awesome! What I usually do is have all FlyteFiles write to a lifecycled bucket by default. If I want to keep an artifact, I explicitly pass in a remote path (in a non-lifecycled bucket) when returning the FlyteFile!
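A minimal sketch of that pattern. The bucket gs://my-artifacts and the helper remote_path_for are hypothetical; the helper only decides where a file should be uploaded, and the commented line shows how its result could be passed to flytekit's FlyteFile through the remote_path argument. Returning None leaves the default location under the raw output prefix, i.e. the lifecycled bucket.

```python
from typing import Optional

KEEP_BUCKET = "gs://my-artifacts"  # hypothetical non-lifecycled bucket


def remote_path_for(name: str, keep: bool) -> Optional[str]:
    """Return an explicit upload location for artifacts worth keeping.

    None means "no explicit location": the file goes to the default raw
    output prefix, which is the lifecycled location.
    """
    if keep:
        return f"{KEEP_BUCKET}/{name}"
    return None


# Inside a flytekit task this could be used roughly as:
#   return FlyteFile(path=local_file, remote_path=remote_path_for("model.bin", keep=True))
```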

kind-kite-58745

06/18/2023, 12:10 PM

Three days have passed, and the lifecycle rules on the bucket deleted my outputs as expected. However, the task was still skipped because Flyte still treated it as cached, and the task that receives its outputs as inputs failed with a 401 error from GCS. I bumped the cache version and it works again now. I’m curious how you deal with cached outputs being lifecycled in your case. Or is this not an issue for your specific use case? Thanks
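The failure can be reproduced with a toy model in plain Python. This illustrates the mechanism described in the message and is not how Flyte's cache is implemented: the cache keeps only a pointer to the output, the lifecycle rule removes the object behind it, a cache hit then hands the downstream step a dangling reference, and bumping the cache version changes the key so the task runs again.

```python
# Toy model only: dicts stand in for the GCS bucket and the task cache.
store = {}  # object path -> data (the bucket)
cache = {}  # (task, cache_version, inputs) -> object path (the cache)


def run(task_name, version, inputs, fn):
    key = (task_name, version, inputs)
    if key in cache:  # cache hit: the task body is skipped
        path = cache[key]
    else:
        path = f"outputs/{task_name}/{version}/{inputs}"
        store[path] = fn(inputs)
        cache[key] = path
    # The consumer reads the object; a KeyError here plays the role of
    # the GCS error seen by the downstream task.
    return store[path]


def square(x):
    return x * x


print(run("square", "1", 4, square))  # computes and stores 16
store.clear()                         # the lifecycle rule deletes outputs/
try:
    run("square", "1", 4, square)     # cache hit pointing at a deleted object
except KeyError as missing:
    print("output gone:", missing)
print(run("square", "2", 4, square))  # bumped cache version: recomputed
```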
