helpful-church-28990
04/22/2024, 8:13 PM

helpful-church-28990
04/22/2024, 8:20 PM

gentle-tomato-480
04/22/2024, 9:24 PM
Objects under `data`, `metadata/propeller`, and `metadata/{project}/{domain}` that are older than e.g. 30 days (or whatever your policy should be) can be deleted, as these are the inputs/outputs generated during executions (and they grow the most).
The rest I wouldn't touch and would instead manage via Flyte, since the remaining objects in the bucket are workflow/task definitions.
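A minimal sketch of such a rule on GCS, assuming a hypothetical bucket name (`matchesPrefix` needs a reasonably recent gsutil/API version, and `metadata/{project}/{domain}` would need one prefix entry per project/domain you actually run):
```
# Delete execution inputs/outputs older than 30 days under the given prefixes.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {
        "age": 30,
        "matchesPrefix": ["data/", "metadata/propeller/"]
      }
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://my-flyte-bucket
```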
gentle-tomato-480
04/22/2024, 9:31 PM
I ran `gsutil du` (I'm on GCP). Of the 26.7 GB in my bucket, almost all of it (99.9%) is in `data`. So just setting a policy on that folder in the bucket should be enough to keep costs down (and I imagine it wouldn't break other things, unless you happen to run your workflows from 30+ day old cached results).
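(For reference, a per-prefix breakdown like that can be pulled with a one-liner; bucket name hypothetical:)
```
# Summarize usage per top-level prefix, with human-readable sizes.
gsutil du -sh gs://my-flyte-bucket/*
```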
helpful-church-28990
04/23/2024, 5:44 AM

high-park-82026
There are three kinds of objects in the bucket:
1. Fast-registered code (i.e. when you use `pyflyte run --remote`): these are tar files that contain your code as you iterate on it using that command. If you are always building new Docker images with your code changes, you won't see these tar files. --> /metadata
2. Intermediate inputs/outputs: primitive data produced by tasks/workflows, as well as Flyte Decks, errors, etc. --> /metadata
3. Raw data: offloaded data types (e.g. FlyteFile) --> /<two letters>/...
If you delete 2 and/or 3, old execution pages won't render properly (you may be OK with that) and cache lookups of old executions will result in dangling pointers (pointers that point nowhere). There is a setting on Propeller for MaxCacheAge (or something along those lines) to prevent such cases... you should generally set that to be half the lifecycle policy...
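(A hedged sketch of that knob: in recent FlytePropeller configs it shows up as `max-cache-age` under the `catalog` section, but verify the exact key name for your Flyte version before relying on it:)
```
catalog:
  # ...keep your existing type/endpoint settings as they are...
  max-cache-age: 360h  # ~15 days, half of a 30-day deletion policy, so cache
                       # entries expire before the artifacts they point to
```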
If you delete 1, you will fail to rerun old versions of the workflows that were fast-registered...
The problem is you can't easily distinguish between 1 and 2 (they both get written to /metadata)... but maybe you can set the policy by file extension? (See the sketch after this message.)
I think a combination of that + the MaxCacheAge should be sufficient to keep your data usage in check.
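A hedged sketch of that extension-based split on GCS: fast-registered code is packaged as .tar.gz archives, while the intermediate inputs/outputs are mostly protobuf files (inputs.pb, outputs.pb, error.pb, and so on); check the extensions actually present in your bucket before enabling this:
```
# Delete only protobuf metadata after 30 days; .tar.gz code archives survive.
cat > lifecycle-metadata.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {
        "age": 30,
        "matchesPrefix": ["metadata/"],
        "matchesSuffix": [".pb"]
      }
    }
  ]
}
EOF
gsutil lifecycle set lifecycle-metadata.json gs://my-flyte-bucket
```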
helpful-church-28990
04/25/2024, 2:15 PM