Maybe I think inside the wrong box. Would the "fly...
# ask-the-community
e
Maybe I think inside the wrong box. Would the "flyte-way" be this:
• new config -> new "experiment"
• keep previous experiments
• ability to delete any experiment (ability to roll back in time)?
s
You cannot delete an execution / experiment (if experiment ~ execution). You can only archive it.
e
oh, really - like there is no way? wouldn't storage grow at a really fast pace then? Given that I'm correct about these assumptions about what is an experiment:
• dataset (e.g. images, annotations, ...)
  ◦ included here because the dataset might change between experiments
  ◦ or is it possible to store/cache dataset diffs in some way? (i.e. only the changes)
• network code
• snapshots
s
Yeah, it might. In that case, you'd have to delete the data manually. @Dan Rammer (hamersaw), is there a better way to handle storage in case executions and their related data need to be deleted at some point in time?
> or is it possible to store/cache dataset diffs in some way? (i.e. only the changes)
You can cache task outputs by setting cache to True in @task. Is your dataset a data structure of some kind? Or is it a URL?
d
@ewam so Flyte is very opinionated about reproducibility and data lineage, which is why deleting executions / workflows / projects / etc. is not a part of the API. And storage is relatively cheap, so it is typically not an issue. However, users with large datasets can combat this by combining caching with passing data as file references rather than full-resolution data (e.g. dataframes). Then Flyte just stores a link to the data. I know there are a number of users that have external services which periodically delete "old" data (e.g. S3 files). In this scenario, Flyte still maintains all of the metadata around those executions (e.g. data references, node / task durations), but there will be failures when attempting to retrieve specific task input / output values, since those have been deleted.
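To make the "external service which periodically deletes old data" idea concrete, here is a rough sketch using only the Python standard library. It works on a local directory as a stand-in for a bucket; the function name delete_old_files and its parameters are hypothetical, and a real setup against S3 would more likely rely on the storage provider's own lifecycle or expiry rules.

```python
import time
from pathlib import Path
from typing import List, Optional


def delete_old_files(
    root: Path, max_age_days: float, now: Optional[float] = None
) -> List[Path]:
    """Delete files under root last modified more than max_age_days ago.

    Returns the list of deleted paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    # sorted() materializes the listing before anything is deleted.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

As noted above, Flyte would keep the execution metadata after such a cleanup, but fetching the deleted inputs / outputs for those executions would fail.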