# feature-discussions
t
Hello, curious if anyone else has had a need for a FlyteFile that has an associated lifetime - whereby it would be automatically cleaned up after a user-specified timeframe. This could help with things like compliance with data retention policies.
f
@thankful-dress-89577 can you not do this directly on the target S3 bucket?
t
Good point Ketan, you can definitely use S3 bucket lifecycle policies, but it is fairly coarse. Tag filtering could be used to be a bit more granular, up to a point (there is a limit on the number of lifecycle policies a bucket can have), so I think that would cover many cases, yes. Still, files managed by Flyte (i.e. FlyteFile) would need to be created with specific tags that match those policies - not sure if that is controllable and to what degree. Maybe I want some files to expire (retention policy) and others I don't mind keeping around longer. So I was imagining more fine-grained control where per-file expiry timestamps could be specified, but I think you could manage with just a tagging convention paired with the bucket lifecycle policy.
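For concreteness, a tag-filtered lifecycle rule could look roughly like this (boto3 sketch; the bucket name, tag key/value, and retention period are just placeholders):

```python
# Sketch: a tag-filtered S3 lifecycle rule, applied with boto3.
# The bucket name, tag, and retention period below are made-up examples.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-flyte-data-bucket",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-short-retention-objects",
                "Status": "Enabled",
                # Only objects carrying this tag get expired.
                "Filter": {"Tag": {"Key": "retention", "Value": "30d"}},
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```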
f
would love to hear a proposal
feel free to do it and we can share it in one of the community meetings
t
Sure, for a proposal do you mean a document, a GitHub issue, or a sketch of the programmatic API? Also, it occurred to me that even if we can solve the S3 case via bucket policies, supporting other cloud vendors makes it more complicated if that is the implementation. It feels like it might be a better fit for something Flyte keeps track of more directly.
f
hmm
ya an issue or there is an RFC process
check the flyte repo
t
I’ve put together a description as an issue here: https://github.com/flyteorg/flyte/issues/2832
I wonder if one way to represent this might be an Expireable[T] annotation. So it could be Expireable[pd.DataFrame] or Expireable[FlyteFile], and tasks could return their object wrapped in this annotation.
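Roughly the API shape I have in mind (purely hypothetical - none of this exists in flytekit today, and the Expire marker is made up):

```python
# Hypothetical sketch of the Expireable idea -- nothing here is a real flytekit API.
from dataclasses import dataclass
from typing import Annotated


@dataclass
class Expire:
    """Imaginary per-output retention metadata (delete the backing object after N days)."""
    days: int


# Expireable[pd.DataFrame] could simply be sugar for Annotated[pd.DataFrame, Expire(...)],
# so a task signature might read:
#
#   @task
#   def make_intermediate() -> Annotated[pd.DataFrame, Expire(days=30)]:
#       ...
#
# and Flyte would record the expiry alongside the offloaded literal, then clean up
# the object in blob storage once the period has elapsed.
```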
As a workaround, we noticed we could probably use StructuredDataset's uri field to control more explicitly where files end up in S3, and hence apply bucket lifecycle policies to those objects specifically (as a group, rather than the per-file granularity I was suggesting would be ideal). In case someone else has a similar need, that seems to be a viable approach if the data can be made into a StructuredDataset.
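i.e. something along these lines (sketch; the bucket and the "expiring/" prefix are placeholders, paired with a prefix-scoped lifecycle rule on the bucket):

```python
# Sketch of the workaround: pin the StructuredDataset to a known prefix so a
# lifecycle rule scoped to that prefix expires these objects as a group.
import pandas as pd
from flytekit import task
from flytekit.types.structured import StructuredDataset


@task
def make_intermediate() -> StructuredDataset:
    df = pd.DataFrame({"x": [1, 2, 3]})
    # Written under a dedicated "expiring/" prefix (placeholder) that the
    # bucket lifecycle rule targets.
    return StructuredDataset(dataframe=df, uri="s3://my-flyte-data-bucket/expiring/intermediate")
```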
f
You can control this in FlyteFile too
Also you can set the raw output prefix per execution and all raw data will be put in that prefix only
t
Oh? How do I set that in FlyteFile? I saw that you could set the prefix per execution, which can help for some scenarios too, but I am picturing, say, an intermediate dataset needing to be deleted but an output model artifact needing to be retained, as part of the same workflow / execution.
f
I would still then just copy the model to a different location as a step
t
Sure, that is always an option - or conversely, use manual file management for files that need to obey retention rules. I was just looking for an elegant way to solve it more generally 🙂
f
ya, i think instead of manual file management, just copying the last bit over is nicer - it scales better. basically the end prefix can be learnt
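Something like this (sketch; the prefixes and file names are placeholders, and a prefix-scoped lifecycle rule would cover everything outside "retained/"):

```python
# Sketch of the "copy the last bit over" pattern: a final task re-uploads the
# artifact it wants to keep to a retained prefix, so everything else can expire.
import shutil

from flytekit import task
from flytekit.types.file import FlyteFile


@task
def retain_model(model: FlyteFile) -> FlyteFile:
    local = model.download()  # pull the intermediate copy locally
    kept = "/tmp/model-retained.bin"
    shutil.copyfile(local, kept)
    # Re-upload to a prefix that the expiry lifecycle rule does not cover.
    return FlyteFile(path=kept, remote_path="s3://my-flyte-data-bucket/retained/model.bin")
```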
cc @thankful-minister-83577 - FlyteFile custom path?