Hello, I'm at a point with Flyte now where I'm asking myself how bigger data (we're talking GBs) is passed efficiently between tasks.
I can't imagine that, if I have a setup on e.g. AWS, that larger output/input data is synced with S3 all the time. Or am I missing something?
What would be the way to exchange big data between tasks inside a workflow without uploading and downloading it again for every task?
I could imagine sharing it over a mounted volume from the k8s cluster, but that would probably interfere with the caching mechanism at some point, right?
03/02/2023, 3:05 PM
To be honest, it might look like premature optimization. How big are you talking?
By the way the entire data subsystem is getting a refresh.
Once this https://github.com/flyteorg/flytekit/pull/1512 lands, you will be able to stream data.
You should not need to download and upload things unless you transform them.
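(The PR above is about streaming support in flytekit; as a rough illustration of the underlying idea, here is a plain-Python sketch of chunked streaming, where a file is processed in fixed-size chunks instead of being materialized in memory. This is not the flytekit API itself, just the general pattern; the function name and chunk size are illustrative.)

```python
import hashlib
import io


def stream_checksum(fileobj, chunk_size=1024 * 1024):
    """Compute a SHA-256 over a file-like object in fixed-size chunks,
    so memory use stays bounded regardless of how large the file is."""
    digest = hashlib.sha256()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


# The same code works whether the file-like object wraps local disk,
# an S3 object (e.g. opened via fsspec), or any other backend.
data = io.BytesIO(b"x" * (3 * 1024 * 1024))  # stand-in for a multi-GB blob
print(stream_checksum(data))
```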
And finally, you can use EFS, FSx for Lustre, or shared volumes using pod templates.
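(As a hedged sketch of the shared-volume approach: a Kubernetes pod template that mounts a PersistentVolumeClaim, e.g. backed by EFS, into task pods. All names here, `flyte-template`, `shared-data-pvc`, `/mnt/shared`, and the placeholder image, are illustrative assumptions, not Flyte defaults; consult the Flyte pod template docs for the exact conventions.)

```yaml
# Hypothetical default pod template mounting a shared volume into task pods.
apiVersion: v1
kind: PodTemplate
metadata:
  name: flyte-template
  namespace: flyte
template:
  spec:
    containers:
      - name: default            # placeholder container; Flyte merges task settings in
        image: docker.io/library/busybox
        volumeMounts:
          - name: shared-data
            mountPath: /mnt/shared
    volumes:
      - name: shared-data
        persistentVolumeClaim:
          claimName: shared-data-pvc   # e.g. a PVC backed by EFS or FSx for Lustre
```

Tasks could then read and write under `/mnt/shared` directly, though (as noted above) data passed this way bypasses Flyte's blob-store-backed caching.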
But @Broder Peters, would you be open to a chat? We would love to understand what you are seeing and how we can make this even better.
We are working on cool things; we want to make it simple yet efficient and correct.
03/03/2023, 8:32 AM
Thanks for the feedback!
I figured that volume binding would be the way to go in my scenario.
About a more concrete scenario, I will come back to you once I'm deeper into the topic.
03/03/2023, 2:55 PM
Yes please, this will help.
Let's make it possible for Flyte users to keep things simple.