# flyte-support
f
Hi, what is the recommended way to share "somewhat large but not huge" chunks of data between tasks in a single workflow? In my case, I have a dict of about 4 MB from the extract() task of an ETL pipeline. When I try to pass it to the transform() task, I get
```
RuntimeExecutionError: max number of system retry attempts [11/10] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: output file @[s3://my-s3-bucket/metadata/propeller/etl-development-f4fdbe7cf84934ed8a0b/n0/data/0/outputs.pb] is too large [4890760] bytes, max allowed [2097152] bytes
```
I'm using the sandbox environment on my local machine right now.
f
You can use FlyteFile or the JSONL type
It will be offloaded automatically
We are working on making offloading fully automatic so you never have to think about it
👍 1
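A minimal sketch of that pattern with flytekit. The task names and payload here are illustrative stand-ins for the extract/transform tasks described above; the real extract step would produce the ~4 MB dict.

```python
import json
import os

import flytekit
from flytekit import task, workflow
from flytekit.types.file import FlyteFile


@task
def extract() -> FlyteFile:
    # Stand-in for the real extract step that produces the ~4 MB dict.
    data = {f"record_{i}": i for i in range(100_000)}

    # Write the dict to a local file inside the task's working directory.
    out_path = os.path.join(
        flytekit.current_context().working_directory, "extracted.json"
    )
    with open(out_path, "w") as fp:
        json.dump(data, fp)

    # Returning a FlyteFile offloads the file to blob storage; only a small
    # reference lands in outputs.pb, so the 2 MB metadata limit no longer applies.
    return FlyteFile(path=out_path)


@task
def transform(raw: FlyteFile) -> int:
    # Opening the FlyteFile downloads it to this task's local disk.
    with open(raw, "r") as fp:
        data = json.load(fp)
    return len(data)


@workflow
def etl() -> int:
    return transform(raw=extract())
```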
f
Cool, so that would essentially mean a write_file() after every task and a read_file() at the beginning of the next task, right?
f
No, just return a FlyteFile
And write a file, or use the JSONL type
👍 1
g
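For the JSONL route, a sketch under the assumption that your flytekit version ships the JSONLFile alias in flytekit.types.file (it is a FlyteFile annotated with the jsonl format; a plain FlyteFile works the same way if the alias is not available). Task names and records are illustrative.

```python
import json
import os

import flytekit
from flytekit import task
from flytekit.types.file import JSONLFile  # assumed alias over FlyteFile with a "jsonl" format


@task
def extract_jsonl() -> JSONLFile:
    # Illustrative records; one JSON object per line.
    records = [{"id": i, "value": i * i} for i in range(1000)]
    out_path = os.path.join(
        flytekit.current_context().working_directory, "extracted.jsonl"
    )
    with open(out_path, "w") as fp:
        for rec in records:
            fp.write(json.dumps(rec) + "\n")
    return JSONLFile(path=out_path)


@task
def transform_jsonl(raw: JSONLFile) -> int:
    # Like FlyteFile, the file is downloaded lazily when opened.
    with open(raw, "r") as fp:
        return sum(1 for _ in fp)
```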
f
Thanks. I was just wondering if there's an easier way to avoid repeatedly reading from and writing to blob storage. We're running Flyte on a local cluster with somewhat constrained compute. I'm dealing with ETL on large JSON data, so right now I'm just returning a JSONLFile after every task and reading it in the subsequent task(s).
f
Is task A creating the file and task B consuming the file?
If so, to make it reproducible, we have to record the data somewhere.
If this is all on one node, then you can mount the disk into every task and simply pass that as the raw output path. You won't have to read from / write to blob storage.
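A hedged sketch of that single-node setup: mount the same host path into every task with a PodTemplate, then point the execution's raw output data prefix at the mount so offloaded files land on the local disk instead of S3. The volume name, host path, and task shown here are assumptions to adapt to your cluster.

```python
from flytekit import PodTemplate, task
from kubernetes.client import (
    V1Container,
    V1HostPathVolumeSource,
    V1PodSpec,
    V1Volume,
    V1VolumeMount,
)

# Hypothetical shared disk on the single node; adjust the path to your setup.
SHARED_PATH = "/data/flyte-shared"

shared_disk = PodTemplate(
    primary_container_name="primary",
    pod_spec=V1PodSpec(
        containers=[
            V1Container(
                name="primary",
                volume_mounts=[V1VolumeMount(name="shared", mount_path=SHARED_PATH)],
            )
        ],
        volumes=[
            V1Volume(name="shared", host_path=V1HostPathVolumeSource(path=SHARED_PATH))
        ],
    ),
)


@task(pod_template=shared_disk)
def extract() -> str:
    # Every task using this pod template sees the same disk at SHARED_PATH,
    # so outputs written under it never leave the node.
    return "ok"
```

When launching, the raw output location can then be overridden to point at the mount, e.g. `pyflyte run --remote --raw-output-data-prefix file:///data/flyte-shared/raw workflows.py etl` (flag availability and the file:// scheme depend on your flytekit and backend versions).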