# flyte-support
f
Hi, what is the recommended way to share "somewhat large but not huge" chunks of data between tasks in a single workflow? In my case, I have a dict of about 4 MB from the extract() task of an ETL pipeline. When I try to pass it to the transform() task, I get
```
RuntimeExecutionError: max number of system retry attempts [11/10] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: output file @[s3://my-s3-bucket/metadata/propeller/etl-development-f4fdbe7cf84934ed8a0b/n0/data/0/outputs.pb] is too large [4890760] bytes, max allowed [2097152] bytes
```
I'm using the sandbox environment on my local machine right now.
f
You can use FlyteFile or the JSONL type
It will be offloaded automatically
We are working on making offloading fully automatic so you never have to think about it
👍 1
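A minimal sketch of that pattern with flytekit. The task names and payload here are illustrative stand-ins for the extract/transform tasks described above; the real extract step would produce the ~4 MB dict.

```python
import json
import os

import flytekit
from flytekit import task, workflow
from flytekit.types.file import FlyteFile


@task
def extract() -> FlyteFile:
    # Stand-in for the real extract step that produces the ~4 MB dict.
    data = {f"record_{i}": i for i in range(100_000)}

    # Write the dict to a local file inside the task's working directory.
    out_path = os.path.join(
        flytekit.current_context().working_directory, "extracted.json"
    )
    with open(out_path, "w") as fp:
        json.dump(data, fp)

    # Returning a FlyteFile offloads the file to blob storage; only a small
    # reference lands in outputs.pb, so the 2 MB metadata limit no longer applies.
    return FlyteFile(path=out_path)


@task
def transform(raw: FlyteFile) -> int:
    # Opening the FlyteFile downloads it to this task's local disk.
    with open(raw, "r") as fp:
        data = json.load(fp)
    return len(data)


@workflow
def etl() -> int:
    return transform(raw=extract())
```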
f
Cool, so that would essentially mean a write_file() after every task and a read_file() at the beginning of the next task, right?
f
No, just return a FlyteFile
And write a file, or use the JSONL type
👍 1
g
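For the JSONL route, a sketch under the assumption that your flytekit version ships the JSONLFile alias in flytekit.types.file (it is a FlyteFile annotated with the jsonl format; a plain FlyteFile works the same way if the alias is not available). Task names and records are illustrative.

```python
import json
import os

import flytekit
from flytekit import task
from flytekit.types.file import JSONLFile  # assumed alias over FlyteFile with a "jsonl" format


@task
def extract_jsonl() -> JSONLFile:
    # Illustrative records; one JSON object per line.
    records = [{"id": i, "value": i * i} for i in range(1000)]
    out_path = os.path.join(
        flytekit.current_context().working_directory, "extracted.jsonl"
    )
    with open(out_path, "w") as fp:
        for rec in records:
            fp.write(json.dumps(rec) + "\n")
    return JSONLFile(path=out_path)


@task
def transform_jsonl(raw: JSONLFile) -> int:
    # Like FlyteFile, the file is downloaded lazily when opened.
    with open(raw, "r") as fp:
        return sum(1 for _ in fp)
```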
f
Thanks. I was just wondering if there's an easier way to avoid repeatedly reading from and writing to blob storage. We're running Flyte on a local cluster with somewhat constrained compute. I'm dealing with ETL on large JSON data, so right now I'm just returning a JSONLFile after every task and reading it in the subsequent task(s).
f
Is task A creating the file and task B consuming the file?
If so, to make it reproducible, we have to record the data somewhere.
If this is all on one node, then you can mount the disk into every task and simply pass that as the raw output path. You won't have to read from / write to blob storage.
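A hedged sketch of that single-node setup: mount the same host path into every task with a PodTemplate, then point the execution's raw output data prefix at the mount so offloaded files land on the local disk instead of S3. The volume name, host path, and task shown here are assumptions to adapt to your cluster.

```python
from flytekit import PodTemplate, task
from kubernetes.client import (
    V1Container,
    V1HostPathVolumeSource,
    V1PodSpec,
    V1Volume,
    V1VolumeMount,
)

# Hypothetical shared disk on the single node; adjust the path to your setup.
SHARED_PATH = "/data/flyte-shared"

shared_disk = PodTemplate(
    primary_container_name="primary",
    pod_spec=V1PodSpec(
        containers=[
            V1Container(
                name="primary",
                volume_mounts=[V1VolumeMount(name="shared", mount_path=SHARED_PATH)],
            )
        ],
        volumes=[
            V1Volume(name="shared", host_path=V1HostPathVolumeSource(path=SHARED_PATH))
        ],
    ),
)


@task(pod_template=shared_disk)
def extract() -> str:
    # Every task using this pod template sees the same disk at SHARED_PATH,
    # so outputs written under it never leave the node.
    return "ok"
```

When launching, the raw output location can then be overridden to point at the mount, e.g. `pyflyte run --remote --raw-output-data-prefix file:///data/flyte-shared/raw workflows.py etl` (flag availability and the file:// scheme depend on your flytekit and backend versions).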