# flyte-support
g
Hi! Is there any way to avoid the extra read and write when passing a StructuredDataset to a task if the data already exists on S3? I.e., can I create the StructuredDataset without a task that effectively does a copy?
f
Yes - take a StructuredDataset as input and simply return it as output without calling open()
This should just pass the pointers through, so nothing gets copied
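Roughly like this - a minimal sketch assuming flytekit's StructuredDataset API; the task name is just illustrative:
```python
from flytekit import task
from flytekit.types.structured import StructuredDataset


@task
def passthrough(sd: StructuredDataset) -> StructuredDataset:
    # No .open() call here: Flyte forwards the existing reference,
    # so the underlying data is never read or rewritten.
    return sd
```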
g
Ok nice, I’ll give that a try!
b
I got this to work for Parquet datasets written by Spark, but not for JSON and CSV datasets from Spark, using:
```python
from flytekit import task
from flytekit.types.structured import StructuredDataset
from pyspark.sql import DataFrame


@task
def sd_creator() -> StructuredDataset:
    return StructuredDataset(uri="s3a://my_bucket/path/to/json/")


@task
def sd_worker(sd_json: StructuredDataset):
    sd_json.open(DataFrame).all()
    # ...
```
as it still attempts to read the data as Parquet (even when specifying file_format="json" to StructuredDataset). Do you have a working example for this? I did get it to work with an explicit read in Spark, returning a StructuredDataset without a uri, but that adds overhead.
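For reference, a rough sketch of that explicit-read workaround, assuming flytekitplugins-spark is installed; the task name, spark_conf, and path are placeholders:
```python
from flytekit import current_context, task
from flytekit.types.structured import StructuredDataset
from flytekitplugins.spark import Spark


@task(task_config=Spark(spark_conf={}))
def sd_creator_explicit() -> StructuredDataset:
    # Read the JSON into a Spark DataFrame, then hand it back to Flyte,
    # which serializes it out again - this is the extra read/write overhead.
    spark = current_context().spark_session
    df = spark.read.json("s3a://my_bucket/path/to/json/")
    return StructuredDataset(dataframe=df)
```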