Flyte enables production-grade orchestration of machine learning workflows and data processing pipelines, and is designed to help take local workflows to production.

Flyte

<https://github.com/flyteorg/flyte/issues/3189|#3189 [BUG] Structured Dataset compatibility between plugins >
Issue created by <https://github.com/esadler-hbo|esadler-hbo>
*Describe the bug*

I was running some tasks in a notebook, passing the output of a Spark task as a `StructuredDataset` and then trying to load it into a polars DataFrame and a Hugging Face dataset.

Both plugins failed with the following error: `No such file or directory: /var/folders/wq/3hjh3ms916b6dj56zx0f_x000000gq/T/flyte-69d2tww2/sandbox/local_flytekit/95bac8efeb64a8d10d34c73b66df7051/00000`. Loading the same data into pandas, however, worked.

It seems that the polars and Hugging Face transformers append `00000` to the path when reading, while the Spark transformer does not write a file with that name.
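The path mismatch can be reproduced without Spark or Flyte. The following is a stdlib-only simulation, assuming Spark's usual Parquet output layout (a `part-00000-<uuid>-c000.snappy.parquet` data file plus a `_SUCCESS` marker). `make_spark_style_output`, `read_hardcoded_00000` and `read_whole_directory` are hypothetical stand-ins for the Spark encoder, the polars/Hugging Face decoders and a directory-aware reader; they are not flytekit code. That pandas works because its decoder reads the whole directory is likewise an assumption.

```python
import os
import tempfile
import uuid


def make_spark_style_output(root: str) -> str:
    """Create a directory laid out the way Spark typically writes Parquet.

    Assumption: Spark names data files part-<n>-<uuid>-c000.snappy.parquet and
    writes a _SUCCESS marker. The files here are empty placeholders.
    """
    out_dir = os.path.join(root, "spark_output")
    os.makedirs(out_dir)
    job_id = uuid.uuid4()
    for name in (f"part-00000-{job_id}-c000.snappy.parquet", "_SUCCESS"):
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(b"")
    return out_dir


def read_hardcoded_00000(local_dir: str) -> bytes:
    # Hypothetical decoder that assumes the encoder wrote one file named "00000".
    with open(os.path.join(local_dir, "00000"), "rb") as f:
        return f.read()


def read_whole_directory(local_dir: str) -> list[str]:
    # Hypothetical directory-aware decoder: picks up every data file and skips
    # marker/checksum files whose names start with "_" or ".".
    return sorted(n for n in os.listdir(local_dir) if not n.startswith(("_", ".")))


with tempfile.TemporaryDirectory() as tmp:
    local_dir = make_spark_style_output(tmp)
    try:
        read_hardcoded_00000(local_dir)
    except FileNotFoundError as e:
        # Same failure mode as in the report: the "00000" file does not exist.
        print("hard-coded path failed:", e.strerror)
    print("directory reader found:", read_whole_directory(local_dir))
```

The hard-coded reader fails with `No such file or directory`, while the directory reader finds the single Spark part file.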

• polars: <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-polars/flytekitplugins/polars/sd_transformers.py#L43|https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-polars/flytekitplugins/polars/sd_transformers.py#L43>
• spark: <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-spark/flytekitplugins/spark/sd_transformers.py#L29|https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-spark/flytekitplugins/spark/sd_transformers.py#L29>

*Expected behavior*

I would expect a `StructuredDataset` produced by a Spark task to be readable with the dataframe libraries of all plugins.

*Additional context to reproduce*

```
import flytekit
from flytekit import task, StructuredDataset
from flytekitplugins.spark.task import Spark
import datasets
import polars as pl
import pandas as pd


@task(task_config=Spark())
def spark_task(path: str) -> StructuredDataset:
    # Read the Parquet file with the Spark session provided by the Spark plugin
    sess = flytekit.current_context().spark_session
    df = sess.read.parquet(path)
    return StructuredDataset(dataframe=df)


df = spark_task(path="./ratings_100k.parquet")

# Fails with "No such file or directory: .../00000"
try:
    df.open(pl.DataFrame).all().head()
except Exception as e:
    print(e)

# Fails with the same error
try:
    df.open(datasets.Dataset).all().head()
except Exception as e:
    print(e)

# Works
df.open(pd.DataFrame).all().head()
```

*Screenshots*

<https://user-images.githubusercontent.com/97543480/209443256-59556baf-1e41-46e3-a557-26d120a2033b.png|Screen Shot 2022-12-24 at 10 54 40 AM>

*Are you sure this issue hasn't been raised already?*

☑︎ Yes

*Have you read the Code of Conduct?*

☑︎ Yes
<https://github.com/flyteorg/flyte|flyteorg/flyte>