# flyte-github
## #3189 [BUG] Structured Dataset compatibility between plugins

Issue created by esadler-hbo

### Describe the bug

I was running some tasks in a notebook where I passed the result of a Spark task as a `StructuredDataset` and then tried to load it into a polars DataFrame and a Hugging Face dataset. Both plugins failed with the following error:

    No such file or directory: /var/folders/wq/3hjh3ms916b6dj56zx0f_x000000gq/T/flyte-69d2tww2/sandbox/local_flytekit/95bac8efeb64a8d10d34c73b66df7051/00000

However, it did work for pandas. It seems that the polars and Hugging Face transformers append `00000` to the path, while the Spark transformer does not.

- polars: https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-polars/flytekitplugins/polars/sd_transformers.py#L43
- spark: https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-spark/flytekitplugins/spark/sd_transformers.py#L29

### Expected behavior

I would expect to be able to use a `StructuredDataset` produced by Spark with the dataframe libraries from all plugins.

### Additional context to reproduce

    import datasets
    import flytekit  # needed for flytekit.current_context() below
    import pandas as pd
    import polars as pl
    from flytekit import task, StructuredDataset
    from flytekitplugins.spark.task import Spark


    @task(task_config=Spark())
    def spark_task(path: str) -> StructuredDataset:
        sess = flytekit.current_context().spark_session
        df = sess.read.parquet(path)
        return StructuredDataset(dataframe=df)


    df = spark_task(path="./ratings_100k.parquet")

    # polars: fails with the "No such file or directory" error
    try:
        df.open(pl.DataFrame).all().head()
    except Exception as e:
        print(e)

    # Hugging Face datasets: fails with the same error
    try:
        df.open(datasets.Dataset).all().head()
    except Exception as e:
        print(e)

    # pandas: works
    df.open(pd.DataFrame).all().head()

### Screenshots

[Screenshot: Screen Shot 2022-12-24 at 10 54 40 AM]

### Are you sure this issue hasn't been raised already?

- [x] Yes

### Have you read the Code of Conduct?

- [x] Yes

flyteorg/flyte
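
The path mismatch described in this report can be illustrated without Flyte, Spark, or polars. The sketch below is only an assumption-laden model of the symptom, not the plugins' actual code: `write_spark_style`, `write_single_file_style`, `read_fixed_name`, and `read_directory` are hypothetical helpers, and the `part-0000N.parquet` file names are simplified placeholders. It shows why a reader that expects `<dir>/00000` fails on a directory of part files, while a reader that accepts the whole directory succeeds.

```python
import tempfile
from pathlib import Path
from typing import List


def write_spark_style(root: Path) -> None:
    """Hypothetical writer: emits several part files into a directory."""
    root.mkdir(parents=True, exist_ok=True)
    for i in range(2):
        # Simplified placeholder names; real part-file names differ.
        (root / f"part-{i:05d}.parquet").write_bytes(b"placeholder")


def write_single_file_style(root: Path) -> None:
    """Hypothetical writer: emits exactly one file named 00000."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "00000").write_bytes(b"placeholder")


def read_fixed_name(root: Path) -> bytes:
    """Hypothetical reader that assumes the data lives at <root>/00000."""
    return (root / "00000").read_bytes()


def read_directory(root: Path) -> List[str]:
    """Hypothetical reader that accepts whatever files the directory holds."""
    return sorted(p.name for p in root.iterdir() if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    spark_dir = Path(tmp) / "spark_output"
    single_dir = Path(tmp) / "single_output"
    write_spark_style(spark_dir)
    write_single_file_style(single_dir)

    # The fixed-name reader cannot find <spark_dir>/00000.
    try:
        read_fixed_name(spark_dir)
        fixed_name_error = None
    except FileNotFoundError as exc:
        fixed_name_error = exc

    # The directory reader sees the part files.
    directory_listing = read_directory(spark_dir)

    # The fixed-name reader works when the writer used the same convention.
    single_ok = read_fixed_name(single_dir)

print(type(fixed_name_error).__name__)
print(directory_listing)
```

If this model matches what the linked transformers do, a fix would need the writer and the readers to agree on one layout: either every reader accepts a directory, or every writer produces the file name the readers expect.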