Flyte enables production-grade orchestration for machine learning workflows and data processing created to accelerate local workflows to production.

Flyte

<https://github.com/flyteorg/flyte/issues/3219|#3219 [Core feature] Default parquet-to-pandas encoder/decoder should support iterable read.>
Issue created by <https://github.com/cosmicBboy|cosmicBboy>
*Motivation: Why do you think this is important?*

The purpose of this issue is to support the use case where I can load `StructuredDataset`s iteratively as in:

```
structured_dataset.open(pd.Dataframe).iter()
```

*Goal: What should the final outcome look like, ideally?*

The end user should be able to specify the partition column when they output a structured dataset:

```
@task
def make_df() -&gt; StructuredDataset:
    df = pd.DataFrame.from_records([
        {
            "id": i,
            "partition": (i % 10) + 1,
            "name": "".join(
                random.choices(string.ascii_uppercase + string.digits, k=10)
            )
        }
        for i in range(1000)
    ])
    return StructuredDataset(dataframe=df, partition_cols=["partition"])  # or ["partition1", "partition2"]
```

And then consume it like so:

```
@task
def use_df(dataset: StructuredDataset) -&gt; pd.DataFrame:
    output = []
    for dd in dataset.open(pd.DataFrame).iter():
        print(f"This is a partial dataframe")
        print(dd.head(3))
        output.append(dd)
    return pd.concat(output)
```

*Describe alternatives you've considered*

The user needs to implement their own encoder/decoder for this use case.

*Propose: Link/Inline OR Additional context*

There is a working implementation of this here: <https://github.com/flyteorg/flyte-demos/blob/main/flyte_demo/workflows/data_iter.py#L101|https://github.com/flyteorg/flyte-demos/blob/main/flyte_demo/workflows/data_iter.py#L101>

*Are you sure this issue hasn't been raised already?*

☑︎ Yes

*Have you read the Code of Conduct?*

☑︎ Yes
<https://github.com/flyteorg/flyte|flyteorg/flyte>