acoustic-carpenter-78188
02/10/2023, 4:55 PM
partition_columns to StructuredDatasetType
Partially addresses flyteorg/flyte#3219
TL;DR
This PR adds a property to the StructuredDatasetType protobuf definition so that it can store metadata about which columns of the dataset (some kind of DataFrame object) are used to partition the dataset into chunks, for example when a pandas.DataFrame is serialized as a parquet file.
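As a rough illustration of what the new metadata carries, here is a minimal Python sketch that mirrors the idea of the message with plain dataclasses. `StructuredDatasetTypeSketch`, `DatasetColumnSketch`, their fields, and the `validate_partition_columns` helper are all hypothetical. They are not the generated flyteidl bindings. The actual field definition is whatever this PR adds to the `.proto` file.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetColumnSketch:
    # Hypothetical stand-in for a column entry: a name plus a type label.
    name: str
    literal_type: str


@dataclass
class StructuredDatasetTypeSketch:
    # Hypothetical mirror of the protobuf message, not the generated class.
    columns: List[DatasetColumnSketch] = field(default_factory=list)
    format: str = ""
    # New metadata: names of the columns used to partition the dataset.
    partition_columns: List[str] = field(default_factory=list)

    def validate_partition_columns(self) -> None:
        """Raise ValueError if a partition column is not a declared column."""
        # Skip the check when no column schema is declared at all.
        if not self.columns:
            return
        declared = {c.name for c in self.columns}
        unknown = [p for p in self.partition_columns if p not in declared]
        if unknown:
            raise ValueError(f"unknown partition columns: {unknown}")


dataset_type = StructuredDatasetTypeSketch(
    columns=[
        DatasetColumnSketch("country", "string"),
        DatasetColumnSketch("year", "integer"),
        DatasetColumnSketch("sales", "float"),
    ],
    format="parquet",
    partition_columns=["country", "year"],
)
dataset_type.validate_partition_columns()  # passes: both columns are declared
```

Keeping the partition columns as a list of names (rather than duplicating full column definitions) avoids redundancy with the existing column schema. The validation helper is only one possible consistency check a consumer might choose to apply.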
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☐ Smoke tested
☐ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
This change is required to store metadata about which columns are used for partitioning. Currently, this only meaningfully affects the serialization and deserialization of parquet files, but in the future partitioning could be supported for other serialization formats as well.
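For parquet, partitioning usually means a directory-per-value layout. For example, pandas' `DataFrame.to_parquet(path, partition_cols=[...])` with the pyarrow engine writes hive-style directories such as `country=US/year=2023/`. The sketch below reproduces only that path layout using the standard library, and writes no files. It shows that the partition column values end up in directory names rather than in the row data. The helper names `partition_rows` and `parse_partition_path` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def partition_rows(rows: List[Row], partition_cols: List[str]) -> Dict[str, List[Row]]:
    """Group rows under hive-style relative paths like 'country=US/year=2023'.

    Partition columns are dropped from each row because their values are
    encoded in the path instead, as in hive-style parquet layouts.
    """
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        path = "/".join(f"{col}={row[col]}" for col in partition_cols)
        remaining = {k: v for k, v in row.items() if k not in partition_cols}
        groups[path].append(remaining)
    return dict(groups)


def parse_partition_path(path: str) -> Dict[str, str]:
    """Recover partition column values (as strings) from a hive-style path."""
    return dict(segment.split("=", 1) for segment in path.split("/"))


rows = [
    {"country": "US", "year": 2023, "sales": 10.0},
    {"country": "US", "year": 2023, "sales": 5.5},
    {"country": "DE", "year": 2022, "sales": 7.25},
]
chunks = partition_rows(rows, ["country", "year"])
for path, chunk in sorted(chunks.items()):
    print(path, chunk)
# country=DE/year=2022 [{'sales': 7.25}]
# country=US/year=2023 [{'sales': 10.0}, {'sales': 5.5}]
```

Note that `parse_partition_path` returns every value as a string (`"2022"`, not `2022`), so the original column types have to come from elsewhere, such as a declared column schema.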
Tracking Issue
Partly addresses flyteorg/flyte#3219
Follow-up issue
NA
flyteorg/flyteidl