# flyte-github
#364 add partition_columns to StructuredDatasetType

Pull request opened by cosmicBboy

Signed-off-by: Niels Bantilan niels.bantilan@gmail.com

Add `partition_columns` to `StructuredDatasetType`
Partially addresses flyteorg/flyte#3219

## TL;DR

This PR adds a property to the `StructuredDatasetType` protobuf definition that records which columns of the dataset (some kind of DataFrame object) are used to partition it into chunks, for example when a `pandas.DataFrame` is serialized as a parquet file.

## Type

- ☐ Bug Fix
- ☑︎ Feature
- ☐ Plugin

## Are all requirements met?

- ☑︎ Code completed
- ☐ Smoke tested
- ☐ Unit tests added
- ☐ Code documentation added
- ☐ Any pending items have an associated Issue

## Complete description

This change is required to store additional metadata about which columns are used for partitioning. Currently this only meaningfully affects the serialization and deserialization of parquet files, but in the future we could support partitioning for other serialization formats.

## Tracking Issue

Partly addresses flyteorg/flyte#3219

## Follow-up issue

NA

flyteorg/flyteidl

All checks have passed: 13/13 successful checks
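
To make the idea of partition columns concrete, here is a minimal, stdlib-only sketch of how a list of partition column names could split a dataset into chunks. It is not the flyteidl definition or the flytekit implementation: `DatasetTypeSketch`, `partition_rows`, and the shape of the `partition_columns` field (an ordered list of column names) are hypothetical names and assumptions made for illustration. The `column=value` path layout mirrors the hive-style directory convention commonly used for partitioned parquet datasets.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetTypeSketch:
    """Hypothetical stand-in for StructuredDatasetType (not the real protobuf message)."""
    columns: List[str]
    format: str = "parquet"
    # Assumption: partition metadata is an ordered list of column names.
    partition_columns: List[str] = field(default_factory=list)


def partition_rows(rows: List[dict], dataset_type: DatasetTypeSketch) -> Dict[str, List[dict]]:
    """Group rows into chunks keyed by a hive-style 'col=value/...' path."""
    chunks = defaultdict(list)
    for row in rows:
        key = "/".join(f"{col}={row[col]}" for col in dataset_type.partition_columns)
        # The partition values are encoded in the path, so drop them from the stored row.
        rest = {k: v for k, v in row.items() if k not in dataset_type.partition_columns}
        chunks[key].append(rest)
    return dict(chunks)


rows = [
    {"region": "eu", "year": 2022, "sales": 10},
    {"region": "eu", "year": 2023, "sales": 12},
    {"region": "us", "year": 2022, "sales": 7},
    {"region": "eu", "year": 2022, "sales": 3},
]
dataset_type = DatasetTypeSketch(
    columns=["region", "year", "sales"],
    partition_columns=["region", "year"],
)
chunks = partition_rows(rows, dataset_type)
for path, chunk in sorted(chunks.items()):
    print(path, chunk)
```

With pandas, a comparable on-disk layout can be requested through the `partition_cols` argument of `DataFrame.to_parquet`; the metadata proposed in this PR would let the type itself carry that column list rather than leaving it implicit in the write call.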