# flytekit
#3037 [Core feature] [Flytekit] Add support for HDF5 and Arrow in flyteplugins-vaex

Issue created by ryankarlos

Motivation: Why do you think this is important?

Currently `flyteplugins-vaex` supports automatic serialization and deserialization of vaex dataframes between consecutive tasks using Parquet (flyteorg/flytekit#1230). It would be good to extend this to HDF5 and Arrow, for performance and interoperability when datasets are too large to fit into memory: https://vaex.readthedocs.io/en/latest/faq.html#What-is-the-optimal-file-format-to-use-with-vaex

Goal: What should the final outcome look like, ideally?

Register extra handlers, `VaexDataFrameToHDF5EncodingHandler` and `VaexDataFrameToArrowEncodingHandler`, so users can use `Annotated` to override the default format (`HDF5` and `Arrow` are the proposed format identifiers):
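```python
from typing import Annotated
import vaex
from flytekit import task, StructuredDataset

@task
def t1(f: vaex.dataframe.DataFrameLocal) -> Annotated[StructuredDataset, HDF5]:
    return StructuredDataset(dataframe=f)

@task
def t2(f: vaex.dataframe.DataFrameLocal) -> Annotated[StructuredDataset, Arrow]:
    return StructuredDataset(dataframe=f)
```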
Describe alternatives you've considered

N/A

Propose: Link/Inline OR Additional context

See the discussion thread here: flyteorg/flytekit#1230 (comment)

Are you sure this issue hasn't been raised already? ☑︎ Yes

Have you read the Code of Conduct? ☑︎ Yes

flyteorg/flyte