# flyte-github
#3246 [Core feature] DuckDB Integration
Issue created by samhita-alla

Motivation: Why do you think this is important?
DuckDB integration can be helpful for running queries on files, dataframes, and pyarrow tables in an efficient manner. A native integration with Flyte opens up a new set of possibilities to use DuckDB from within an orchestration platform, which is the first of its kind.

Goal: What should the final outcome look like, ideally?
A task plugin that enables users to run queries seamlessly, along the lines of the following prototype:
duckdb_task = DuckDBQuery(name="duckdb_task", query="SELECT SUM(a) FROM mydf", inputs=kwtypes(mydf=pd.DataFrame))
Describe alternatives you've considered
An alternative is to handle DuckDB code from within a Flyte task: https://gist.github.com/samhita-alla/003c3f409e8caa88470f6f7206b54ae3.

Propose: Link/Inline OR Additional context
A task plugin that accepts a query, a dataframe/pyarrow table/parquet file/csv file, and parameters.

Are you sure this issue hasn't been raised already?
☑︎ Yes

Have you read the Code of Conduct?
☑︎ Yes

flyteorg/flyte