brief-family-9488
08/03/2025, 6:20 AM

jolly-florist-66728
08/03/2025, 6:20 AM
You can write the DataFrame to a local file inside the task and return it as a FlyteFile; flytekit then uploads it to blob storage automatically:

import pandas as pd
from flytekit import task
from flytekit.types.file import FlyteFile

@task
def write_df_to_file() -> FlyteFile:
    # Write the DataFrame to a local CSV, then hand the path to FlyteFile.
    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    file_path = "output.csv"
    df.to_csv(file_path, index=False)
    return FlyteFile(path=file_path)
This approach is documented in the FlyteFile user guide and Flytesnacks examples. Would you like more detail or examples for other file formats?
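As a minimal sketch of the consuming side (the reader task and workflow names here are illustrative, not from the docs), a downstream task can accept the FlyteFile and download it before reading:

import pandas as pd
from flytekit import task, workflow
from flytekit.types.file import FlyteFile

@task
def read_df_from_file(input_file: FlyteFile) -> int:
    # download() materializes the blob locally and returns the local path.
    local_path = input_file.download()
    df = pd.read_csv(local_path)
    return len(df)

@workflow
def wf() -> int:
    # Chain the writer task above with the reader; Flyte passes the file reference.
    return read_df_from_file(input_file=write_df_to_file())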
Sources:
- https://github.com/flyteorg/flyte/issues/4241
- https://github.com/flyteorg/flyte/issues/3708

brief-family-9488
08/03/2025, 6:22 AM

jolly-florist-66728
08/03/2025, 6:22 AM
You can use Spark's native writer and return the output path as a FlyteFile. Note that flytekit.current_context().spark_session is only populated when the task runs with the flytekit-spark plugin's Spark task config (the spark_conf shown is illustrative):

import flytekit
from flytekit import task
from flytekit.types.file import FlyteFile
from flytekitplugins.spark import Spark

@task(task_config=Spark(spark_conf={"spark.driver.memory": "1g"}))
def write_spark_df_to_file() -> FlyteFile:
    # The Spark plugin injects a session into the task's execution context.
    spark = flytekit.current_context().spark_session
    df = spark.createDataFrame([("Alice", 1), ("Bob", 2)], ["name", "id"])
    output_path = "output.parquet"
    # Spark writes a directory of part files at this path; for multi-file
    # output, FlyteDirectory may be a better fit than FlyteFile.
    df.write.mode("overwrite").parquet(output_path)
    return FlyteFile(path=output_path)
This method leverages Spark's native file writing and FlyteFile's automatic upload. For more advanced Spark integration, consider using StructuredDataset, which natively supports Spark DataFrames and Parquet serialization. Would you like more detail or an example using StructuredDataset?
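As a minimal sketch of that route (assuming flytekit-spark is installed, which registers the Spark-to-Parquet StructuredDataset encoder; the spark_conf is again illustrative), the task can return the DataFrame directly and flytekit handles Parquet serialization:

import flytekit
from flytekit import task
from flytekit.types.structured import StructuredDataset
from flytekitplugins.spark import Spark

@task(task_config=Spark(spark_conf={"spark.driver.memory": "1g"}))
def write_spark_df() -> StructuredDataset:
    spark = flytekit.current_context().spark_session
    df = spark.createDataFrame([("Alice", 1), ("Bob", 2)], ["name", "id"])
    # flytekit's StructuredDataset transformer serializes the Spark DataFrame
    # to Parquet in blob storage; no manual write or upload is needed.
    return StructuredDataset(dataframe=df)

A downstream task can then open the dataset as pandas or Spark, e.g. sd.open(pd.DataFrame).all().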
Sources:
- https://github.com/flyteorg/flytesnacks/blob/master/examples/k8s_spark_plugin/k8s_spark_plugin/dataframe_passing.py
- https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-spark/flytekitplugins/spark/schema.py