brief-family-9488
08/05/2025, 5:10 AM
jolly-florist-66728
08/05/2025, 5:10 AM
.open() method and pass the file handle to a dataframe reader such as pandas.read_csv(). For example:
import pandas as pd
from flytekit import task
from flytekit.types.file import FlyteFile

@task
def read_flytefile(ff: FlyteFile) -> pd.DataFrame:
    # Open the file as a text stream and let pandas parse it
    with ff.open("r") as f:
        df = pd.read_csv(f)
    return df
This approach is shown in the Flyte documentation and examples, including streaming support for large files. See file_streaming.py and FlyteFile docs.
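For large files, the same file handle can be consumed in pieces rather than all at once. The following is a minimal sketch, not taken from the linked examples: summarize_csv_stream and its parameters are hypothetical names, and an in-memory io.StringIO stands in for the handle that ff.open("r") would yield inside a task.

```python
import io

import pandas as pd


def summarize_csv_stream(handle, column, chunksize=1000):
    """Count rows and sum one column without loading the whole file.

    `handle` is any text file-like object; inside a Flyte task it is
    assumed to be the object yielded by `with ff.open("r") as f:`.
    """
    rows = 0
    total = 0.0
    # With chunksize set, read_csv returns an iterator of DataFrames
    for chunk in pd.read_csv(handle, chunksize=chunksize):
        rows += len(chunk)
        total += float(chunk[column].sum())
    return rows, total


# Stand-in for a FlyteFile handle (hypothetical sample data)
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
rows, total = summarize_csv_stream(sample, "value", chunksize=2)
print(rows, total)
```

Only one chunk is held in memory at a time, so the chunk size can be tuned to the memory available to the task.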
Would you like more detail or examples for other file formats?
Sources:
- file_streaming.py
- FlyteFile docs
brief-family-9488
08/05/2025, 5:13 AM
jolly-florist-66728
08/05/2025, 5:14 AM
from flytekit.types.file import FlyteFile
from flytekit import task
import flytekit

@task
def read_flytefile_spark(ff: FlyteFile):
    spark = flytekit.current_context().spark_session
    df = spark.read.parquet(ff.path)  # or .csv(ff.path) for CSV files
    return df
This method leverages the FlyteFile's .path attribute, which provides the local file path for Spark to read. See dataframe_passing.py for similar usage.
Would you like more detail or examples for other formats?
Sources:
- dataframe_passing.py
brief-family-9488
08/05/2025, 5:23 AM
pyspark.errors.exceptions.connect.AnalysisException: [PATH_NOT_FOUND] Path does not exist:
jolly-florist-66728
08/05/2025, 5:23 AM
.download() method to materialize the file:
local_path = ff.download()
df = spark.read.parquet(local_path) # or .csv(local_path)
If you use ff.path directly, the file may not be present unless it was already downloaded. See FlyteFile docs for details.
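One way to make this failure easier to spot is to check that the path exists before handing it to Spark. The sketch below is only an illustration under stated assumptions: materialize and FakeRemoteFile are hypothetical names, and the helper is duck-typed against anything exposing a .path attribute and a .download() method that returns a local path, which is the shape FlyteFile is assumed to have here.

```python
import os
import tempfile


def materialize(ff) -> str:
    """Return a local path that is known to exist, downloading if needed.

    `ff` is duck-typed: it needs a `.path` attribute and a `.download()`
    method returning a local path (assumed to match FlyteFile).
    """
    if os.path.exists(ff.path):
        return ff.path
    local_path = ff.download()
    if not os.path.exists(local_path):
        # Fail here with a clear message instead of inside the Spark reader
        raise FileNotFoundError(f"download() returned a missing path: {local_path}")
    return local_path


class FakeRemoteFile:
    """Hypothetical stand-in for a file that has not been downloaded yet."""

    def __init__(self, payload: str):
        self._payload = payload
        # The target path is known up front, but nothing is written until download()
        self.path = os.path.join(tempfile.mkdtemp(), "data.csv")

    def download(self) -> str:
        with open(self.path, "w") as f:
            f.write(self._payload)
        return self.path


ff = FakeRemoteFile("a,b\n1,2\n")
local_path = materialize(ff)
# In the task above, this would then be: spark.read.csv(local_path)
```

The exception class in the error above comes from pyspark's connect module. With a Spark Connect session, paths are resolved by the remote Spark server rather than the client, so a file downloaded to the task's local disk may still be reported as missing. In that case a URI the cluster itself can reach is likely needed.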
Would you like more troubleshooting steps or explanation?
Sources:
- FlyteFile docs