salmon-refrigerator-32115
01/05/2023, 7:50 PM
import flytekit
import pandas
from flytekit import task
from flytekitplugins.spark import Spark

@task(
    container_image="xyz.dkr.ecr.us-east-1.amazonaws.com/flyte-pyspark:latest",
    task_config=Spark(
        spark_conf={...
        }
    ),
)
def read_spark_df() -> pandas.DataFrame:
    sess = flytekit.current_context().spark_session
    # toPandas() already returns a pandas.DataFrame, so no extra wrapping is needed
    df = sess.read.parquet("s3a://bucket/key.parquet").toPandas()
    return df
broad-monitor-993
01/05/2023, 8:02 PM
thankful-minister-83577
salmon-refrigerator-32115
01/05/2023, 8:06 PM
broad-monitor-993
01/05/2023, 8:24 PM
@task
and you can request more resources, load the parquet file with pandas directly, and do whatever data processing you need in the same task.
salmon-refrigerator-32115
01/05/2023, 9:18 PM
broad-monitor-993
01/05/2023, 10:52 PM