Frank Shen
01/05/2023, 7:50 PM
@task(
    container_image="xyz.dkr.ecr.us-east-1.amazonaws.com/flyte-pyspark:latest",
    task_config=Spark(
        spark_conf={...
        }
    ),
)
def read_spark_df() -> pandas.DataFrame:
    sess = flytekit.current_context().spark_session
    spark_df = sess.read.parquet("s3a://bucket/key.parquet").toPandas()
    df = pandas.DataFrame(spark_df)
    return df
Niels Bantilan
01/05/2023, 8:02 PM
Yee
Frank Shen
01/05/2023, 8:06 PM
Niels Bantilan
01/05/2023, 8:24 PM
@task
and you can request more resources: https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/customizing_resources.html#sphx-glr-auto-deployment-customizing-resources-py
Frank Shen
01/05/2023, 9:18 PM
Niels Bantilan
01/05/2023, 10:52 PM