#ask-the-community

Frank Shen

01/06/2023, 6:46 PM
I have the following Spark task (reading parquet data from S3) working in Flyte remote (the Flyte remote env has apparently been set up to access S3).
@task(
    container_image="xyz.dkr.ecr.us-east-1.amazonaws.com/flyte-pyspark:latest",
    task_config=Spark(
        spark_conf={...
        }
    ),
)
def read_spark_df() -> pandas.DataFrame:
    sess = flytekit.current_context().spark_session
    spark_df = sess.read.parquet("s3a://bucket/key.parquet").toPandas()
    ....