I have the following spark task (read parquet data...
# ask-the-community
I have the following Spark task (reading Parquet data from S3) working in Flyte remote (the Flyte remote environment has apparently been set up to access S3).
import flytekit
import pandas
from flytekit import task
from flytekitplugins.spark import Spark

@task(
    container_image="xyz.dkr.ecr.us-east-1.amazonaws.com/flyte-pyspark:latest",
    task_config=Spark(
        spark_conf={...
        }
    ),
)
def read_spark_df() -> pandas.DataFrame:
    sess = flytekit.current_context().spark_session
    spark_df = sess.read.parquet("s3a://bucket/key.parquet").toPandas()
    ....