shy-room-84005
02/09/2023, 9:07 AM

An error occurred while calling o125.parquet.
: java.nio.file.AccessDeniedException: s3://<bucket>/<path>: getFileStatus on s3://<bucket>/<path>: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden;
My Spark config in values.yaml is:
spark-config-default:
  # We override the default credentials provider chain for Hadoop so that it
  # can use the service-account-based IAM role (IRSA) or EC2 instance metadata.
  # This is more in line with how AWS credentials normally resolve.
  - spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
  - spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: "2"
  - spark.kubernetes.allocation.batch.size: "50"
  - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl"
  - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A"
  - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A"
  - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A"
  - spark.hadoop.fs.s3a.multipart.threshold: "536870912"
  - spark.blacklist.enabled: "true"
  - spark.blacklist.timeout: "5m"
  - spark.task.maxFailures: "8"
I can specify the AWS credentials manually and use "spark.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider" in the Spark config, but I was wondering if you can use IRSA instead, to avoid handling static credentials.
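For reference, a minimal IRSA-oriented sketch of the credentials piece of the config above. It assumes the Spark driver/executor service account is annotated with eks.amazonaws.com/role-arn, the role's trust policy allows the cluster's OIDC provider, and the bundled AWS SDK is recent enough to ship com.amazonaws.auth.WebIdentityTokenCredentialsProvider:

spark-config-default:
  # Use the web identity token that IRSA injects into the pod rather than
  # static access keys; the annotated IAM role is assumed automatically.
  - spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"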
average-finland-92144
02/09/2023, 6:28 PM

glamorous-carpet-83516
02/09/2023, 6:59 PM

import pyspark.sql
from flytekit.types.structured import StructuredDataset

def t1():
    sd = StructuredDataset(uri="s3://bucket/key")
    df = sd.open(pyspark.sql.DataFrame).all()
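A fuller sketch of how that snippet could look as a complete Flyte Spark task, assuming flytekitplugins-spark is installed; the spark_conf entry and the s3://bucket/key URI are illustrative placeholders, not confirmed settings:

import pyspark.sql

from flytekit import task
from flytekit.types.structured import StructuredDataset
from flytekitplugins.spark import Spark

@task(
    task_config=Spark(
        # Illustrative: with IRSA in place, point S3A at the web identity
        # provider so the task can read S3 without static keys.
        spark_conf={
            "spark.hadoop.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
        }
    )
)
def t1() -> None:
    # Open the parquet data at the URI and load it as a Spark DataFrame.
    sd = StructuredDataset(uri="s3://bucket/key")
    df = sd.open(pyspark.sql.DataFrame).all()
    df.show()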