salmon-refrigerator-32115
12/15/2023, 9:21 PM
RUN wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.2/hadoop-aws-3.2.2.jar -P /opt/spark/jars && \
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.262/aws-java-sdk-bundle-1.12.262.jar -P /opt/spark/jars
And I set the Spark config:
- spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
However, I found that DefaultAWSCredentialsProviderChain is not invoking WebIdentityTokenCredentialsProvider.
That's probably because hadoop-aws doesn't add WebIdentityTokenCredentialsProvider to the AWSCredentialProviderList before passing it to DefaultAWSCredentialsProviderChain.
Unfortunately, our company's EKS cluster authenticates to AWS via Web Identity Token.
As a result, the Spark tasks cannot authenticate to AWS and are unable to read data from S3.
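One quick sanity check before digging into hadoop-aws: the AWS SDK's web identity provider reads the role ARN and token file location from the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables, so it may be worth confirming they are actually present inside the driver and executor pods. Below is a minimal sketch of such a check; the function name `web_identity_env_status` is made up for illustration, and the check only inspects the environment, it does not call AWS.

```python
import os

# Environment variables that the AWS SDK's web identity provider reads.
WEB_IDENTITY_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def web_identity_env_status(env=None):
    """Report whether the web identity variables are set and the token file is readable.

    `env` defaults to the current process environment; passing a dict
    makes the function easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    # True for each variable that is set to a non-empty value.
    status = {name: bool(env.get(name)) for name in WEB_IDENTITY_VARS}
    token_path = env.get("AWS_WEB_IDENTITY_TOKEN_FILE", "")
    # The token file must exist and be readable by the current user.
    status["token_file_readable"] = bool(token_path) and os.access(token_path, os.R_OK)
    return status
```

Running `print(web_identity_env_status())` inside a pod (for example via `kubectl exec`) would show which of the three checks fail. If all three pass, the problem is more likely in how the credentials provider is wired up than in the cluster setup.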
Has anyone encountered similar issues with WebIdentityTokenCredentialsProvider?
I tried invoking WebIdentityTokenCredentialsProvider directly, like this:
- spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
And it throws an error: class WebIdentityTokenCredentialsProvider not found.
Does anyone have a solution?
tall-lock-23197