# ask-the-community
f
Hello, when using spark-operator with Flyte in an EKS cluster, I built a custom image with the following jar versions:
RUN wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.2/hadoop-aws-3.2.2.jar -P /opt/spark/jars && \
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.262/aws-java-sdk-bundle-1.12.262.jar -P /opt/spark/jars
I then set the Spark config:
- spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
However, I found that WebIdentityTokenCredentialsProvider is not invoked by DefaultAWSCredentialsProviderChain, probably because hadoop-aws does not add WebIdentityTokenCredentialsProvider to the AWSCredentialProviderList before passing it to DefaultAWSCredentialsProviderChain. Unfortunately, our company's EKS cluster authenticates to AWS via Web Identity Token, so the Spark tasks cannot authenticate to AWS and cannot read data from S3. Has anyone encountered similar issues with WebIdentityTokenCredentialsProvider? I tried invoking WebIdentityTokenCredentialsProvider directly like this:
- spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
It throws an error: class WebIdentityTokenCredentialsProvider not found. Does anyone have a solution?
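One way to narrow down a "class not found" error like this is to check whether any jar in the image actually contains the class. The following is a minimal sketch, not a definitive diagnostic: it relies only on the fact that jars are zip archives, and the helper name `jars_containing` is hypothetical. It assumes the jars live in `/opt/spark/jars`, as in the Dockerfile lines above.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name: str, jars_dir: str) -> list[str]:
    """Return the names of jars in jars_dir that contain the given class.

    class_name is a fully qualified Java class name, for example
    "com.amazonaws.auth.WebIdentityTokenCredentialsProvider".
    """
    # A class a.b.C is stored in a jar as the zip entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jars_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(jar.name)
    return hits
```

Running `jars_containing("com.amazonaws.auth.WebIdentityTokenCredentialsProvider", "/opt/spark/jars")` inside the container lists the jars that ship the class. An empty list would suggest the jar is missing from the image; a non-empty list would point instead at a classpath or version conflict.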
s
Can you use the latest version of hadoop-aws? See this: https://stackoverflow.com/a/74302330