# ask-the-community

Frank Shen

01/06/2023, 6:49 PM
However, when I ran it locally, it failed with:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:53)

Ketan (kumare3)

01/06/2023, 6:58 PM
you have to add the Hadoop jar for s3a
please follow the Spark docs
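One hedged way to do that for a local run is to let Spark resolve hadoop-aws from Maven at startup instead of copying jars by hand. A minimal sketch, assuming the hadoop-aws version is matched to the Hadoop version your Spark build ships with (2.7.3 in this thread); a mismatch reproduces the same ClassNotFoundException:

```python
from pyspark.sql import SparkSession

# Sketch only: have Spark pull the s3a filesystem classes (and, transitively,
# the matching aws-java-sdk) from Maven when the session starts.
# Assumption: 2.7.3 matches the Hadoop bundled with this Spark install.
spark = (
    SparkSession.builder
    .appName("s3a-local-test")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.3")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)
```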

Frank Shen

01/06/2023, 6:59 PM
@Ketan (kumare3), thanks, I will try.
@Ketan (kumare3) @Kevin Su, I copied hadoop-aws-2.7.3.jar and its compile-time dependency aws-java-sdk-1.7.4.jar to the local Spark install location where the other Hadoop 2.7.3 jars were. However, I still get the same py4j error: class S3AFileSystem not found. I noticed this log line:
23/01/05 15:15:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It suggests to me that Flyte is not even using my local Spark install, correct?

Ketan (kumare3)

01/06/2023, 9:39 PM
hmm, it is only using pyspark locally

Frank Shen

01/06/2023, 9:41 PM
@Ketan (kumare3) @Kevin Su, thanks for confirming. How do I change that?

Ketan (kumare3)

01/06/2023, 9:44 PM
what do you mean?
pyspark will automatically start the Spark JVM, I think
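A quick way to check which Spark a local run actually uses, as a sketch: the pip-installed pyspark package bundles its own Spark distribution and jars, and that bundle is what gets picked up unless SPARK_HOME points somewhere else.

```python
import os
import pyspark

# Location of the pip-installed pyspark package (which bundles its own Spark).
print(pyspark.__file__)

# If this is unset, the bundled Spark above is used, not a separate install.
print(os.environ.get("SPARK_HOME"))

# The jars directory of the bundled Spark; extra jars copied elsewhere
# (e.g. into a standalone Spark install) would never be seen.
print(os.path.join(os.path.dirname(pyspark.__file__), "jars"))
```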

Frank Shen

01/06/2023, 9:46 PM
Where do you think I should add the Hadoop jar for s3a? I’ve already added the missing jars to Spark.

Kevin Su

01/06/2023, 9:51 PM
did you export HADOOP_HOME and HADOOP_OPTS? https://stackoverflow.com/a/24927214/9574775
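For reference, a sketch of those exports done from Python; the /opt/hadoop path is an assumption, and both variables have to be set before the Spark JVM is launched for them to take effect:

```python
import os

# Assumed install path; point this at wherever Hadoop actually lives.
os.environ["HADOOP_HOME"] = "/opt/hadoop"

# Point the JVM at Hadoop's native libraries (the NativeCodeLoader warning
# above is about these); must run before the SparkSession is created.
os.environ["HADOOP_OPTS"] = (
    "-Djava.library.path=" + os.environ["HADOOP_HOME"] + "/lib/native"
)
```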

Frank Shen

01/06/2023, 10:31 PM
@Kevin Su, I don’t have $HADOOP_HOME set. Is that because I don’t have Hadoop installed?
Do you suggest I install Hadoop separately from Spark?

Kevin Su

01/06/2023, 10:41 PM
I’m not familiar with how Spark works with Hadoop, but it seems like Spark needs some HDFS dependencies to write data to HDFS.
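Pulling the thread together, a hedged sketch of wiring the hadoop-aws package into a Flyte Spark task through the plugin's spark_conf; it assumes flytekitplugins-spark is installed and that the hadoop-aws version matches the Hadoop your Spark bundles:

```python
import flytekit
from flytekit import task
from flytekitplugins.spark import Spark  # assumes flytekitplugins-spark is installed

@task(
    task_config=Spark(
        spark_conf={
            # Assumption: match hadoop-aws to the Hadoop version Spark bundles
            # (2.7.3 in this thread); a mismatch yields the same
            # ClassNotFoundException for S3AFileSystem.
            "spark.jars.packages": "org.apache.hadoop:hadoop-aws:2.7.3",
            "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        }
    )
)
def count_rows(path: str) -> int:
    # The plugin provides the configured session at runtime.
    sess = flytekit.current_context().spark_session
    return sess.read.parquet(path).count()
```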