# ask-the-community
f
However, if I ran it locally, it failed for
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:53)
k
you have to add the hadoop jar for s3a
please follow spark docs
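A minimal sketch of one way to follow that advice: letting Spark fetch the S3A connector from Maven when building a local PySpark session, via the `spark.jars.packages` config. This assumes a Spark build bundling Hadoop 2.7.x; the `hadoop-aws` version must match the Hadoop version your Spark ships with (2.7.3 here, per the thread).

```python
from pyspark.sql import SparkSession

# Hedged sketch: have Spark resolve hadoop-aws (and its transitive AWS SDK
# dependency) from Maven at session startup, instead of copying jars by hand.
# The hadoop-aws version must match Spark's bundled Hadoop version.
spark = (
    SparkSession.builder
    .appName("s3a-local-test")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.3")
    .getOrCreate()
)
```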
f
@Ketan (kumare3), thanks, I will try.
@Ketan (kumare3) @Kevin Su, I copied hadoop-aws-2.7.3.jar and its compiled dependency aws-java-sdk-1.7.4.jar to the local Spark install location where the other Hadoop 2.7.3 jars were. However, I still get the same py4j error for class S3AFileSystem not found. I noticed this log line:
23/01/05 15:15:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It suggests to me that flyte is not even using my local spark, correct?
k
hmm it is using pyspark locally only
f
@Ketan (kumare3) @Kevin Su, thanks for confirming. How do I change that?
k
what do you mean?
pyspark will automatically start spark java i think
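One way to check which Spark installation `import pyspark` actually resolves to, and whether a separate local install is even being picked up (a sketch; paths will vary by machine):

```python
import importlib.util
import os

# Where does `import pyspark` resolve from? If this points inside
# site-packages, you are running the pip-installed pyspark and its
# bundled jars, not a standalone Spark install.
spec = importlib.util.find_spec("pyspark")
print(spec.origin if spec else "pyspark not found")

# If SPARK_HOME is unset, pyspark falls back to the jars shipped with
# the pip package, so jars copied into a separate Spark install
# directory are never seen by the JVM it launches.
print(os.environ.get("SPARK_HOME", "SPARK_HOME is not set"))
```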
f
Where do you think I should add the hadoop jar for s3a? I’ve already added the missing jars in spark.
k
did you export
HADOOP_HOME
and
HADOOP_OPTS
? https://stackoverflow.com/a/24927214/9574775
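For reference, the linked answer amounts to setting these two variables before Spark starts. A sketch of doing the same from Python (the `/opt/hadoop` path is a placeholder; substitute your actual Hadoop location):

```python
import os

# Hypothetical install location; replace with your real Hadoop directory.
hadoop_home = "/opt/hadoop"

os.environ["HADOOP_HOME"] = hadoop_home
# Point the JVM at Hadoop's native libraries, which is what the linked
# answer suggests to address the NativeCodeLoader warning.
os.environ["HADOOP_OPTS"] = f"-Djava.library.path={hadoop_home}/lib/native"

print(os.environ["HADOOP_HOME"])
print(os.environ["HADOOP_OPTS"])
```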
f
@Kevin Su, I don’t have $HADOOP_HOME set. Is that because I don’t have hadoop installed?
Do you suggest I install hadoop separately from spark?
k
I’m not familiar with how Spark works with Hadoop, but it seems like Spark needs some extra Hadoop filesystem dependencies to write data out.