# ask-the-community
f
However, if I ran it locally, it failed for
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:53)
k
you have to add the hadoop jar for s3a
please follow spark docs
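A minimal sketch of one way to follow that advice: letting Spark fetch the S3A connector from Maven when building a local PySpark session, via the `spark.jars.packages` config. This assumes a Spark build bundling Hadoop 2.7.x; the `hadoop-aws` version must match the Hadoop version your Spark ships with (2.7.3 here, per the thread).

```python
from pyspark.sql import SparkSession

# Hedged sketch: have Spark resolve hadoop-aws (and its transitive AWS SDK
# dependency) from Maven at session startup, instead of copying jars by hand.
# The hadoop-aws version must match Spark's bundled Hadoop version.
spark = (
    SparkSession.builder
    .appName("s3a-local-test")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.3")
    .getOrCreate()
)
```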
f
@Ketan (kumare3), thanks, I will try.
@Ketan (kumare3) @Kevin Su, I copied hadoop-aws-2.7.3.jar and its compiled dependency aws-java-sdk-1.7.4.jar to the local Spark install location where the other Hadoop 2.7.3 jars were. However, I still get the same py4j error for class S3AFileSystem not found. I noticed this log line:
23/01/05 15:15:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It suggests to me that flyte is not even using my local spark, correct?
k
hmm it is using pyspark locally only
f
@Ketan (kumare3) @Kevin Su, thanks for confirming. How do I change that?
k
what do you mean?
pyspark will automatically start spark java i think
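One way to check which Spark installation `import pyspark` actually resolves to, and whether a separate local install is even being picked up (a sketch; paths will vary by machine):

```python
import importlib.util
import os

# Where does `import pyspark` resolve from? If this points inside
# site-packages, you are running the pip-installed pyspark and its
# bundled jars, not a standalone Spark install.
spec = importlib.util.find_spec("pyspark")
print(spec.origin if spec else "pyspark not found")

# If SPARK_HOME is unset, pyspark falls back to the jars shipped with
# the pip package, so jars copied into a separate Spark install
# directory are never seen by the JVM it launches.
print(os.environ.get("SPARK_HOME", "SPARK_HOME is not set"))
```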
f
Where do you think I should add the hadoop jar for s3a? I’ve already added the missing jars in spark.
k
did you export
HADOOP_HOME
and
HADOOP_OPTS
? https://stackoverflow.com/a/24927214/9574775
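For reference, the linked answer amounts to setting these two variables before Spark starts. A sketch of doing the same from Python (the `/opt/hadoop` path is a placeholder; substitute your actual Hadoop location):

```python
import os

# Hypothetical install location; replace with your real Hadoop directory.
hadoop_home = "/opt/hadoop"

os.environ["HADOOP_HOME"] = hadoop_home
# Point the JVM at Hadoop's native libraries, which is what the linked
# answer suggests to address the NativeCodeLoader warning.
os.environ["HADOOP_OPTS"] = f"-Djava.library.path={hadoop_home}/lib/native"

print(os.environ["HADOOP_HOME"])
print(os.environ["HADOOP_OPTS"])
```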
f
@Kevin Su, I don’t have $HADOOP_HOME set. Is that because I don’t have hadoop installed?
Do you suggest I install hadoop separately from spark?
k
I’m not familiar with how Spark works with Hadoop, but it seems like Spark needs some extra Hadoop filesystem dependencies to write data out.