salmon-refrigerator-32115  01/07/2023, 12:43 AM
conf = SparkConf()
conf.set('spark.jars.packages', 'org.apache.hadoop:hadoop-aws:3.3.2')
However, it failed in flyte with this error:
23/01/06 16:42:33 ERROR SparkContext: Error initializing SparkContext.
java.io.IOException: Failed to connect to nyxmmedina01741.wmad.warnermedia.com/10.217.173.85:55808
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.IOException: Failed to connect to nyxmmedina01741.wmad.warnermedia.com/10.217.173.85:55808
freezing-airport-6809
salmon-refrigerator-32115  01/07/2023, 12:47 AM
salmon-refrigerator-32115  01/07/2023, 12:47 AM
thankful-minister-83577
freezing-airport-6809
salmon-refrigerator-32115  01/07/2023, 1:01 AM
salmon-refrigerator-32115  01/07/2023, 1:03 AM
conf = SparkConf()
flyte manages spark context differently.
flytekit.current_context().spark_session
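For context, a minimal sketch of a Flyte Spark task that uses the plugin-managed session instead of building a SparkConf by hand (the task body and conf values here are illustrative, not from this thread):

from flytekit import task, current_context
from flytekitplugins.spark import Spark

@task(task_config=Spark(spark_conf={"spark.driver.memory": "1g"}))
def my_spark_task() -> int:
    # The Spark plugin creates the SparkSession before the task body runs;
    # fetch it from the execution context rather than constructing one.
    sess = current_context().spark_session
    df = sess.createDataFrame([(1,), (2,), (3,)], ["value"])
    return df.count()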
salmon-refrigerator-32115  01/07/2023, 1:03 AM
freezing-airport-6809
salmon-refrigerator-32115  01/07/2023, 1:21 AM
flytekit.current_context()
thankful-minister-83577
user_params.builder().add_attr
is what makes it available in the object returned by current_context()
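A rough sketch of that hook in the Spark plugin, simplified from memory (the method actually lives on the plugin's task type, so treat the names here as approximate):

import pyspark
from flytekit import ExecutionParameters

def pre_execute(user_params: ExecutionParameters) -> ExecutionParameters:
    # Build (or reuse) the session, then attach it to the execution
    # parameters; this is what current_context().spark_session reads.
    sess = pyspark.sql.SparkSession.builder.appName("FlyteSpark").getOrCreate()
    return user_params.builder().add_attr("SPARK_SESSION", sess).build()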
tall-lock-23197
export SPARK_LOCAL_IP="127.0.0.1"
salmon-refrigerator-32115  01/09/2023, 6:24 PM
salmon-refrigerator-32115  01/09/2023, 6:27 PM
admin:
  endpoint: dns:///flyte.dev.dap.warnermedia.com
SPARK_LOCAL_IP: 127.0.0.1
salmon-refrigerator-32115  01/10/2023, 12:13 AM
@task(
    task_config=Spark(
        spark_conf={
            ...
            # The following is needed only when running the spark task on a dev's local PC.
            # Also need to do this locally: export SPARK_LOCAL_IP="127.0.0.1"
            "spark.jars.packages": "org.apache.hadoop:hadoop-aws:3.3.2",
            "spark.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
            "spark.hadoop.fs.s3a.access.key": "",
            "spark.hadoop.fs.s3a.secret.key": "",
            "spark.hadoop.fs.s3a.session.token": "",
        },
    )
)
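For a purely local run, the sequence would then be something like this (file and task names are hypothetical):

export SPARK_LOCAL_IP="127.0.0.1"
pyflyte run my_spark_workflow.py my_spark_task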
glamorous-carpet-83516  01/10/2023, 12:46 AM
tall-lock-23197
"can I add SPARK_LOCAL_IP to .flyte/config.yaml?" I don't think so. You can, however, add it to the bash or zsh profile since it's an env variable.
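For example, for zsh:

echo 'export SPARK_LOCAL_IP="127.0.0.1"' >> ~/.zshrc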
salmon-refrigerator-32115  01/10/2023, 6:23 PM
glamorous-carpet-83516  01/10/2023, 6:26 PM
salmon-refrigerator-32115  01/10/2023, 6:26 PM
salmon-refrigerator-32115  01/10/2023, 6:27 PM
glamorous-carpet-83516  01/10/2023, 6:28 PM
You can create a spark-defaults.conf and add it to the env; pyspark will use the default config in it.
https://stackoverflow.com/a/71214326/9574775
salmon-refrigerator-32115  01/10/2023, 6:34 PM
The spark-defaults.conf file should be located in:
$SPARK_HOME/conf
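With the settings from this thread, that file would look something like the following (a sketch; the temporary credentials would still need to be supplied separately):

# $SPARK_HOME/conf/spark-defaults.conf
spark.jars.packages                           org.apache.hadoop:hadoop-aws:3.3.2
spark.hadoop.fs.s3a.aws.credentials.provider  org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider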
salmon-refrigerator-32115  01/10/2023, 6:35 PM
glamorous-carpet-83516  01/10/2023, 6:40 PM
salmon-refrigerator-32115  01/10/2023, 6:46 PM
salmon-refrigerator-32115  01/10/2023, 6:46 PM
glamorous-carpet-83516  01/10/2023, 6:49 PM
salmon-refrigerator-32115  01/10/2023, 6:49 PM
salmon-refrigerator-32115  01/10/2023, 6:56 PM
salmon-refrigerator-32115  01/10/2023, 6:56 PM
freezing-airport-6809
salmon-refrigerator-32115  01/11/2023, 5:37 PM
glamorous-carpet-83516  01/11/2023, 6:45 PM
glamorous-carpet-83516  01/11/2023, 6:45 PM
salmon-refrigerator-32115  01/11/2023, 6:59 PM
salmon-refrigerator-32115  01/11/2023, 6:59 PM
thankful-minister-83577
salmon-refrigerator-32115  01/11/2023, 7:41 PM