# flyte-support
g
Hello all. I am new to Flyte. I have been trying to orchestrate my Spark application, and I have set things up on GKE. Every time I run a Spark workflow I get the same error, even for the examples listed in the official documentation:

[1/1] currentAttempt done. Last Error: USER::Spark Job Submission Failed with Error: failed to run spark-submit: failed to run spark-submit: Exception in thread "main" org.apache.spark.SparkException: --deploy-mode must be either "client" or "cluster"

I see that the SparkApplication that Flyte creates does not have `mode: cluster` in the spec; when I edit it manually, the application gets submitted. I have tried a lot of settings to default deployMode to cluster in flyte-propeller-config, but none of them works. I have also tried adding `"spark.submit.deployMode": "cluster"` to spark_conf, but that doesn't work either. Any help will be appreciated!
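For reference, here is roughly what the generated object looks like (a trimmed sketch; the name, image, entrypoint path, and Spark version are illustrative, not copied from my cluster), with the field I have to add by hand marked:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: <generated task attempt name>   # illustrative
  namespace: flytesnacks-development
spec:
  type: Python
  mode: cluster                         # <- missing until I add it manually
  image: <my spark task image>          # illustrative
  mainApplicationFile: local:///usr/local/bin/entrypoint.py   # illustrative
  sparkVersion: "3.3.1"                 # illustrative
  # driver/executor sections omitted
```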
f
Hey Shubham, what version are you using?
g
Hey Ketan. I am using flyte-core 1.15.1.
f
Did you deploy the Spark operator?
a
@great-hair-77803 just to confirm, you already tried adding this to your values file?
```yaml
sparkoperator:
  enabled: true
  plugin_config:
    plugins:
      spark:
        spark-config-default:
          # <the rest of your config>
          - spark.submit.deployMode: "cluster"
```
f
I don't think you should need deployMode.
What version of spark operator did you deploy @great-hair-77803?
I am checking now. So SparkOperator does need deployMode here. We don't pass the deployMode explicitly - here
NM, @average-finland-92144 pointed out that it is defaulted to `cluster` mode here
@great-hair-77803 can you simply drop this and run? It should work.
We are running many jobs at scale with no problems.
@great-hair-77803 here is my guess: you are running a newer version of spark operator. In spark operator 2.0 they broke this; please go back to operator 1.1.
g
Thank you guys, let me check these.
Thank you @freezing-airport-6809 @average-finland-92144! Downgrading spark-operator to 1.1.27 worked like a charm. I was wondering if we should fix this on our side for spark operator >2.x?
f
Yes, we should.
Can you help with that? Otherwise we will get to it in a bit.
a
hey @great-hair-77803 I couldn't reproduce this exact behavior, but I wonder: what was the status of the `spark-operator-webhook` Deployment when you used the `2.1.1` version of the operator? I ask because I ran a Spark job before and after upgrading from `1.1.27` to `2.1.1`, and the only difference is that, if I don't specify a non-privileged port for the webhook, it goes crashlooping and the Spark job remains queued. If I include `--set webhook.port=9443` in the upgrade command, the webhook comes up and it injects that flag into the ConfigMap the driver mounts:
```
k describe cm spark-drv-980e7c966300a630-conf-map -n flytesnacks-development

...
spark.submit.deployMode=cluster
```
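In case it is easier to keep in version control, the same override should also work from a values file instead of the `--set` flag (a sketch, assuming the 2.x chart exposes the port under `webhook.port` as the flag above suggests):

```yaml
# spark-operator 2.x chart values (sketch); equivalent to --set webhook.port=9443
webhook:
  port: 9443   # non-privileged port so the webhook container can bind it
```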
g
Hey @average-finland-92144, none of my pods in the spark-operator namespace were in a crashloop state.