acoustic-nest-94594
09/17/2024, 10:34 PM
`tolerations` to the driver/executor pods that get launched via the Flyte Spark plugin, and I must be missing something about how this works. The relevant section of my configuration is in the 🧵, and I think I'm reading the relevant bits of the Spark plugin correctly, but for whatever reason my tolerations aren't making the leap from the configuration to the pods. Any help from folks who have figured this out before would be very much appreciated!
acoustic-nest-94594
09/17/2024, 10:35 PM
`flyte-backend` looks like this:
plugins:
  k8s:
    default-env-vars:
      - AWS_METADATA_SERVICE_TIMEOUT: 5
      - AWS_METADATA_SERVICE_NUM_ATTEMPTS: 20
    default-tolerations:
      - effect: NoSchedule
        key: datology-job-type
        operator: Exists
    inject-finalizer: true
  spark:
    spark-config-default:
      - spark.eventLog.enabled: "true"
      - spark.eventLog.dir: s3a://dev-datologyai-job-logs/dev-next-spark-operator-logs
      - spark.eventLog.rolling.enabled: "true"
      - spark.eventLog.rolling.maxFileSize: 16m
      - spark.kubernetes.authenticate.submission.caCertFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      - spark.kubernetes.authenticate.submission.oauthTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      - spark.hadoop.fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
      - spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
      - spark.driver.extraJavaOptions: -Divy.cache.dir=/tmp -Divy.home=/tmp
storage:
  cache:
    max_size_mbs: 100
    target_gc_percent: 100
acoustic-nest-94594
09/17/2024, 10:36 PM
freezing-airport-6809
average-finland-92144
09/18/2024, 5:20 PM
[I don't think] `default-tolerations` would help, because it injects tolerations into Pods spawned by the propeller K8s plugin, and Spark is a different one. The only relevant option I see is using `plugins.spark.spark-config-default` to set the tolerations that the operator ends up applying to the driver/executor Pods. At the spark-operator Helm chart level I can only see tolerations for the controller itself.
From the operator API docs, it doesn't seem that tolerations are even configurable for the driver/executor, but I may be wrong.
acoustic-nest-94594
09/18/2024, 6:29 PM
`spark.kubernetes.{driver/executor}.podTemplateFile` property in those Spark configs to create a pod template that includes the tolerations. I was just surprised b/c it looked like the Flyte `spark.go` code was using those `k8s.default-tolerations` (and the other settings for the pods under `k8s`) to set up the default podspec that was getting passed in to the `createSparkPodSpec` function from e.g. here: https://github.com/flyteorg/flyte/blob/master/flyteplugins/go/tasks/plugins/k8s/spark/spark.go#L177
acoustic-nest-94594
09/19/2024, 9:50 PM
freezing-airport-6809
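
For reference, the pod-template route mentioned in the thread (`spark.kubernetes.{driver/executor}.podTemplateFile`) could be sketched roughly as below. This is a minimal sketch and not a confirmed fix: the file path is hypothetical, and it assumes the template file is readable wherever `spark-submit` actually runs, which the thread does not establish for the Flyte plus spark-operator setup. The toleration itself just mirrors the one listed under `default-tolerations` in the config above.

```yaml
# Hypothetical pod template file, e.g. /opt/spark/pod-templates/tolerations.yaml
# (the path and file name are assumptions, not taken from the thread).
# Spark uses a template like this as the starting point for the pods it creates.
apiVersion: v1
kind: Pod
spec:
  tolerations:
    # Same toleration as plugins.k8s.default-tolerations in the config above.
    - key: datology-job-type
      operator: Exists
      effect: NoSchedule
```

If this works in a given setup, the template would be referenced by adding entries such as `- spark.kubernetes.driver.podTemplateFile: /opt/spark/pod-templates/tolerations.yaml` and the matching `spark.kubernetes.executor.podTemplateFile` key to `plugins.spark.spark-config-default`.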