# flyte-support
m
Hi Flyte team, I built an image based on https://github.com/flyteorg/flytesnacks/blob/c9b2c44dc5d0ce42e482be1ffb16c8b19e72c23c/examples/k8s_spark_plugin/Dockerfile and am now running it on GKE. The logs show communication problems between the executor and the driver, and the app is stuck at startup. I attached the log. What do you think the problem is? Thanks
cc @average-finland-92144
spark-log-0513.txt
s
@high-park-82026 @average-finland-92144, if you know this off the top of your head, could you please point us to the relevant resources or next steps? Thank you!
a
@mammoth-quill-44336 I'd recommend starting with the example (including ImageSpec) to validate it works in your environment https://www.union.ai/docs/flyte/integrations/native-backend-plugins/k8s-spark-plugin/pyspark-pi/
There are a few ways to customize the settings of driver and executor Pods but we can start from there
m
okay thanks David
Hi @average-finland-92144 it worked
right now I need to install some JVM based jars
could you advise
I was basically modifying the Dockerfile I shared and customized on top of it
Hi @average-finland-92144 I worked around it and got it working. I do have another problem: when I set the executor count to 4, it only ever launches 2, and the driver throws an error like this. Do you know why?
btw the job keeps running with 2 executors while throwing that error
my code is:
task_config=Spark(
    # This configuration is applied to the Spark cluster
    spark_conf={
        "spark.driver.memory": "6000M",
        "spark.executor.memory": "40000M",
        "spark.executor.cores": "4",
        "spark.executor.instances": "8",
        "spark.driver.cores": "1",
        "spark.jars": "https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar,https://repo1.maven.org/maven2/com/google/cloud/spark/spark-bigquery-with-dependencies_2.12/0.32.0/spark-bigquery-with-dependencies_2.12-0.32.0.jar",
        "spark.kubernetes.authenticate.driver.serviceAccountName": "default",
        "spark.kubernetes.authenticate.executor.serviceAccountName": "default",
    }
),
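For anyone else adding JVM jars this way: spark.jars takes a single comma-separated string, so it can be easier to keep the URLs in a list and join them. A minimal sketch, assuming a hypothetical helper named build_spark_conf (not part of flytekit); the resulting dict is what would be passed as spark_conf:

```python
from typing import Dict, List


def build_spark_conf(jar_urls: List[str], executor_instances: int) -> Dict[str, str]:
    """Build a spark_conf dict; build_spark_conf is a hypothetical helper name."""
    return {
        "spark.executor.instances": str(executor_instances),
        # spark.jars expects one comma-separated string, with no spaces or brackets
        "spark.jars": ",".join(url.strip() for url in jar_urls),
    }


# Jar URLs taken from the config in this thread.
jars = [
    "https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar",
    "https://repo1.maven.org/maven2/com/google/cloud/spark/spark-bigquery-with-dependencies_2.12/0.32.0/spark-bigquery-with-dependencies_2.12-0.32.0.jar",
]
conf = build_spark_conf(jars, executor_instances=8)
print(conf["spark.jars"])
```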
a
Seems like the log screenshot is truncated. Could you share the logs as text?
m
yes
oh yes, the truncated part shows the reason
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc/api/v1/namespaces/flytesnacks-development/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "acjklbj42p47kcf5mfhs-n0-0-exec-378" is forbidden: exceeded quota: project-quota, requested: limits.cpu=4, used: limits.cpu=9, limited: limits.cpu=10.
I'll check the flyte config
a
So there seem to be two sources of issues:
• Service account perms
• Project-quota limits
Can you check the following?
1. The spark service account (kubectl describe sa spark -n flytesnacks-development)
2. The Spark RBAC config:
   a. kubectl get clusterrole
   b. kubectl get clusterrolebinding
Also: kubectl get resourcequota -n flytesnacks-development
m
The service account should have access, because when I set it to 2, there is no error
I'm about to test again
a
oh, so it's more the resourceQuota
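The numbers in the error line up with the config: 1 driver core plus 2 executors at 4 cores each is 9, and a third executor would need 13 against a limit of 10. A rough sketch of that arithmetic, assuming each pod's CPU limit equals its configured cores and nothing else in the namespace counts against the quota (max_executors is a hypothetical helper, not a Flyte or Spark API):

```python
def max_executors(quota_cpu: int, driver_cpu: int, executor_cpu: int) -> int:
    """Return how many executor pods fit under a namespace limits.cpu quota."""
    if executor_cpu <= 0:
        raise ValueError("executor_cpu must be positive")
    # CPU left for executors once the driver pod is scheduled
    remaining = quota_cpu - driver_cpu
    return max(remaining // executor_cpu, 0)


# Values from the quota error and the spark_conf in this thread.
fit = max_executors(quota_cpu=10, driver_cpu=1, executor_cpu=4)
used = 1 + fit * 4
print(fit, used)  # 2 executors fit, using 9 of the 10 CPUs
```

Under the same assumptions, running 8 executors with 4 cores each would need a limits.cpu quota of at least 33.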
m
yes, it's working as expected now
just needed to expand the log
lol
thanks David