adventurous-napkin-36518
03/07/2024, 10:06 AM
average-finland-92144
03/07/2024, 3:53 PM
deploy-flyte then it's flyte-core. You should then add the config specified in the docs for flyte-core to your values-gcp-core.yaml file and upgrade your Helm deployment (just running terraform apply)
adventurous-napkin-36518
03/07/2024, 4:01 PM
average-finland-92144
03/07/2024, 4:02 PM
adventurous-napkin-36518
03/07/2024, 4:04 PM
average-finland-92144
03/07/2024, 4:06 PM
spark-config-default has some keys that are AWS-specific
adventurous-napkin-36518
03/07/2024, 4:06 PM
average-finland-92144
03/07/2024, 4:26 PM
spark-config-default:
- spark.eventLog.enabled: "true"
- spark.eventLog.dir: "{{ Values.userSettings.bucketName }}/spark-events"
- spark.driver.cores: "1"
- spark.executorEnv.HTTP2_DISABLE: "true"
- spark.hadoop.fs.AbstractFileSystem.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
- spark.kubernetes.allocation.batch.size: "50"
- spark.kubernetes.driverEnv.HTTP2_DISABLE: "true"
- spark.network.timeout: 600s
- spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000
- spark.executor.heartbeatInterval: 60s
adventurous-napkin-36518
03/07/2024, 4:41 PM
adventurous-napkin-36518
03/07/2024, 4:41 PM
│ Error: template: flyte-core/templates/propeller/webhook.yaml:33:27: executing "flyte-core/templates/propeller/webhook.yaml" at <include (print .Template.BasePath "/propeller/configmap.yaml") .>: error calling include: template: flyte-core/templates/propeller/configmap.yaml:41:19: executing "flyte-core/templates/propeller/configmap.yaml" at <tpl (toYaml .) $>: error calling tpl: error during tpl function execution for "plugins:\n spark:\n spark-config-default:\n - spark.eventLog.enabled: \"true\"\n - spark.eventLog.dir: '{{ Values.userSettings.bucketName }}/spark-events'\n - spark.driver.cores: \"1\"\n - spark.executorEnv.HTTP2_DISABLE: \"true\"\n - spark.hadoop.fs.AbstractFileSystem.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS\n - spark.kubernetes.allocation.batch.size: \"50\"\n - spark.kubernetes.driverEnv.HTTP2_DISABLE: \"true\"\n - spark.network.timeout: 600s\n - spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000\n - spark.executor.heartbeatInterval: 60s": parse error at (flyte-core/templates/propeller/webhook.yaml:5): function "Values" not defined
│
adventurous-napkin-36518
03/07/2024, 4:41 PM
average-finland-92144
03/07/2024, 4:43 PM
.Values.userSettings.bucketName
adventurous-napkin-36518
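The parse error comes from the template expression missing its leading dot: Helm's `tpl` sees `Values` as a function name, while `.Values.userSettings.bucketName` is a lookup on the chart values. A minimal sketch of a check that flags this slip in a values file before running terraform apply follows. The function name `find_undotted_values` is hypothetical and not part of deploy-flyte, and the file path in the usage line is the one mentioned in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of deploy-flyte): print every line of a
# values file where a template expression starts with "Values." instead
# of ".Values.". Returns non-zero when nothing suspicious is found.
find_undotted_values() {
  grep -nE '\{\{-?[[:space:]]*Values\.' "$1"
}
```

Usage, assuming the values file name from earlier in the thread: `find_undotted_values values-gcp-core.yaml`. Any line it prints needs the dot added, for example `"{{ .Values.userSettings.bucketName }}/spark-events"`.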
03/07/2024, 4:44 PM
adventurous-napkin-36518
03/07/2024, 4:53 PM
forbidden, Reason: "IAM", UserMessage: "Unable to generate access token; IAM returned 403 Forbidden: Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).\nThis error could be caused by a missing IAM policy binding on the target IAM service account.\nFor more information, refer to the Workload Identity documentation:\n\t<https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#authenticating_to>\n", started at 2024-03-07 16:51:08.984515318 +0000 UTC m=+183187.278580063
adventurous-napkin-36518
03/07/2024, 4:54 PM
[conn-id:22c33a1d8d8ce5a9 ip:172.16.0.61 pod:flytesnacks-development/fef289c19c0fb4703b69-n0-0-driver rpc-id:3396cac83aeffbf5] "/computeMetadata/v1/instance/service-accounts/flyte-gcp-flyteworkers@<projectid>.iam.gserviceaccount.com/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control" HTTP/403: generic::permission_denied: loading: GenerateAccessToken("flyte-gcp-flyteworkers@<projectid>.iam.gserviceaccount.com", ""): googleapi: Error 403: Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).
average-finland-92144
03/07/2024, 4:55 PM
adventurous-napkin-36518
03/07/2024, 4:56 PM
average-finland-92144
03/07/2024, 4:57 PM
adventurous-napkin-36518
03/07/2024, 4:58 PM
adventurous-napkin-36518
03/07/2024, 4:59 PM
average-finland-92144
03/07/2024, 5:18 PM
spark Service Account?
average-finland-92144
03/07/2024, 5:19 PM
pyflyte run --remote <your-workflow> --service-account=spark?
adventurous-napkin-36518
03/07/2024, 5:20 PM
pyflyte run --remote pyspark_pi.py my_spark
adventurous-napkin-36518
03/07/2024, 5:20 PM
average-finland-92144
03/07/2024, 5:20 PM
adventurous-napkin-36518
03/07/2024, 5:21 PM
average-finland-92144
03/07/2024, 5:23 PM
--remote
like
pyflyte run --remote --service-account=spark ...
adventurous-napkin-36518
03/07/2024, 5:27 PM
average-finland-92144
03/07/2024, 5:32 PM
spark SA has the IAM role annotation?
kubectl describe sa spark -n <spark-operator-namespace>
adventurous-napkin-36518
03/07/2024, 5:36 PM
average-finland-92144
03/07/2024, 5:37 PM
flyte on deploy-flyte
so
kubectl describe sa spark -n flyte
adventurous-napkin-36518
03/07/2024, 5:38 PM
spark-operator namespace, whereas flyte is in the flyte namespace.
adventurous-napkin-36518
03/07/2024, 5:40 PM
Error from server (NotFound): serviceaccounts "spark" not found
average-finland-92144
03/07/2024, 5:42 PM
kubectl get sa -n spark-operator
adventurous-napkin-36518
03/07/2024, 5:43 PM
adventurous-napkin-36518
03/07/2024, 5:46 PM
values-core-gcp.yaml looks like this https://gist.github.com/jegadesh-google/be2a44026525cb41a868462cb1cf384b and I used the commands below to set up the Spark operator:
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace
average-finland-92144
03/07/2024, 6:07 PM
iam.tf module performs.
One last check, isn't there a spark SA on the flyte ns?
kubectl get sa -n flyte
adventurous-napkin-36518
03/08/2024, 1:59 AM
spark sa.
average-finland-92144
03/08/2024, 3:38 PM
adventurous-napkin-36518
03/08/2024, 11:21 PM
adventurous-napkin-36518
03/11/2024, 8:36 AM
average-finland-92144
03/11/2024, 10:32 PM
adventurous-napkin-36518
03/11/2024, 11:25 PM
adventurous-napkin-36518
03/12/2024, 3:19 PM
iam.serviceAccounts.getAccessToken but even after doing that, I am facing the same issue.
average-finland-92144
03/12/2024, 9:14 PM
flyte.tf to reference the new values file (optional, in case you want to keep this config in a different file)
3. Run terraform apply and verify that a spark SA is created in the flytesnacks-development namespace, that it's annotated with the GSA, and that a spark-role role is created in the same namespace.
4. Added the "spark" SA to this array in iam.tf:
https://github.com/unionai-oss/deploy-flyte/blob/6a6765cd4cb92fad46bb4b6466edf8f5a766bbb4/environments/gcp/flyte-core/iam.tf#L2
5. Added the `iam.serviceAccounts.signBlob` permission to this role:
https://github.com/unionai-oss/deploy-flyte/blob/6a6765cd4cb92fad46bb4b6466edf8f5a766bbb4/environments/gcp/flyte-core/iam.tf#L80
6. Save and run terraform apply
7. I had to complete the steps to use Artifact Registry and specify my repo name (like registry=<my-repo>) here
As you may have noticed, I removed the ResourceQuota because the scheduler will fail admission if there are neither requests nor limits, and profiling the recommended base resources is a separate effort.
With this, I don't have any permission issues as of now. I'm getting an annoying No Module found type of error but that's probably on the user side.
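The verification in step 3 and the Workload Identity binding behind the earlier 403 can be sketched as a few shell helpers. This is a rough sketch under assumptions, not the deploy-flyte procedure: the function names are hypothetical, the namespace, SA name, and role name are the ones mentioned in this thread, and the project ID and GSA email are placeholders you supply. The member format `PROJECT_ID.svc.id.goog[NAMESPACE/KSA]` and the annotation key `iam.gke.io/gcp-service-account` are the standard GKE Workload Identity conventions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Workload Identity member string that
# should hold roles/iam.workloadIdentityUser on the Google service account.
wi_member() {
  local project="$1" namespace="$2" ksa="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$project" "$namespace" "$ksa"
}

# Hypothetical helper: inspect the Kubernetes side. Needs cluster access,
# so it is only defined here and must be called explicitly.
verify_spark_sa() {
  local namespace="${1:-flytesnacks-development}"
  # Look for the iam.gke.io/gcp-service-account annotation naming the GSA.
  kubectl describe sa spark -n "$namespace"
  # The role named in the steps above should exist in the same namespace.
  kubectl get role spark-role -n "$namespace"
}

# Hypothetical helper: list the IAM bindings on the GSA so they can be
# compared against the output of wi_member.
show_gsa_bindings() {
  local gsa_email="$1"
  gcloud iam service-accounts get-iam-policy "$gsa_email"
}
```

For example, `wi_member <projectid> flytesnacks-development spark` prints the member to look for in the output of `show_gsa_bindings flyte-gcp-flyteworkers@<projectid>.iam.gserviceaccount.com`. If that member is missing, the `iam.serviceAccounts.getAccessToken` denial seen earlier in the thread is the expected symptom.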
Please let me know if this works for you as an update to the docs is needed
adventurous-napkin-36518
03/13/2024, 12:13 PM