adventurous-napkin-36518
03/07/2024, 10:06 AM
average-finland-92144
03/07/2024, 3:53 PM
`deploy-flyte` then it's `flyte-core`. You should then add the config specified in the docs for `flyte-core` to your `values-gcp-core.yaml` file and upgrade your Helm deployment (just running `terraform apply`).
adventurous-napkin-36518
03/07/2024, 4:01 PM
average-finland-92144
03/07/2024, 4:02 PM
adventurous-napkin-36518
03/07/2024, 4:04 PM
average-finland-92144
03/07/2024, 4:06 PM
`spark-config-default` has some keys that are AWS-specific
adventurous-napkin-36518
03/07/2024, 4:06 PM
average-finland-92144
03/07/2024, 4:26 PM
spark-config-default:
- spark.eventLog.enabled: "true"
- spark.eventLog.dir: "{{ Values.userSettings.bucketName }}/spark-events"
- spark.driver.cores: "1"
- spark.executorEnv.HTTP2_DISABLE: "true"
- spark.hadoop.fs.AbstractFileSystem.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
- spark.kubernetes.allocation.batch.size: "50"
- spark.kubernetes.driverEnv.HTTP2_DISABLE: "true"
- spark.network.timeout: 600s
- spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000
- spark.executor.heartbeatInterval: 60s
adventurous-napkin-36518
03/07/2024, 4:41 PM
adventurous-napkin-36518
03/07/2024, 4:41 PM
│ Error: template: flyte-core/templates/propeller/webhook.yaml:33:27: executing "flyte-core/templates/propeller/webhook.yaml" at <include (print .Template.BasePath "/propeller/configmap.yaml") .>: error calling include: template: flyte-core/templates/propeller/configmap.yaml:41:19: executing "flyte-core/templates/propeller/configmap.yaml" at <tpl (toYaml .) $>: error calling tpl: error during tpl function execution for "plugins:\n spark:\n spark-config-default:\n - spark.eventLog.enabled: \"true\"\n - spark.eventLog.dir: '{{ Values.userSettings.bucketName }}/spark-events'\n - spark.driver.cores: \"1\"\n - spark.executorEnv.HTTP2_DISABLE: \"true\"\n - spark.hadoop.fs.AbstractFileSystem.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS\n - spark.kubernetes.allocation.batch.size: \"50\"\n - spark.kubernetes.driverEnv.HTTP2_DISABLE: \"true\"\n - spark.network.timeout: 600s\n - spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000\n - spark.executor.heartbeatInterval: 60s": parse error at (flyte-core/templates/propeller/webhook.yaml:5): function "Values" not defined
│
adventurous-napkin-36518
03/07/2024, 4:41 PM
average-finland-92144
03/07/2024, 4:43 PM
`.Values.userSettings.bucketName`
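For context on the parse error above: Helm's `tpl` evaluates the string as a Go template, where a bare `Values` is read as a function call, while `.Values` is a field lookup on the chart context. A minimal sketch of the one-character fix, using a hypothetical helper name (`fix_values_ref`) and the line quoted earlier in this thread; in practice you would simply edit the values file by hand:

```shell
# Hypothetical helper: rewrite '{{ Values.' to '{{ .Values.' in text read from stdin.
# Lines that already use '.Values' are left unchanged.
fix_values_ref() {
  sed 's/{{ *Values\./{{ .Values./g'
}

# The line from the values file in this thread, before the fix.
broken='- spark.eventLog.dir: "{{ Values.userSettings.bucketName }}/spark-events"'

# Print the corrected line.
printf '%s\n' "$broken" | fix_values_ref
```

After the change, re-running `terraform apply` should get past the template rendering step.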
adventurous-napkin-36518
03/07/2024, 4:44 PM
adventurous-napkin-36518
03/07/2024, 4:53 PM
forbidden, Reason: "IAM", UserMessage: "Unable to generate access token; IAM returned 403 Forbidden: Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).\nThis error could be caused by a missing IAM policy binding on the target IAM service account.\nFor more information, refer to the Workload Identity documentation:\n\t<https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#authenticating_to>\n", started at 2024-03-07 16:51:08.984515318 +0000 UTC m=+183187.278580063
adventurous-napkin-36518
03/07/2024, 4:54 PM
[conn-id:22c33a1d8d8ce5a9 ip:172.16.0.61 pod:flytesnacks-development/fef289c19c0fb4703b69-n0-0-driver rpc-id:3396cac83aeffbf5] "/computeMetadata/v1/instance/service-accounts/flyte-gcp-flyteworkers@<projectid>.iam.gserviceaccount.com/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control" HTTP/403: generic::permission_denied: loading: GenerateAccessToken("flyte-gcp-flyteworkers@<projectid>.iam.gserviceaccount.com", ""): googleapi: Error 403: Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).
average-finland-92144
03/07/2024, 4:55 PM
adventurous-napkin-36518
03/07/2024, 4:56 PM
average-finland-92144
03/07/2024, 4:57 PM
adventurous-napkin-36518
03/07/2024, 4:58 PM
adventurous-napkin-36518
03/07/2024, 4:59 PM
average-finland-92144
03/07/2024, 5:18 PM
`spark` Service Account?
average-finland-92144
03/07/2024, 5:19 PM
`pyflyte run --remote <your-workflow> --service-account=spark`?
adventurous-napkin-36518
03/07/2024, 5:20 PM
pyflyte run --remote pyspark_pi.py my_spark
adventurous-napkin-36518
03/07/2024, 5:20 PM
average-finland-92144
03/07/2024, 5:20 PM
adventurous-napkin-36518
03/07/2024, 5:21 PM
average-finland-92144
03/07/2024, 5:23 PM
`--remote` like `pyflyte run --remote --service-account=spark ...`
adventurous-napkin-36518
03/07/2024, 5:27 PM
average-finland-92144
03/07/2024, 5:32 PM
`spark` SA has the IAM role annotation?
kubectl describe sa spark -n <spark-operator-namespace>
adventurous-napkin-36518
03/07/2024, 5:36 PM
average-finland-92144
03/07/2024, 5:37 PM
`flyte` on `deploy-flyte` so
kubectl describe sa spark -n flyte
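If only the annotation matters, the check can be narrowed down. A sketch, assuming the GKE Workload Identity annotation key `iam.gke.io/gcp-service-account` and a hypothetical helper name (`ksa_gsa_annotation`); the namespace and service account name are whatever your deployment uses:

```shell
# Print the Google service account (GSA) that a Kubernetes service account
# is annotated with; prints nothing if the annotation is missing.
# Arguments: <namespace> <service-account-name>
ksa_gsa_annotation() {
  # Dots inside the annotation key are escaped so jsonpath treats it as one key.
  kubectl get serviceaccount "$2" -n "$1" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
}
```

For example, `ksa_gsa_annotation flytesnacks-development spark` targets the namespace the failing driver pod ran in, going by the log line earlier in the thread.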
adventurous-napkin-36518
03/07/2024, 5:38 PM
`spark-operator` namespace, whereas flyte is in the `flyte` namespace.
adventurous-napkin-36518
03/07/2024, 5:40 PM
Error from server (NotFound): serviceaccounts "spark" not found
average-finland-92144
03/07/2024, 5:42 PM
kubectl get sa -n spark-operator
adventurous-napkin-36518
03/07/2024, 5:43 PM
adventurous-napkin-36518
03/07/2024, 5:46 PM
`values-core-gcp.yaml` looks like this https://gist.github.com/jegadesh-google/be2a44026525cb41a868462cb1cf384b and I used the commands below to set up the Spark operator:
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace
average-finland-92144
03/07/2024, 6:07 PM
`iam.tf` module performs.
One last check, isn't there a `spark` SA on the `flyte` ns?
kubectl get sa -n flyte
adventurous-napkin-36518
03/08/2024, 1:59 AM
`spark` sa.
average-finland-92144
03/08/2024, 3:38 PM
adventurous-napkin-36518
03/08/2024, 11:21 PM
adventurous-napkin-36518
03/11/2024, 8:36 AM
average-finland-92144
03/11/2024, 10:32 PM
adventurous-napkin-36518
03/11/2024, 11:25 PM
adventurous-napkin-36518
03/12/2024, 3:19 PM
`iam.serviceAccounts.getAccessToken` but even after doing that, I am facing the same issue.
average-finland-92144
03/12/2024, 9:14 PM
`flyte.tf` to reference the new values file (optional, just in case you want to keep this config in a different file)
3. Run `terraform apply` and verify that a `spark` SA is created in the `flytesnacks-development` namespace, verify it's annotated with the GSA, and verify there's a `spark-role` role created in the same namespace.
4. Added the `spark` SA to this array in `iam.tf`:
https://github.com/unionai-oss/deploy-flyte/blob/6a6765cd4cb92fad46bb4b6466edf8f5a766bbb4/environments/gcp/flyte-core/iam.tf#L2
5. Added the `iam.serviceAccounts.signBlob` permission to this role:
https://github.com/unionai-oss/deploy-flyte/blob/6a6765cd4cb92fad46bb4b6466edf8f5a766bbb4/environments/gcp/flyte-core/iam.tf#L80
6. Save and `terraform apply`
7. I had to complete the steps to use Artifact Registry and specify my repo name (like `registry=<my-repo>`) here
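For anyone who wants to see what the missing binding looks like outside Terraform: the 403 on `iam.serviceAccounts.getAccessToken` points at a missing IAM policy binding between the Kubernetes `spark` SA and the GSA. The sketch below only prints the documented Workload Identity `gcloud` command rather than running it. The project ID is a placeholder, the helper name `wi_member` is hypothetical, and whether `iam.tf` creates exactly this binding is an assumption, not something confirmed in this thread.

```shell
# Build the Workload Identity member string for a Kubernetes service account.
# Arguments: <project-id> <namespace> <ksa-name>
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Placeholder values; replace with your own.
PROJECT_ID="my-project"   # hypothetical project ID
GSA="flyte-gcp-flyteworkers@${PROJECT_ID}.iam.gserviceaccount.com"

# Print (do not run) the manual equivalent of the binding, for inspection.
echo gcloud iam service-accounts add-iam-policy-binding "$GSA" \
  --role roles/iam.workloadIdentityUser \
  --member "$(wi_member "$PROJECT_ID" flytesnacks-development spark)"
```

Keeping the binding in Terraform, as in the steps above, is still preferable so the change is not lost on the next `terraform apply`.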
As you may have noticed, I removed the ResourceQuota because the scheduler will fail admission if there are neither requests nor limits, and that's another effort: to profile what the recommended base resources are.
With this, I don't have any permission issues as of now. I'm getting an annoying `No Module found` type of error but that's probably on the user side.
Please let me know if this works for you, as an update to the docs is needed.
adventurous-napkin-36518
03/13/2024, 12:13 PM