Jegadesh Thirumeni
03/07/2024, 10:06 AM
David Espejo (he/him)
03/07/2024, 3:53 PM
`deploy-flyte` then it's `flyte-core`. You should then add the config specified in the docs for flyte-core to your `values-gcp-core.yaml` file and upgrade your Helm deployment (just running `terraform apply`)
Jegadesh Thirumeni
03/07/2024, 4:01 PM
David Espejo (he/him)
03/07/2024, 4:02 PM
Jegadesh Thirumeni
03/07/2024, 4:04 PM
David Espejo (he/him)
03/07/2024, 4:06 PM
`spark-config-default` has some keys that are AWS-specific
Jegadesh Thirumeni
03/07/2024, 4:06 PM
David Espejo (he/him)
03/07/2024, 4:26 PM
spark-config-default:
- spark.eventLog.enabled: "true"
- spark.eventLog.dir: "{{ Values.userSettings.bucketName }}/spark-events"
- spark.driver.cores: "1"
- spark.executorEnv.HTTP2_DISABLE: "true"
- spark.hadoop.fs.AbstractFileSystem.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
- spark.kubernetes.allocation.batch.size: "50"
- spark.kubernetes.driverEnv.HTTP2_DISABLE: "true"
- spark.network.timeout: 600s
- spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000
- spark.executor.heartbeatInterval: 60s
Jegadesh Thirumeni
03/07/2024, 4:41 PM
Jegadesh Thirumeni
03/07/2024, 4:41 PM
│ Error: template: flyte-core/templates/propeller/webhook.yaml:33:27: executing "flyte-core/templates/propeller/webhook.yaml" at <include (print .Template.BasePath "/propeller/configmap.yaml") .>: error calling include: template: flyte-core/templates/propeller/configmap.yaml:41:19: executing "flyte-core/templates/propeller/configmap.yaml" at <tpl (toYaml .) $>: error calling tpl: error during tpl function execution for "plugins:\n spark:\n spark-config-default:\n - spark.eventLog.enabled: \"true\"\n - spark.eventLog.dir: '{{ Values.userSettings.bucketName }}/spark-events'\n - spark.driver.cores: \"1\"\n - spark.executorEnv.HTTP2_DISABLE: \"true\"\n - spark.hadoop.fs.AbstractFileSystem.gs.impl: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS\n - spark.kubernetes.allocation.batch.size: \"50\"\n - spark.kubernetes.driverEnv.HTTP2_DISABLE: \"true\"\n - spark.network.timeout: 600s\n - spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000\n - spark.executor.heartbeatInterval: 60s": parse error at (flyte-core/templates/propeller/webhook.yaml:5): function "Values" not defined
│
Jegadesh Thirumeni
03/07/2024, 4:41 PM
David Espejo (he/him)
03/07/2024, 4:43 PM
.Values.userSettings.bucketName
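The `function "Values" not defined` parse error above comes from a template expression that is missing its leading dot: Helm's `tpl` needs `{{ .Values.userSettings.bucketName }}`, otherwise `Values` is read as a function call. A minimal sketch for spotting the same slip elsewhere in a values file; the helper name `find_undotted_values` is hypothetical, and the file name in the usage comment is the one mentioned earlier in this thread:

```shell
# Hypothetical helper: print lines (with line numbers) whose template
# expression starts with "Values." instead of ".Values.".
find_undotted_values() {
  grep -nE '\{\{-?[[:space:]]*Values\.' "$1"
}

# Usage (file name taken from earlier in the thread):
#   find_undotted_values values-gcp-core.yaml
```

No output means no expression of that shape was found; it does not validate the rest of the template.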
Jegadesh Thirumeni
03/07/2024, 4:44 PM
Jegadesh Thirumeni
03/07/2024, 4:53 PM
forbidden, Reason: "IAM", UserMessage: "Unable to generate access token; IAM returned 403 Forbidden: Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).\nThis error could be caused by a missing IAM policy binding on the target IAM service account.\nFor more information, refer to the Workload Identity documentation:\n\t<https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#authenticating_to>\n", started at 2024-03-07 16:51:08.984515318 +0000 UTC m=+183187.278580063
Jegadesh Thirumeni
03/07/2024, 4:54 PM
[conn-id:22c33a1d8d8ce5a9 ip:172.16.0.61 pod:flytesnacks-development/fef289c19c0fb4703b69-n0-0-driver rpc-id:3396cac83aeffbf5] "/computeMetadata/v1/instance/service-accounts/flyte-gcp-flyteworkers@<projectid>.iam.gserviceaccount.com/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control" HTTP/403: generic::permission_denied: loading: GenerateAccessToken("flyte-gcp-flyteworkers@<projectid>.iam.gserviceaccount.com", ""): googleapi: Error 403: Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).
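The 403 above points at a missing IAM policy binding on the Google service account (GSA). With Workload Identity, the Kubernetes service account (KSA) a pod runs as needs `roles/iam.workloadIdentityUser` on the GSA, a role that includes `iam.serviceAccounts.getAccessToken`. A minimal sketch of that binding, printed rather than executed: `my-project` is a placeholder, and the KSA name `spark` is an assumption based on the service account discussed later in this thread. The fix the thread eventually settles on goes through `iam.tf` and `terraform apply`, so a manual binding like this would only help confirm the diagnosis.

```shell
# Placeholder values (assumptions, not from the thread): replace with your own.
PROJECT_ID="my-project"
NAMESPACE="flytesnacks-development"  # namespace of the driver pod in the log above
KSA="spark"                          # assumed KSA; check which one the pod really uses
GSA="flyte-gcp-flyteworkers@${PROJECT_ID}.iam.gserviceaccount.com"

# Workload Identity member string that identifies the KSA to IAM.
MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA}]"

# Print the command for review; remove the leading "echo" to run it.
echo gcloud iam service-accounts add-iam-policy-binding "${GSA}" \
  --role roles/iam.workloadIdentityUser \
  --member "${MEMBER}"
```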
David Espejo (he/him)
03/07/2024, 4:55 PM
Jegadesh Thirumeni
03/07/2024, 4:56 PM
David Espejo (he/him)
03/07/2024, 4:57 PM
Jegadesh Thirumeni
03/07/2024, 4:58 PM
Jegadesh Thirumeni
03/07/2024, 4:59 PM
David Espejo (he/him)
03/07/2024, 5:18 PM
`spark` Service Account?
David Espejo (he/him)
03/07/2024, 5:19 PM
`pyflyte run --remote <your-workflow> --service-account=spark`?
Jegadesh Thirumeni
03/07/2024, 5:20 PM
pyflyte run --remote pyspark_pi.py my_spark
Jegadesh Thirumeni
03/07/2024, 5:20 PM
David Espejo (he/him)
03/07/2024, 5:20 PM
Jegadesh Thirumeni
03/07/2024, 5:21 PM
David Espejo (he/him)
03/07/2024, 5:23 PM
`--remote` like
pyflyte run --remote --service-account=spark ...
Jegadesh Thirumeni
03/07/2024, 5:27 PM
David Espejo (he/him)
03/07/2024, 5:32 PM
`spark` SA has the IAM role annotation?
kubectl describe sa spark -n <spark-operator-namespace>
Jegadesh Thirumeni
03/07/2024, 5:36 PM
David Espejo (he/him)
03/07/2024, 5:37 PM
`flyte` on `deploy-flyte` so
kubectl describe sa spark -n flyte
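On GKE with Workload Identity, the link between a Kubernetes service account and its GSA is the `iam.gke.io/gcp-service-account` annotation. As a narrower alternative to `kubectl describe`, here is a sketch that prints only that annotation; the function name `ksa_gsa_annotation` is hypothetical:

```shell
# Hypothetical helper: print the GSA that a Kubernetes service account is
# annotated with. Arguments: <service-account> <namespace>.
ksa_gsa_annotation() {
  kubectl get sa "$1" -n "$2" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
}

# Usage, with the names from the message above:
#   ksa_gsa_annotation spark flyte
```

Empty output means the service account exists but carries no such annotation; a `NotFound` error means the service account is not in that namespace.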
Jegadesh Thirumeni
03/07/2024, 5:38 PM
`spark-operator` namespace, whereas flyte is in the `flyte` namespace.
Jegadesh Thirumeni
03/07/2024, 5:40 PM
Error from server (NotFound): serviceaccounts "spark" not found
David Espejo (he/him)
03/07/2024, 5:42 PM
kubectl get sa -n spark-operator
Jegadesh Thirumeni
03/07/2024, 5:43 PM
Jegadesh Thirumeni
03/07/2024, 5:46 PM
`values-core-gcp.yaml` looks like this https://gist.github.com/jegadesh-google/be2a44026525cb41a868462cb1cf384b and I used the commands below to set up the Spark operator:
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace
David Espejo (he/him)
03/07/2024, 6:07 PM
`iam.tf` module performs.
One last check, isn't there a `spark` SA on the `flyte` ns?
kubectl get sa -n flyte
Jegadesh Thirumeni
03/08/2024, 1:59 AM
`spark` SA.
David Espejo (he/him)
03/08/2024, 3:38 PM
Jegadesh Thirumeni
03/08/2024, 11:21 PM
Jegadesh Thirumeni
03/11/2024, 8:36 AM
David Espejo (he/him)
03/11/2024, 10:32 PM
Jegadesh Thirumeni
03/11/2024, 11:25 PM
Jegadesh Thirumeni
03/12/2024, 3:19 PM
`iam.serviceAccounts.getAccessToken` but even after doing that, I am facing the same issue.
David Espejo (he/him)
03/12/2024, 9:14 PM
`flyte.tf` to reference the new values file (optional, just in case you want to keep this config in a different file)
3. Run `terraform apply` and verify that a `spark` SA is created in the `flytesnacks-development` namespace, verify it's annotated with the GSA, and verify there's a `spark-role` role created in the same namespace.
4. Added the "spark" SA to this array in `iam.tf`:
https://github.com/unionai-oss/deploy-flyte/blob/6a6765cd4cb92fad46bb4b6466edf8f5a766bbb4/environments/gcp/flyte-core/iam.tf#L2
5. Added the `iam.serviceAccounts.signBlob` permission to this role:
https://github.com/unionai-oss/deploy-flyte/blob/6a6765cd4cb92fad46bb4b6466edf8f5a766bbb4/environments/gcp/flyte-core/iam.tf#L80
6. Save and `terraform apply`
7. I had to complete the steps to use Artifact Registry and specify my repo name (like `registry=<my-repo>`) here
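The checks in the `terraform apply` step above could be run roughly as follows. This is a sketch only: the function name `check_spark_rbac` is hypothetical, and the resource names (`spark`, `spark-role`, `flytesnacks-development`) are the ones given in the list.

```shell
# Hypothetical helper: confirm the spark SA and spark-role exist in a namespace.
# Argument: <namespace>.
check_spark_rbac() {
  ns="$1"
  # SA manifest; metadata.annotations should carry the GSA annotation.
  kubectl get sa spark -n "$ns" -o yaml || return 1
  # Role expected in the same namespace.
  kubectl get role spark-role -n "$ns"
}

# Usage:
#   check_spark_rbac flytesnacks-development
```

A non-zero exit status means one of the two resources is missing from that namespace.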
As you may have noticed, I removed the ResourceQuota because pods fail admission if they set neither requests nor limits, and that's another effort: profiling the recommended base resources.
With this, I don't have any permission issues as of now. I'm getting an annoying `No Module found` type of error, but that's probably on the user side.
Please let me know if this works for you, as an update to the docs is needed.
Jegadesh Thirumeni
03/13/2024, 12:13 PM