Tom Szumowski
07/28/2022, 2:11 AM
Ketan (kumare3)
Prafulla Mahindrakar
07/28/2022, 5:00 AM
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///port-forwarded-uri
  authType: Pkce
  insecure: true
jeev
Prafulla Mahindrakar
07/28/2022, 5:04 AM
jeev
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flyte-proxy
  labels:
    app: flyte-proxy
spec:
  selector:
    matchLabels:
      app: flyte-proxy
  template:
    metadata:
      labels:
        app: flyte-proxy
    spec:
      containers:
        - name: proxy
          image: envoyproxy/envoy:v1.21.1
          args:
            - envoy
            - -c /etc/envoy/config.yaml
          ports:
            - name: http
              containerPort: 8000
          volumeMounts:
            - name: config-volume
              mountPath: /etc/envoy
      volumes:
        - name: config-volume
          configMap:
            name: flyte-proxy-config
kubectl port-forward deploy/flyte-proxy 8000
Prafulla Mahindrakar
07/28/2022, 5:19 AM
Tom Szumowski
07/28/2022, 10:37 AM
yes that should be possible if you just port-forward flyteadmin and use that endpoint in the pyflyte config.yaml
I tried something similar (I believe) and got some RPC errors. First I did:
kubectl -n flyte port-forward service/flyteadmin 30081:81
Then I set my .flyte/config.yaml to:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///localhost:30081
  authType: Pkce
  insecure: true
logger:
  show-source: true
  level: 0
Then I ran:
$ pyflyte run --remote core/flyte_basics/basic_workflow.py my_wf --a 5 --b hello
The error I got is in this snippet.
debug_error_string = "{"created":"@1658972857.863942000","description":"Error received from peer ipv6:[::1]:30081","file":"src/core/lib/surface/call.cc","file_line":904,"grpc_message":"failed to create a signed url. Error: unable to sign bytes: googleapi: Error 403: The caller does not have permission","grpc_status":2}"
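The payload inside debug_error_string is JSON, so the actual failure is easier to read once it is parsed. A minimal sketch, assuming the payload is copied exactly as shown above; the helper name summarize_grpc_error is made up for illustration and is not part of flytekit or grpc:

```python
import json
from typing import Tuple

# Payload copied from the debug_error_string above (text between the outer quotes).
DEBUG_ERROR = (
    '{"created":"@1658972857.863942000",'
    '"description":"Error received from peer ipv6:[::1]:30081",'
    '"file":"src/core/lib/surface/call.cc","file_line":904,'
    '"grpc_message":"failed to create a signed url. Error: unable to sign bytes: '
    'googleapi: Error 403: The caller does not have permission",'
    '"grpc_status":2}'
)


def summarize_grpc_error(payload: str) -> Tuple[int, str]:
    """Hypothetical helper: return (grpc_status, grpc_message) from the JSON payload."""
    fields = json.loads(payload)
    return fields["grpc_status"], fields["grpc_message"]


status, message = summarize_grpc_error(DEBUG_ERROR)
print(status, message)
```

Here grpc_status 2 is the generic UNKNOWN code, so the useful part is grpc_message: the 403 comes from the Google API while signing bytes for a signed URL. That points at service-account permissions on the cluster side, not at the port-forward.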
I'm brand new here, so it's very possible I missed a setup step somewhere.
will break forwarding to flyteconsole probably in case he's trying to monitor. our local sandbox uses an envoy proxy for this exact purpose.
1. I'm still learning. Can you describe why it breaks forwarding to flyteconsole? Naively, I did try to additionally forward flyteconsole along with the above using:
kubectl -n flyte port-forward service/flyteconsole 30080:80
I saw the console when I navigated to localhost:30080/console, but there was an error displayed. I'm curious why. Thank you.
2. Silly question. But once everything is deployed via Opta, how do you apply that envoy k8s config and layer in the envoy config.yaml?
Prafulla Mahindrakar
07/28/2022, 10:40 AM
Tom Szumowski
07/28/2022, 10:49 AM
opta apply -c flyte.yaml. Tried re-running and no resources changed.
The output says:
adminflyteaccount_service_account_email = "gsa-flyteadmin@<GCP_PROJECT>.iam.gserviceaccount.com"
adminflyteaccount_service_account_id = "gsa-flyteadmin"
bucket_id = "<NAME>-service-flyte"
bucket_name = "<NAME>-service-flyte"
datacatalogaccount_service_account_email = "gsa-datacatalog@<GCP_PROJECT>.iam.gserviceaccount.com"
datacatalogaccount_service_account_id = "gsa-datacatalog"
flytedevelopmentaccount_service_account_email = "gsa-development@<GCP_PROJECT>.iam.gserviceaccount.com"
flytedevelopmentaccount_service_account_id = "gsa-development"
flyteproductionaccount_service_account_email = "gsa-production@<GCP_PROJECT>.iam.gserviceaccount.com"
flyteproductionaccount_service_account_id = "gsa-production"
flytepropelleraccount_service_account_email = "gsa-flytepropeller@<GCP_PROJECT>.iam.gserviceaccount.com"
flytepropelleraccount_service_account_id = "gsa-flytepropeller"
flytescheduleraccount_service_account_email = "gsa-flytescheduler@<GCP_PROJECT>.iam.gserviceaccount.com"
flytescheduleraccount_service_account_id = "gsa-flytescheduler"
flytestagingaccount_service_account_email = "gsa-staging@<GCP_PROJECT>.iam.gserviceaccount.com"
flytestagingaccount_service_account_id = "gsa-staging"
However I don't see any of the gsa-* service accounts in my project IAM settings. I only see one new one:
opta-<NAME>-ep63@<GCP_PROJECT>.iam.gserviceaccount.com
🤔
Prafulla Mahindrakar
07/28/2022, 11:58 AM
Tom Szumowski
07/28/2022, 11:59 AM
Warnings:
- Applied changes may be incomplete
To see the full warning notes, run Terraform without -compact-warnings.
That may be related. Is there a way to re-run Opta (or terraform directly) to give full warnings?
Prafulla Mahindrakar
07/28/2022, 12:02 PM
Tom Szumowski
07/28/2022, 12:02 PM
Prafulla Mahindrakar
07/28/2022, 12:02 PM
Tom Szumowski
07/28/2022, 12:04 PM
Prafulla Mahindrakar
07/28/2022, 12:06 PM
Tom Szumowski
07/28/2022, 1:41 PM
gsa-* service accounts in my GCP IAM page like before.
The output is using just opta apply -c flyte.yaml. If there is a way to get more verbose logs, I can re-run
--detailed-plan option. I'll destroy and re-apply flyte.yaml with that option to print more out.
Prafulla Mahindrakar
07/28/2022, 2:02 PM
Tom Szumowski
07/28/2022, 2:32 PM
iam.serviceAccounts.signBlob permission is provided to the flyteadmin service in the Opta configuration.
Prafulla Mahindrakar
07/28/2022, 2:44 PM
serviceAccount:gsa-flyteadmin@urbn-data-science.iam.gserviceaccount.com as a member
Tom Szumowski
07/28/2022, 2:46 PM
$ pyflyte run --remote core/flyte_basics/basic_workflow.py my_wf --a 5 --b hello
Go to http://localhost:30081/console/projects/flytesnacks/domains/development/executions/f75075fdeda774b358b4 to see execution in the console.
localhost:30081 doesn't have the console UI for reasons discussed above (still curious why port forwarding 30080 doesn't work. that's for another time. 😉 )
But I can access the logs using flytectl
$ flytectl get execution -p flytesnacks -d development
---------------------- ---------------------------------------- -------------------------- ------------- -------- ---------------- -------------------------------- --------------- -------------------- -----------------------------------------------------------------------------------------------
| NAME | LAUNCH PLAN NAME | VERSION | TYPE | PHASE | SCHEDULED TIME | STARTED | ELAPSED TIME | ABORT DATA (TRUNC) | ERROR DATA (TRUNC) |
---------------------- ---------------------------------------- -------------------------- ------------- -------- ---------------- -------------------------------- --------------- -------------------- -----------------------------------------------------------------------------------------------
| f75075fdeda774b358b4 | core.flyte_basics.basic_workflow.my_wf | 5UFDB8TsvDDDvRjqYjRC5w== | LAUNCH_PLAN | FAILED | | 2022-07-28T14:42:51.395613125Z | 34.973618876s | | |1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
| | | | | | | | | | [f750 |
---------------------- ---------------------------------------- -------------------------- ------------- -------- ---------------- -------------------------------- --------------- -------------------- -----------------------------------------------------------------------------------------------
1 rows
storage.objects.get permissions issue now.
Prafulla Mahindrakar
07/28/2022, 2:49 PM
Tom Szumowski
07/28/2022, 2:49 PM
Prafulla Mahindrakar
07/28/2022, 2:56 PM
Tom Szumowski
07/28/2022, 2:56 PM
roles/iam.workloadIdentityUser in my project anywhere
Prafulla Mahindrakar
07/28/2022, 3:01 PM
serviceAccount:gsa-development@urbn-data-science.iam.gserviceaccount.com
Tom Szumowski
07/28/2022, 3:06 PM
k -n development describe pods f75075fdeda774b358b4-n0-0
and got this output. But didn't see a service account there.
Prafulla Mahindrakar
07/28/2022, 3:07 PM
Tom Szumowski
07/28/2022, 3:09 PM
default service account on the k8s cluster and assumed I should have something different.
serviceAccount: default
serviceAccountName: default
$ k -n development get sa default -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: gsa-development@urbn-data-science.iam.gserviceaccount.com
  creationTimestamp: "2022-07-28T13:38:02Z"
  name: default
  namespace: development
  resourceVersion: "31160"
  uid: 0bda0bec-49e4-4b32-991d-7fd706315c77
secrets:
- name: default-token-zfswb
Prafulla Mahindrakar
07/28/2022, 3:11 PM
iam.gke.io/gcp-service-account: gsa-development@urbn-data-science.iam.gserviceaccount.com
This particular gsa needs to have those storage roles
Tom Szumowski
07/28/2022, 3:11 PM
Ketan (kumare3)
Tom Szumowski
07/28/2022, 3:23 PM
Prafulla Mahindrakar
07/28/2022, 3:30 PM
Tom Szumowski
07/28/2022, 7:52 PM
$ kubectl exec -it pod/workload-identity-test --namespace test-wi -- /bin/bash
root@workload-identity-test:/# curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/
default/
test-wi-gsa@urbn-data-science.iam.gserviceaccount.com/
2. 🟢 Confirmed I can read the bucket, with a test pod, inside the development flyte namespace
Using this test spec, I was able to read the Flyte bucket in the container by executing gsutil ls gs://flyte-ts-temp-service-flyte in the pod:
Spec, flight-test.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: flyte-manual-test
  namespace: development
spec:
  containers:
    # - image: ghcr.io/flyteorg/flytekit:py3.8-1.0.3
    - image: google/cloud-sdk:slim
      name: flyte-manual-test
      command: ["sleep", "infinity"]
      resources:
        limits:
          cpu: 500m
          memory: 500Mi
        requests:
          cpu: 500m
          memory: 500Mi
Output:
root@workload-identity-test:/# gsutil ls gs://flyte-ts-temp-service-flyte
gs://flyte-ts-temp-service-flyte/metadata/
gs://flyte-ts-temp-service-flyte/t2/
This is using gsa-development@urbn-data-science.iam.gserviceaccount.com/ as the GCP mapped SA.
3. 🔴 Unable to run pyflyte, or even the same spec, with the flytekit image.
pyflyte runs still result in permissions errors like above. Another interesting note: if I swap the image in the above spec from google/cloud-sdk:slim to ghcr.io/flyteorg/flytekit:py3.8-1.0.3, I get the same permission errors.
Output when using ghcr.io/flyteorg/flytekit:py3.8-1.0.3 in flyte-test.yaml:
root@flight-test-ts-temp:~# gsutil ls gs://flyte-ts-temp-service-flyte
ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.
(same error)
I confirmed the service account in that container is the same as above by hitting the endpoint shown in #1 (from the tutorial, but using python requests since curl isn't in that flytekit image):
root@flyte-manual-test:~# python3
>>> import requests
>>> r = requests.get("http://169.254.169.254/computeMetadata/v1/instance/service-accounts/", headers={"Metadata-Flavor": "Google"})
>>> print(r.content.decode())
default/
gsa-development@urbn-data-science.iam.gserviceaccount.com/
It also uses gsa-development@urbn-data-science.iam.gserviceaccount.com/
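If an image ships without both curl and requests, the same metadata-server check can be done with the Python standard library alone. A rough sketch, assuming it runs inside a GKE pod with Workload Identity enabled; the function names build_metadata_request and list_service_accounts are invented for this example:

```python
import urllib.request
from typing import List

# Same endpoint as in the curl and requests checks above.
METADATA_URL = "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/"


def build_metadata_request(url: str = METADATA_URL) -> urllib.request.Request:
    """Build the request; the metadata server expects the Metadata-Flavor header."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def list_service_accounts(timeout: float = 2.0) -> List[str]:
    """Return the service-account entries visible to this pod (only works on GCP)."""
    with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
        return resp.read().decode().split()
```

Calling list_service_accounts() from a pod should return the same kind of entries as the output above. Outside GCP the call is expected to time out or fail to connect.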
I also pulled the spec from the pyflyte execution and attempted a manual gsutil on the running pod (entering with a sleep) and got the same error.
Sören Brunk
07/28/2022, 9:21 PM
gsutil not being able to authenticate without additional config (compared to google-cloud-sdk installed gsutil). I ran into this a while ago and we actually had a thread here I can't find anymore probably due to the Slack history limit.
RUN curl https://storage.googleapis.com/pub/gsutil.tar.gz | tar xfz - -C /opt && ln -s /opt/gsutil/gsutil /bin/gsutil
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg # Required for gsutil to work with workload-identity
I guess it should also work with standalone gsutil installed via pip so perhaps try to derive an image from ghcr.io/flyteorg/flytekit:py3.8-1.0.3 with the second line added to check if that makes any difference.
Tom Szumowski
07/28/2022, 9:37 PM
Ketan (kumare3)
flytekitplugins-data-fsspec and then install GCS for fsspec?
Sören Brunk
07/28/2022, 9:50 PM
Tom Szumowski
07/28/2022, 11:59 PM
FROM ghcr.io/flyteorg/flytekit:py3.8-1.0.3
# Required for gsutil to work with workload-identity
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
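One caveat on the echo line: whether \n inside the quotes becomes a real newline depends on the shell behind RUN (dash's echo expands it, bash's builtin echo does not unless -e is given). It worked in this image, but a shell-independent way to produce the same file is to write it with Python, which the flytekit image already has. A minimal sketch; write_boto_config is a made-up helper name, and the section and key are the ones from the Dockerfile above:

```python
import configparser
from pathlib import Path


def write_boto_config(path: str) -> str:
    """Write a boto config with the [GoogleCompute] section and return its text."""
    config = configparser.ConfigParser()
    # Same section and key as the echo line in the Dockerfile.
    config["GoogleCompute"] = {"service_account": "default"}
    with open(path, "w") as fh:
        config.write(fh)
    return Path(path).read_text()
```

In the image build this would be called with /etc/boto.cfg as the path. It can be tried against any writable temporary path first.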
Pushed it as image name:
gcr.io/urbn-data-science/flytekit-test-wrapper:latest
Then ran pyflyte:
pyflyte run --image gcr.io/urbn-data-science/flytekit-test-wrapper:latest --remote core/flyte_basics/basic_workflow.py my_wf --a 5 --b hello
And got a successful run on the GKE cluster:
$ flytectl get execution ffe50f73fa4564737bf6 -p flytesnacks -d development
---------------------- ---------------------------------------- -------------------------- ------------- ----------- ---------------- -------------------------------- --------------- -------------------- --------------------
| NAME | LAUNCH PLAN NAME | VERSION | TYPE | PHASE | SCHEDULED TIME | STARTED | ELAPSED TIME | ABORT DATA (TRUNC) | ERROR DATA (TRUNC) |
---------------------- ---------------------------------------- -------------------------- ------------- ----------- ---------------- -------------------------------- --------------- -------------------- --------------------
| ffe50f73fa4564737bf6 | core.flyte_basics.basic_workflow.my_wf | 00jRSrIIdnwryVi5J7STWw== | LAUNCH_PLAN | SUCCEEDED | | 2022-07-28T23:52:37.007821370Z | 78.295864832s | | |
---------------------- ---------------------------------------- -------------------------- ------------- ----------- ---------------- -------------------------------- --------------- -------------------- --------------------
Thank you everyone in this thread for the fantastic support! I think with this tweak, that concludes the investigation for the original goal, i.e. getting flyte to run on GCP without a domain. The only open item on my end is to try out that envoy config @jeev provided in order to see the GUI (perhaps tomorrow).
'[GoogleCompute]\nservice_account = default' is required in /etc/boto.cfg for the flytekit docker image to work. Otherwise the pod dies with a bucket permission error. Though I am guessing that, since this is cloud-specific, it should be handled somewhere else outside the Dockerfile maybe?
Do you all suggest I report these as GitHub issues? Or are they known already and a report not required?
Ketan (kumare3)
Tom Szumowski
07/29/2022, 12:09 AM
jeev
Tom Szumowski
07/29/2022, 11:28 AM
kubectl -n flyte port-forward deployment/flyte-proxy 30080:8000
I then set my ~/.flyte/config.yaml to:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///localhost:30080
  ...
And when I run I get the error:
"Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNIMPLEMENTED
...
(full trace attached)
I can still get it to work if I separately port-forward flyteadmin with:
kubectl -n flyte port-forward service/flyteadmin 30081:81
and set config to 30081.
When you use this, are you able to access flyteadmin and the console with one port-forward?
jeev
Tom Szumowski
07/29/2022, 1:21 PM
jeev