able-ice-99180
07/28/2022, 2:11 AM
freezing-airport-6809
icy-agent-73298
07/28/2022, 5:00 AM
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///port-forwarded-uri
  authType: Pkce
  insecure: true
freezing-boots-56761
icy-agent-73298
07/28/2022, 5:04 AM
freezing-boots-56761
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flyte-proxy
  labels:
    app: flyte-proxy
spec:
  selector:
    matchLabels:
      app: flyte-proxy
  template:
    metadata:
      labels:
        app: flyte-proxy
    spec:
      containers:
        - name: proxy
          image: envoyproxy/envoy:v1.21.1
          args:
            - envoy
            - -c /etc/envoy/config.yaml
          ports:
            - name: http
              containerPort: 8000
          volumeMounts:
            - name: config-volume
              mountPath: /etc/envoy
      volumes:
        - name: config-volume
          configMap:
            name: flyte-proxy-config
freezing-boots-56761
freezing-boots-56761
kubectl port-forward deploy/flyte-proxy 8000
freezing-boots-56761
freezing-boots-56761
freezing-boots-56761
icy-agent-73298
07/28/2022, 5:19 AM
able-ice-99180
07/28/2022, 10:37 AM
"yes that should be possible if you just port-forward flyteadmin and use that endpoint in the pyflyte config.yaml"
I tried something similar (I believe) and got some RPC errors. First I did:
kubectl -n flyte port-forward service/flyteadmin 30081:81
Then I set my .flyte/config.yaml to:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///localhost:30081
  authType: Pkce
  insecure: true
logger:
  show-source: true
  level: 0
Then I ran:
$ pyflyte run --remote core/flyte_basics/basic_workflow.py my_wf --a 5 --b hello
The error I got is in this snippet.
debug_error_string = "{"created":"@1658972857.863942000","description":"Error received from peer ipv6:[::1]:30081","file":"src/core/lib/surface/call.cc","file_line":904,"grpc_message":"failed to create a signed url. Error: unable to sign bytes: googleapi: Error 403: The caller does not have permission","grpc_status":2}"
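The debug_error_string above is itself JSON, so its fields can be pulled apart to see where the failure came from. A minimal Python sketch, assuming the value is copied without its outer quotes; the helper name summarize_grpc_error is made up for illustration and is not part of pyflyte or grpc:

```python
import json

# Value copied from the error above, without the surrounding quotes.
DEBUG_ERROR_STRING = (
    '{"created":"@1658972857.863942000",'
    '"description":"Error received from peer ipv6:[::1]:30081",'
    '"file":"src/core/lib/surface/call.cc","file_line":904,'
    '"grpc_message":"failed to create a signed url. Error: unable to sign bytes: '
    'googleapi: Error 403: The caller does not have permission",'
    '"grpc_status":2}'
)


def summarize_grpc_error(raw: str) -> dict:
    """Hypothetical helper: return the peer, status code and message."""
    payload = json.loads(raw)
    peer = payload["description"].replace("Error received from peer ", "", 1)
    return {
        "peer": peer,
        "status": payload["grpc_status"],
        "message": payload["grpc_message"],
    }


summary = summarize_grpc_error(DEBUG_ERROR_STRING)
print(summary["peer"])
print(summary["status"], summary["message"])
```

Status 2 is gRPC's UNKNOWN code, and the message was received from the peer on port 30081, which suggests the port-forward itself was working and the 403 was raised on the server side while signing a URL.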
I'm brand new here, so it's very possible I missed a setup step somewhere.
able-ice-99180
07/28/2022, 10:39 AM
"will break forwarding to flyteconsole probably in case he's trying to monitor. our local sandbox uses a envoy proxy for this exact purpose."
1. I'm still learning. Can you describe why it breaks forwarding to flyteconsole? Naively, I did try to additionally forward flyteconsole along with the above using:
kubectl -n flyte port-forward service/flyteconsole 30080:80
I saw the console when I navigated to localhost:30080/console, but there was an error displayed. I'm curious why. Thank you.
2. Silly question, but once everything is deployed via Opta, how do you apply that envoy k8s config and layer in the envoy config.yaml?
icy-agent-73298
07/28/2022, 10:40 AM
able-ice-99180
07/28/2022, 10:49 AM
able-ice-99180
07/28/2022, 11:55 AM
opta apply -c flyte.yaml. Tried re-running and no resources changed.
The output says:
adminflyteaccount_service_account_email = "gsa-flyteadmin@<GCP_PROJECT>.iam.gserviceaccount.com"
adminflyteaccount_service_account_id = "gsa-flyteadmin"
bucket_id = "<NAME>-service-flyte"
bucket_name = "<NAME>-service-flyte"
datacatalogaccount_service_account_email = "gsa-datacatalog@<GCP_PROJECT>.iam.gserviceaccount.com"
datacatalogaccount_service_account_id = "gsa-datacatalog"
flytedevelopmentaccount_service_account_email = "gsa-development@<GCP_PROJECT>.iam.gserviceaccount.com"
flytedevelopmentaccount_service_account_id = "gsa-development"
flyteproductionaccount_service_account_email = "gsa-production@<GCP_PROJECT>.iam.gserviceaccount.com"
flyteproductionaccount_service_account_id = "gsa-production"
flytepropelleraccount_service_account_email = "gsa-flytepropeller@<GCP_PROJECT>.iam.gserviceaccount.com"
flytepropelleraccount_service_account_id = "gsa-flytepropeller"
flytescheduleraccount_service_account_email = "gsa-flytescheduler@<GCP_PROJECT>.iam.gserviceaccount.com"
flytescheduleraccount_service_account_id = "gsa-flytescheduler"
flytestagingaccount_service_account_email = "gsa-staging@<GCP_PROJECT>.iam.gserviceaccount.com"
flytestagingaccount_service_account_id = "gsa-staging"
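One way to compare this output against what actually exists in the project is to pull the *_service_account_email values out of it and diff them against a list of real accounts. A minimal Python sketch, assuming the output keeps the key = "value" shape shown above; the function names and the existing list are illustrative, and the <GCP_PROJECT> and <NAME> placeholders are kept as redacted:

```python
import re

# A few lines from the Opta output above (Slack link markup removed).
OPTA_OUTPUT = '''
adminflyteaccount_service_account_email = "gsa-flyteadmin@<GCP_PROJECT>.iam.gserviceaccount.com"
adminflyteaccount_service_account_id = "gsa-flyteadmin"
bucket_name = "<NAME>-service-flyte"
flytedevelopmentaccount_service_account_email = "gsa-development@<GCP_PROJECT>.iam.gserviceaccount.com"
flytedevelopmentaccount_service_account_id = "gsa-development"
'''

# Matches lines of the form: key = "value"
OUTPUT_LINE = re.compile(r'^(\w+)\s*=\s*"([^"]*)"$')


def expected_service_accounts(output: str) -> list:
    """Collect the value of every *_service_account_email output."""
    emails = []
    for line in output.splitlines():
        match = OUTPUT_LINE.match(line.strip())
        if match and match.group(1).endswith("_service_account_email"):
            emails.append(match.group(2))
    return emails


def missing_accounts(expected: list, existing: list) -> list:
    """Return the expected emails that are not in the existing list."""
    present = set(existing)
    return [email for email in expected if email not in present]


expected = expected_service_accounts(OPTA_OUTPUT)
# Hypothetical stand-in for whatever the IAM page or an account listing shows.
existing = ["opta-<NAME>-ep63@<GCP_PROJECT>.iam.gserviceaccount.com"]
for email in missing_accounts(expected, existing):
    print("missing:", email)
```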
However I don't see any of the gsa-* service accounts in my project IAM settings. I only see one new one:
opta-<NAME>-ep63@<GCP_PROJECT>.iam.gserviceaccount.com
🤔
icy-agent-73298
07/28/2022, 11:58 AM
icy-agent-73298
07/28/2022, 11:59 AM
able-ice-99180
07/28/2022, 11:59 AM
able-ice-99180
07/28/2022, 12:00 PM
able-ice-99180
07/28/2022, 12:01 PM
Warnings:
- Applied changes may be incomplete
To see the full warning notes, run Terraform without -compact-warnings.
That may be related. Is there a way to re-run Opta (or Terraform directly) to get the full warnings?
icy-agent-73298
07/28/2022, 12:02 PM
able-ice-99180
07/28/2022, 12:02 PM
icy-agent-73298
07/28/2022, 12:02 PM
able-ice-99180
07/28/2022, 12:04 PM
icy-agent-73298
07/28/2022, 12:06 PM
able-ice-99180
07/28/2022, 1:41 PM
gsa-* service accounts in my GCP IAM page like before.
The output is using just opta apply -c flyte.yaml. If there is a way to get more verbose logs, I can re-run.
able-ice-99180
07/28/2022, 1:47 PM
--detailed-plan option. I'll destroy and re-apply flyte.yaml with that option to print more out.
able-ice-99180
07/28/2022, 1:54 PM
icy-agent-73298
07/28/2022, 2:02 PM
able-ice-99180
07/28/2022, 2:32 PM
able-ice-99180
07/28/2022, 2:37 PM
iam.serviceAccounts.signBlob permission is provided to the flyteadmin service in the Opta configuration.
icy-agent-73298
07/28/2022, 2:44 PM
serviceAccount:gsa-flyteadmin@urbn-data-science.iam.gserviceaccount.com as a member
able-ice-99180
07/28/2022, 2:46 PM
$ pyflyte run --remote core/flyte_basics/basic_workflow.py my_wf --a 5 --b hello
Go to http://localhost:30081/console/projects/flytesnacks/domains/development/executions/f75075fdeda774b358b4 to see execution in the console.
able-ice-99180
07/28/2022, 2:47 PM
localhost:30081 doesn't have the console UI for reasons discussed above (still curious why port forwarding 30080 doesn't work, but that's for another time 😉).
But I can access the logs using flytectl.
able-ice-99180
07/28/2022, 2:47 PM
$ flytectl get execution -p flytesnacks -d development
---------------------- ---------------------------------------- -------------------------- ------------- -------- ---------------- -------------------------------- --------------- -------------------- -----------------------------------------------------------------------------------------------
| NAME | LAUNCH PLAN NAME | VERSION | TYPE | PHASE | SCHEDULED TIME | STARTED | ELAPSED TIME | ABORT DATA (TRUNC) | ERROR DATA (TRUNC) |
---------------------- ---------------------------------------- -------------------------- ------------- -------- ---------------- -------------------------------- --------------- -------------------- -----------------------------------------------------------------------------------------------
| f75075fdeda774b358b4 | core.flyte_basics.basic_workflow.my_wf | 5UFDB8TsvDDDvRjqYjRC5w== | LAUNCH_PLAN | FAILED | | 2022-07-28T14:42:51.395613125Z | 34.973618876s | | |1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
| | | | | | | | | | [f750 |
---------------------- ---------------------------------------- -------------------------- ------------- -------- ---------------- -------------------------------- --------------- -------------------- -----------------------------------------------------------------------------------------------
1 rows
able-ice-99180
07/28/2022, 2:48 PM
storage.objects.get permissions issue now.
icy-agent-73298
07/28/2022, 2:49 PM
able-ice-99180
07/28/2022, 2:49 PM
icy-agent-73298
07/28/2022, 2:56 PM
able-ice-99180
07/28/2022, 2:56 PM
roles/iam.workloadIdentityUser in my project anywhere
able-ice-99180
07/28/2022, 2:58 PM
icy-agent-73298
07/28/2022, 3:01 PM
icy-agent-73298
07/28/2022, 3:02 PM
serviceAccount:gsa-development@urbn-data-science.iam.gserviceaccount.com
able-ice-99180
07/28/2022, 3:06 PM
able-ice-99180
07/28/2022, 3:07 PM
k -n development describe pods f75075fdeda774b358b4-n0-0
and got this output. But didn't see a service account there.
icy-agent-73298
07/28/2022, 3:07 PM
able-ice-99180
07/28/2022, 3:09 PM
default service account on the k8s cluster and assumed I should have something different.
able-ice-99180
07/28/2022, 3:09 PM
serviceAccount: default
serviceAccountName: default
able-ice-99180
07/28/2022, 3:10 PM
$ k -n development get sa default -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: gsa-development@urbn-data-science.iam.gserviceaccount.com
  creationTimestamp: "2022-07-28T13:38:02Z"
  name: default
  namespace: development
  resourceVersion: "31160"
  uid: 0bda0bec-49e4-4b32-991d-7fd706315c77
secrets:
- name: default-token-zfswb
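The iam.gke.io/gcp-service-account annotation is only one half of GKE Workload Identity. The Google service account also needs a roles/iam.workloadIdentityUser binding whose member names the Kubernetes service account. A minimal Python sketch of that member string and the matching gcloud command, using the project, namespace and account names from this thread; the helper names are illustrative, and whether Opta already creates this binding is not established here:

```python
import shlex


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Member string that identifies a Kubernetes service account to IAM."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def binding_command(gsa_email: str, project_id: str, namespace: str, ksa_name: str) -> list:
    """argv for a gcloud call that lets the KSA impersonate the GSA."""
    return [
        "gcloud", "iam", "service-accounts", "add-iam-policy-binding", gsa_email,
        "--role", "roles/iam.workloadIdentityUser",
        "--member", workload_identity_member(project_id, namespace, ksa_name),
    ]


command = binding_command(
    gsa_email="gsa-development@urbn-data-science.iam.gserviceaccount.com",
    project_id="urbn-data-science",
    namespace="development",
    ksa_name="default",
)
# shlex.join quotes the square brackets so the line is safe to paste into a shell.
print(shlex.join(command))
```

The sketch only prints the command, so it can be reviewed before anything is changed in the project.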
icy-agent-73298
07/28/2022, 3:11 PM
iam.gke.io/gcp-service-account: gsa-development@urbn-data-science.iam.gserviceaccount.com
This particular gsa needs to have those storage roles
able-ice-99180
07/28/2022, 3:11 PM
able-ice-99180
07/28/2022, 3:20 PM
able-ice-99180
07/28/2022, 3:20 PM
freezing-airport-6809
able-ice-99180
07/28/2022, 3:23 PM
icy-agent-73298
07/28/2022, 3:30 PM
icy-agent-73298
07/28/2022, 4:53 PM
able-ice-99180
07/28/2022, 7:52 PM
$ kubectl exec -it pod/workload-identity-test --namespace test-wi -- /bin/bash
root@workload-identity-test:/# curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/
default/
test-wi-gsa@urbn-data-science.iam.gserviceaccount.com/
2. 🟢 Confirmed I can read the bucket with a test pod inside the development flyte namespace.
Using this test spec, I was able to read the Flyte bucket in the container by executing gsutil ls gs://flyte-ts-temp-service-flyte in the pod.
Spec, flight-test.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: flyte-manual-test
  namespace: development
spec:
  containers:
  # - image: ghcr.io/flyteorg/flytekit:py3.8-1.0.3
  - image: google/cloud-sdk:slim
    name: flyte-manual-test
    command: ["sleep", "infinity"]
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
Output:
root@workload-identity-test:/# gsutil ls gs://flyte-ts-temp-service-flyte
gs://flyte-ts-temp-service-flyte/metadata/
gs://flyte-ts-temp-service-flyte/t2/
This is using gsa-development@urbn-data-science.iam.gserviceaccount.com/ as the GCP mapped SA.
3. 🔴 Unable to run pyflyte, or even the same spec, with the flytekit image.
pyflyte runs still result in permission errors like above. Another interesting note: if I swap the image in the above spec from google/cloud-sdk:slim to ghcr.io/flyteorg/flytekit:py3.8-1.0.3, I get the same permission errors.
Output when using ghcr.io/flyteorg/flytekit:py3.8-1.0.3 in flyte-test.yaml:
root@flight-test-ts-temp:~# gsutil ls gs://flyte-ts-temp-service-flyte
ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.
(same error)
I confirmed the service account in that container is the same as above by hitting the endpoint shown in #1 (from the tutorial, but using python requests since curl isn't in that flytekit image):
root@flyte-manual-test:~# python3
>>> import requests
>>> r = requests.get("http://169.254.169.254/computeMetadata/v1/instance/service-accounts/", headers={"Metadata-Flavor": "Google"})
>>> print(r.content.decode())
default/
gsa-development@urbn-data-science.iam.gserviceaccount.com/
It also uses gsa-development@urbn-data-science.iam.gserviceaccount.com/
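The same metadata check can be written with only the standard library, which avoids depending on requests being installed in the image. A minimal sketch, assuming it runs inside a pod on the cluster; the function names are illustrative, and default/email is the metadata path that returns the mapped account's email:

```python
import urllib.request

METADATA_ROOT = "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/"


def metadata_request(path: str = "") -> urllib.request.Request:
    """Build a metadata-server request; the Metadata-Flavor header is mandatory."""
    return urllib.request.Request(
        METADATA_ROOT + path,
        headers={"Metadata-Flavor": "Google"},
    )


def fetch(path: str = "", timeout: float = 2.0) -> str:
    """Fetch a metadata path. Only works from inside a GCE VM or GKE pod."""
    with urllib.request.urlopen(metadata_request(path), timeout=timeout) as response:
        return response.read().decode()


# Inside the pod:
#   print(fetch())                 # lists the service accounts, as above
#   print(fetch("default/email"))  # email of the account mapped to this pod
```

The listing only shows which identity the pod is mapped to. The 401 "Anonymous caller" from gsutil suggests the request was sent with no credentials at all, which points at the client's configuration more than at the account's roles.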
I also pulled the spec from the pyflyte execution and attempted a manual gsutil on the running pod (entering with a sleep) and got the same error.
able-ice-99180
07/28/2022, 8:00 PM
boundless-pizza-95864
07/28/2022, 9:21 PM
gsutil not being able to authenticate without additional config (compared to google-cloud-sdk installed gsutil). I ran into this a while ago and we actually had a thread here I can’t find anymore, probably due to the Slack history limit.
boundless-pizza-95864
07/28/2022, 9:23 PM
boundless-pizza-95864
07/28/2022, 9:24 PM
boundless-pizza-95864
07/28/2022, 9:29 PM
RUN curl https://storage.googleapis.com/pub/gsutil.tar.gz | tar xfz - -C /opt && ln -s /opt/gsutil/gsutil /bin/gsutil
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg # Required for gsutil to work with workload-identity
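The echo line relies on the image's /bin/sh expanding \n into a real newline, which depends on the shell. A minimal Python sketch for checking that the resulting file parses as intended, assuming the INI layout shown above; the helper name is illustrative:

```python
import configparser

# Expected contents of /etc/boto.cfg once the shell has expanded "\n".
BOTO_CFG = "[GoogleCompute]\nservice_account = default\n"


def gce_service_account(cfg_text: str):
    """Return the service_account option of [GoogleCompute], or None if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("GoogleCompute", "service_account", fallback=None)


account = gce_service_account(BOTO_CFG)
print(account)

# Inside the built image, the real file can be checked the same way:
#   with open("/etc/boto.cfg") as handle:
#       print(gce_service_account(handle.read()))
```

If the check comes back None inside the image, the file most likely ended up on a single line, and writing it with printf instead of echo is one way around that.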
I guess it should also work with standalone gsutil installed via pip, so perhaps try to derive an image from ghcr.io/flyteorg/flytekit:py3.8-1.0.3 with the second line added to check if that makes any difference.
able-ice-99180
07/28/2022, 9:37 PM
freezing-airport-6809
flytekitplugins-data-fsspec and then install GCS for fsspec?
freezing-airport-6809
boundless-pizza-95864
07/28/2022, 9:50 PM
boundless-pizza-95864
07/28/2022, 9:53 PM
able-ice-99180
07/28/2022, 11:59 PM
FROM ghcr.io/flyteorg/flytekit:py3.8-1.0.3
# Required for gsutil to work with workload-identity
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
Pushed it as image name:
gcr.io/urbn-data-science/flytekit-test-wrapper:latest
Then ran pyflyte:
pyflyte run --image gcr.io/urbn-data-science/flytekit-test-wrapper:latest --remote core/flyte_basics/basic_workflow.py my_wf --a 5 --b hello
And got a successful run on the GKE cluster:
$ flytectl get execution ffe50f73fa4564737bf6 -p flytesnacks -d development
---------------------- ---------------------------------------- -------------------------- ------------- ----------- ---------------- -------------------------------- --------------- -------------------- --------------------
| NAME | LAUNCH PLAN NAME | VERSION | TYPE | PHASE | SCHEDULED TIME | STARTED | ELAPSED TIME | ABORT DATA (TRUNC) | ERROR DATA (TRUNC) |
---------------------- ---------------------------------------- -------------------------- ------------- ----------- ---------------- -------------------------------- --------------- -------------------- --------------------
| ffe50f73fa4564737bf6 | core.flyte_basics.basic_workflow.my_wf | 00jRSrIIdnwryVi5J7STWw== | LAUNCH_PLAN | SUCCEEDED | | 2022-07-28T23:52:37.007821370Z | 78.295864832s | | |
---------------------- ---------------------------------------- -------------------------- ------------- ----------- ---------------- -------------------------------- --------------- -------------------- --------------------
Thank you everyone in this thread for the fantastic support! I think with this tweak, that concludes the investigation for the original goal, i.e. getting flyte to run on GCP without a domain. The only open item on my end is to try out that envoy config @freezing-boots-56761 provided in order to see the GUI (perhaps tomorrow).
able-ice-99180
07/29/2022, 12:03 AM
'[GoogleCompute]\nservice_account = default' is required in /etc/boto.cfg for the flytekit docker image to work. Otherwise the pod dies with a bucket permission error. I am guessing that, since this is cloud-specific, it should be handled somewhere outside the Dockerfile, maybe?
Do you all suggest I report these as GitHub issues? Or are they known already and a report not required?
freezing-airport-6809
freezing-airport-6809
able-ice-99180
07/29/2022, 12:09 AM
freezing-boots-56761
able-ice-99180
07/29/2022, 11:28 AM
able-ice-99180
07/29/2022, 11:46 AM
kubectl -n flyte port-forward deployment/flyte-proxy 30080:8000
I then set my ~/.flyte/config.yaml to:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///localhost:30080
  ...
And when I run I get the error:
"Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNIMPLEMENTED
...
(full trace attached)
I can still get it to work if I separately port-forward flyteadmin with:
kubectl -n flyte port-forward service/flyteadmin 30081:81
and set config to 30081.
When you use this, are you able to access flyteadmin and the console with one port-forward?
freezing-boots-56761
freezing-boots-56761
freezing-boots-56761
freezing-boots-56761
able-ice-99180
07/29/2022, 1:21 PM
freezing-boots-56761
freezing-boots-56761
freezing-boots-56761