# ask-the-community
d
Flyte on GKE 1.25: Is anyone already running Flyte on GKE 1.25 with Workload Identity? We noticed that it breaks fast registration of workflows because signed URLs can't be created:
...
  File "/opt/pyenv-root/versions/3.9.12/lib/python3.9/site-packages/flytekit/remote/remote.py", line 580, in fast_package
    return self._upload_file(pathlib.Path(zip_file))
  File "/opt/pyenv-root/versions/3.9.12/lib/python3.9/site-packages/flytekit/remote/remote.py", line 598, in _upload_file
    upload_location = self.client.get_upload_signed_url(
  File "/opt/pyenv-root/versions/3.9.12/lib/python3.9/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
    return super(SynchronousFlyteClient, self).create_upload_location(
  File "/opt/pyenv-root/versions/3.9.12/lib/python3.9/site-packages/flytekit/clients/raw.py", line 41, in handler
    return fn(*args, **kwargs)
  File "/opt/pyenv-root/versions/3.9.12/lib/python3.9/site-packages/flytekit/clients/raw.py", line 856, in create_upload_location
    return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self._metadata)
  File "/opt/pyenv-root/versions/3.9.12/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/pyenv-root/versions/3.9.12/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "failed to create a signed url. Error: unable to sign bytes: googleapi: Error 403: Request had insufficient authentication scopes.
Details:
[
  {
    "@type": "type.googleapis.com/google.rpc.ErrorInfo",
    "domain": "googleapis.com",
    "metadata": {
      "method": "google.iam.credentials.v1.IAMCredentials.SignBlob",
      "service": "iamcredentials.googleapis.com"
    },
    "reason": "ACCESS_TOKEN_SCOPE_INSUFFICIENT"
  }
]"
        debug_error_string = "UNKNOWN:Error received from peer ipv4:{removed} {created_time:"2023-02-20T11:01:31.243469439+00:00", grpc_status:13, grpc_message:"failed to create a signed url. Error: unable to sign bytes: googleapi: Error 403: Request had insufficient authentication scopes.\nDetails:\n[\n  {\n    \"@type\": \"type.googleapis.com/google.rpc.ErrorInfo\",\n    \"domain\": \"googleapis.com\",\n    \"metadata\": {\n      \"method\": \"google.iam.credentials.v1.IAMCredentials.SignBlob\",\n      \"service\": \"iamcredentials.googleapis.com\"\n    },\n    \"reason\": \"ACCESS_TOKEN_SCOPE_INSUFFICIENT\"\n  }\n]"}"
We noticed this issue only appears on our clusters running GKE version 1.25.5; it does not appear on the clusters still running 1.24.9.
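When triaging an error like the one above, it can help to pull the structured ErrorInfo entries out of the message rather than reading the escaped text. The following is a minimal sketch, assuming the message contains a `Details:` marker followed by a JSON array, as in the traceback; the helper name `parse_error_info` is hypothetical and not part of flytekit or gRPC.

```python
import json


def parse_error_info(details: str) -> list:
    """Return the JSON entries found after 'Details:' in an error message.

    Hypothetical helper: assumes the layout seen in the traceback above,
    i.e. a 'Details:' marker followed by a JSON array.
    """
    marker = "Details:"
    idx = details.find(marker)
    if idx == -1:
        return []
    payload = details[idx + len(marker):]
    start = payload.find("[")
    end = payload.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        return json.loads(payload[start:end + 1])
    except json.JSONDecodeError:
        # The message was truncated or is not valid JSON.
        return []
```

Each returned entry is a dict, so `entry["reason"]` and `entry["metadata"]["method"]` give the failure reason and the API method that rejected the call.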
s
@Yee, can you help?
j
In my case it was due to insufficient permissions, but the error message had specifically stated: Permission 'iam.serviceAccounts.signBlob' denied on resource
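The two 403 variants mentioned in this thread point to different causes: one is a missing IAM permission on the service account, the other is an access token with too narrow a scope. A rough sketch for telling them apart by message text follows; the function name and the returned labels are made up for illustration, and the matching relies only on the strings quoted in this thread.

```python
def classify_sign_blob_403(message: str) -> str:
    """Guess the cause of a SignBlob 403 from its error text.

    Hypothetical helper: matches only the strings quoted in this thread.
    """
    if ("ACCESS_TOKEN_SCOPE_INSUFFICIENT" in message
            or "insufficient authentication scopes" in message):
        # The token itself is too narrow, whatever the IAM roles are.
        return "token-scope"
    if "iam.serviceAccounts.signBlob" in message and "denied" in message:
        # The service account lacks the IAM permission.
        return "iam-permission"
    return "unknown"
```

A "token-scope" result suggests looking at how the token is requested, while "iam-permission" suggests looking at the role bindings of the service account.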
s
Hey @Fabio Grätz! We need your help here. 🙂
f
Dennis is my colleague ^^ Unfortunately, we haven't figured this one out yet. It is reproducible, though: we recreated the cluster from scratch to rule out that the node pool GKE version upgrade process broke something.
d
We think we found the issue now. Has there been a change in the cr.flyte.org/flyteorg/flyteadmin-release:v1.3.0 image? We discovered that forcing an image pull suddenly solved the issue. I only noticed after I could not reproduce the issue with my own debug builds of flyteadmin.
s
cc @Yee @Eduardo Apolinario (eapolinario)
d
We found the underlying issue: stow seems to require a different token scope on GKE 1.25.5. We will prepare a PR for it.
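One way to check a scope problem like this from inside a pod is to ask the metadata server which scopes the default service account's token carries. The sketch below is an assumption-laden illustration, not the fix from the PR: the thread does not say which scope stow ends up requesting, the function names are hypothetical, and the set of accepted scopes is taken from the IAM Credentials API documentation for signBlob as I understand it.

```python
import urllib.request

# Assumption: the (GKE) metadata server lists the token scopes at this path.
METADATA_SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Assumption: either of these scopes is sufficient for signBlob.
SIGN_BLOB_SCOPES = {
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/iam",
}


def fetch_token_scopes(url=METADATA_SCOPES_URL, timeout=5.0):
    """Fetch the scopes of the default service account (needs pod network access)."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The endpoint returns one scope per line.
        return response.read().decode("utf-8").split()


def can_sign_blob(scopes):
    """Return True if any of the given scopes is accepted for signBlob."""
    return any(scope in SIGN_BLOB_SCOPES for scope in scopes)
```

Running `can_sign_blob(fetch_token_scopes())` inside the flyteadmin pod would show whether the default token is broad enough; a token requested with explicit scopes by the application may differ from what this endpoint reports.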
y
😞
thank you guys!
this will be easier for us to vet once we get our internal gke cluster up and running
f
Unfortunately, nothing about this was mentioned in the GKE release notes, haha. Good job, Google. Let us know if you want input or IaC when creating a Flyte deployment on GKE.
y
thank you @Fabio Grätz!
d
@Yee Please find the PR here