# ask-the-community
c
hello hello - i am trying to set up workload identity in AKS. i followed this guide with a few adjustments and got the flyte pod deployed and talking to an azure storage account; it creates the container, flyte and metadata blob, etc. i set up a Kubernetes ServiceAccount (`workload-identity-sa`) for this purpose and associated it with an Azure identity with the necessary permissions. i then set up a second Kubernetes SA (`workload-identity-development-sa`) to run workflows/tasks. when i try to run
pyflyte --verbose run --service-account workload-identity-development-sa --project flyte-az --domain development --remote ./workflows/simple-workflow.py simple_workflow
i get the following error:
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "failed to create workflow in propeller flyteworkflows.flyte.lyft.com is forbidden: User "system:serviceaccount:flyte-az:workload-identity-sa" cannot create resource "flyteworkflows" in API group "flyte.lyft.com" in the namespace "flyte-az-development": Azure does not have opinion for this user."
        debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:8089 {grpc_message:"failed to create workflow in propeller flyteworkflows.flyte.lyft.com is forbidden: User \"system:serviceaccount:flyte-az:workload-identity-sa\" cannot create resource \"flyteworkflows\" in API group \"flyte.lyft.com\" in the namespace \"flyte-az-development\": 
Azure does not have opinion for this user.", grpc_status:13, created_time:"2023-10-05T15:52:41.780222-06:00"}"
note that the SA referenced is the one associated with the primary flyte deployment, not the one i setup to use with tasks
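To make that observation concrete, here is a minimal sketch (not part of Flyte or Kubernetes tooling) that pulls the denied identity, verb, resource, and namespace out of an RBAC "forbidden" message. The helper name `parse_forbidden` and the regular expression are assumptions written for this thread's error text only.

```python
import re

# Assumed pattern, modeled on the "forbidden" message quoted in this thread.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(message: str) -> dict:
    """Extract who was denied what from an RBAC 'forbidden' message."""
    # gRPC debug strings escape the quotes, so normalize \" to " first.
    match = FORBIDDEN_RE.search(message.replace('\\"', '"'))
    if match is None:
        raise ValueError("not an RBAC forbidden message")
    info = match.groupdict()
    # Service account users look like system:serviceaccount:<namespace>:<name>.
    prefix = "system:serviceaccount:"
    if info["user"].startswith(prefix):
        sa_namespace, sa_name = info["user"][len(prefix):].split(":", 1)
        info["sa_namespace"], info["sa_name"] = sa_namespace, sa_name
    return info


details = (
    'failed to create workflow in propeller flyteworkflows.flyte.lyft.com is forbidden: '
    'User "system:serviceaccount:flyte-az:workload-identity-sa" cannot create resource '
    '"flyteworkflows" in API group "flyte.lyft.com" in the namespace "flyte-az-development": '
    'Azure does not have opinion for this user.'
)
denied = parse_forbidden(details)
print(denied["sa_name"], denied["verb"], denied["resource"], denied["namespace"])
```

The parsed identity is the flyte-binary SA in `flyte-az`, which suggests the denial happens when the backend tries to create the `flyteworkflows` resource, before any task pod (and its SA) is involved.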
i attempted to associate the task SA with the task pods by creating a `PodTemplate`, which is referenced in my values.yaml:
apiVersion: v1
kind: PodTemplate
metadata:
  name: service-account-template
  namespace: flyte-az-development
template:
  metadata:
    labels:
      azure.workload.identity/use: "true"
  spec:
    containers:
    - name: default
      image: {private-acr}
    serviceAccountName: workload-identity-development-sa
i am either missing a configuration or misunderstanding it entirely (not mutually exclusive)
any ideas?
i also tagged my tasks explicitly with the `PodTemplate`, hoping that would force them to adopt the referenced SA:
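from io import StringIO

import pandas as pd
from flytekit import task

# Run this task with the PodTemplate named "service-account-template".
@task(pod_template_name="service-account-template")
def dataframe_to_csv(df: pd.DataFrame) -> str:
    csv_buffer = StringIO()
    df.to_csv(csv_buffer)
    return csv_buffer.getvalue()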
d
Hi @Chris Grass! So, are the task pods using the Service Account you created? Are you using `flyte-binary` or `flyte-core`?
c
hi David - the task pods appear to fail before being created. i don't see one if i run `kubectl get pods -n flyte-az-development`
i am using `flyte-binary`
this is the full verbose output:
pyflyte --verbose run --service-account workload-identity-development-sa --project flyte-az --domain development --remote ./workflows/simple-workflow.py simple_workflow
Verbose mode on
Traceback (most recent call last)
/Users/chris.grass/Library/Python/3.9/lib/python/site-packages/grpc/_interceptor.py:274 in continuation
  ❱ 274     response, call = self._thunk(new_method).with_call(
/Users/chris.grass/Library/Python/3.9/lib/python/site-packages/grpc/_interceptor.py:301 in with_call
  ❱ 301     return self._with_call(request,
/Users/chris.grass/Library/Python/3.9/lib/python/site-packages/grpc/_interceptor.py:290 in _with_call
  ❱ 290     return call.result(), call
/Users/chris.grass/Library/Python/3.9/lib/python/site-packages/grpc/_channel.py:379 in result
  ❱ 379     raise self
/Users/chris.grass/Library/Python/3.9/lib/python/site-packages/grpc/_interceptor.py:274 in continuation
  ❱ 274     response, call = self._thunk(new_method).with_call(
/Users/chris.grass/Library/Python/3.9/lib/python/site-packages/grpc/_channel.py:1043 in with_call
  ❱ 1043    return _end_unary_response_blocking(state, call, True, None)
/Users/chris.grass/Library/Python/3.9/lib/python/site-packages/grpc/_channel.py:910 in _end_unary_response_blocking
  ❱ 910     raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
d
what about the flyte-binary Pod? is it using the `ServiceAccountName: workload-identity-sa`?
c
yes - flyte-backend-flyte-binary-db6f776b5-7phcb is using `workload-identity-sa` in the `flyte-az` namespace
d
And I guess you created the SA manually and left `serviceAccount.create` as `false` in the values file?
c
correct
d
I think the issue here is that, when that happens, no `ClusterRoleBinding` is created (see https://github.com/flyteorg/flyte/blob/master/charts/flyte-binary/templates/clusterrolebinding.yaml), so this SA is kinda isolated. The error message says the SA is not allowed to create `flyteworkflows` in the `flyte.lyft.com` API group
let me see what I can find in the templates
c
ahhhh - i read that the `clusterrole` is ignored if `rbac.create` is false and thought i read the same regarding the `ClusterRoleBinding`. but you are absolutely correct, that uses
{{- if and .Values.rbac.create .Values.serviceAccount.create }}
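The effect of that guard can be sketched in a few lines of Python. This is only a model of the boolean condition quoted above, not the chart's rendering logic; the function name `binding_rendered` is hypothetical, and `rbac.create` being true is an assumption about the values used in this thread.

```python
def binding_rendered(values: dict) -> bool:
    """Model of the quoted guard: rbac.create AND serviceAccount.create."""
    return bool(values["rbac"]["create"]) and bool(values["serviceAccount"]["create"])


# Values as described in this thread: the SA was created manually, so
# serviceAccount.create is false. rbac.create=True is an assumption.
thread_values = {"rbac": {"create": True}, "serviceAccount": {"create": False}}
# For comparison: the chart creates and manages the SA itself.
chart_managed = {"rbac": {"create": True}, "serviceAccount": {"create": True}}

print(binding_rendered(thread_values))  # False: no ClusterRoleBinding is rendered
print(binding_rendered(chart_managed))  # True
```

With a manually created SA, the second operand is false, so the binding template renders nothing even though RBAC creation is enabled.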
d
if you add the `azure.workload...` annotation to `commonAnnotations` in your values file, it will be added to the service account (see https://github.com/flyteorg/flyte/blob/4ee73f583d39fb878d1c487b3e92c61e7abab329/charts/flyte-binary/templates/serviceaccount.yaml#L15C6-L15C6)
c
ok, i'll walk down that path and see where it leads. thanks for the quick response!
d
cool, let us know if it works. I think there's a growing need to have better resources for Flyte deployments on Azure
c
cool - i work with Terence, who has a PR up to add Azure AD support to stow. and i'll put together a small PR for flyte with a couple small changes to get it up and running with azure configs
for now i hacked the clusterrolebinding yaml to only look at rbac.create - that seems to create the necessary role and unblock my testing. thanks again!
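An alternative to patching the chart template, not something tried in this thread, would be to bind the manually created SA to the chart's ClusterRole by hand. The sketch below builds such a `ClusterRoleBinding` manifest as JSON, which `kubectl apply -f` accepts. The binding name and the ClusterRole name are hypothetical placeholders; the real ClusterRole name depends on the Helm release and would need to be looked up in the cluster.

```python
import json


def cluster_role_binding(name: str, cluster_role: str, sa_name: str, sa_namespace: str) -> dict:
    """Build a ClusterRoleBinding that grants a ClusterRole to one ServiceAccount."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_namespace}
        ],
    }


manifest = cluster_role_binding(
    name="flyte-binary-manual-binding",        # hypothetical name
    cluster_role="flyte-binary-cluster-role",  # hypothetical; use the ClusterRole the chart created
    sa_name="workload-identity-sa",            # SA from this thread
    sa_namespace="flyte-az",                   # namespace from this thread
)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it would give the SA the permissions of whichever ClusterRole is named, without editing the chart.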