Hello, we've recently been doing a bigger Flyte update ...
# flyte-deployment
s
Hello, we've recently been doing a bigger Flyte update in our AWS deployments (jumping from 1.9 to 1.12). While that worked flawlessly for short-lived clusters, we've been running into some confusing behavior with our 24/7 clusters. The base setup is that we have a control plane in one AWS account and multiple data planes in several other accounts. This means we also have data plane IAM roles per AWS account, which should be applied individually to the service accounts of the namespaces each data plane operates. What I currently struggle to understand, or find the right resources for, is why, after some time, the control plane's serviceaccount-namespace mapping seems to overwrite the ones on the data planes, setting the IAM roles in the default service account of each namespace to IAM roles from the control plane. This causes the scheduled workflows to assume the wrong role and run into the following error:
An error occurred (InvalidIdentityToken) when calling the AssumeRoleWithWebIdentity operation: No OpenIDConnect provider found in your account for https://oidc.eks.eu-central-1.amazonaws.com/id/<id>
I get that the job of `syncresources` is to do that, but how would I overwrite/adjust that? Should e.g. this part just be left out on the control plane?
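A minimal sketch of how the symptom described above could be detected, assuming the service account manifest has already been fetched from the data plane (for example as parsed JSON). The annotation key `eks.amazonaws.com/role-arn` is the one EKS uses for IAM roles for service accounts; the function names and the account IDs in the usage note are hypothetical. It only flags a role whose account differs from the data plane's account, which matches this setup but is not wrong in every deployment.

```python
from typing import Optional

# Annotation key used by EKS IAM Roles for Service Accounts (IRSA).
ROLE_ARN_ANNOTATION = "eks.amazonaws.com/role-arn"


def role_account_id(role_arn: str) -> str:
    """Return the AWS account ID embedded in an IAM role ARN."""
    # IAM role ARNs look like arn:aws:iam::<account-id>:role/<role-name>
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {role_arn!r}")
    return parts[4]


def find_foreign_role(service_account: dict, data_plane_account_id: str) -> Optional[str]:
    """Return the annotated role ARN if it belongs to another account, else None.

    Hypothetical helper: `service_account` is a parsed ServiceAccount manifest.
    """
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    role_arn = annotations.get(ROLE_ARN_ANNOTATION)
    if role_arn is None:
        return None
    if role_account_id(role_arn) != data_plane_account_id:
        return role_arn
    return None
```

Called with the default service account of a data plane namespace and that data plane's account ID (say, the placeholder `222222222222`), a non-`None` result would point at a role from a different account, such as one synced over from the control plane.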
a
Hey @some-grass-84903, I think this is the second time I've seen a report of similar `syncresources` behavior in a multi-cluster environment. Would you mind filing an issue, please? For now, I think even if you remove that section, `syncresources` will sync that empty config to the data planes. I think disabling the `cluster_resource_manager` is what you'd need to do for now.
s
Oh, I thought I was just not reading the docs properly yet 😅 For sure I can report it! For now we rolled back, but once we get into a more detailed update, I'll happily provide more details. Gonna double check on the `cluster_resource_manager` as well! Thanks for the input!
šŸ™‡šŸ½ 1
We were able to solve it. As mentioned above, we have now removed the `defaultIamRole` annotation, since we set the appropriate service account annotation during the data plane deployment. To reproduce the issue:
• Set any value for `defaultIamRole` in the `cluster_resources.yaml` part of the `flyte-admin-base-config` configmap of the control plane
• Restart the `flyteadmin` deployment in the control plane
• Wait and check the default service account in the respective namespace of the data plane (its annotation with the EKS role ARN should change to the value you set after a few minutes)
I guess it works as intended, although I think it would be nice if the annotations were not overwritten when they don't match the defaults.
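To illustrate the suggestion at the end of the message above, here is a small sketch contrasting two sync policies for service account annotations. This is not Flyte's implementation; `sync_annotations` and its `overwrite` flag are hypothetical. The overwrite branch only mirrors the behavior observed in this thread, and the other branch shows one possible reading of "don't overwrite what is already set".

```python
def sync_annotations(existing: dict, templated: dict, overwrite: bool) -> dict:
    """Return the annotations a sync would leave on the service account.

    existing:  annotations currently set on the data plane service account
    templated: annotations rendered from the control plane defaults
    overwrite: True lets templated values win, False preserves existing values
    """
    if overwrite:
        # Templated values replace whatever the data plane deployment had set.
        return {**existing, **templated}
    # Existing values win; templated values only fill in missing keys.
    return {**templated, **existing}
```

With `overwrite=False`, a role ARN set during the data plane deployment would survive a sync, while a namespace without any annotation would still receive the default.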