gorgeous-caravan-46442
01/28/2025, 7:50 PM

gorgeous-caravan-46442
01/28/2025, 7:52 PM
kubectl run pgsql-postgresql-client --rm --tty -i --restart='Never' --namespace testdb --image docker.io/bitnami/postgresql:11.7.0-debian-10-r9 --env='PGPASSWORD=<Password>' --command -- psql testdb --host <RDS-ENDPOINT-NAME> -U flyteadmin -d flyteadmin -p 5432
This confirms that I can connect to the endpoint from the cluster.

gorgeous-caravan-46442
01/28/2025, 8:02 PM
03-roles-service-accounts.md was a little tricky.
This was the closest I could get to:
this.cluster.addManifest('flyte-namespace', {
  apiVersion: 'v1',
  kind: 'Namespace',
  metadata: {
    name: 'flyte',
  },
});

const flytePolicy = new Policy(this, 'FlyteCustomPolicy', {
  statements: [
    new PolicyStatement({
      effect: Effect.ALLOW,
      actions: [
        's3:DeleteObject*',
        's3:GetObject*',
        's3:ListBucket',
        's3:PutObject*',
      ],
      resources: [
        `arn:aws:s3:::XXXXXXXX`,
        `arn:aws:s3:::XXXXXXXX/*`,
      ],
    }),
  ],
});

const flyteBackendServiceAccountManifest = cluster.addServiceAccount('flyte-system-role', {
  name: 'flyte-backend-flyte-binary',
  namespace: 'flyte',
});
flyteBackendServiceAccountManifest.role.attachInlinePolicy(flytePolicy);

const flyteWorkersServiceAccountManifest = cluster.addServiceAccount('flyte-workers-role', {
  name: 'flyte-admin', // should be 'default', but that name already exists
  namespace: 'flyte',
});
flyteWorkersServiceAccountManifest.role.attachInlinePolicy(flytePolicy);
// MANUAL EDIT OF TRUST POLICY
Two things for the flyte workers role:
• I couldn't set name: 'default'; it said the name already exists.
• I am not able to alter the trust policy through the CDK. I think this is due to limitations of the CDK .addServiceAccount method. I manually changed it in the console (from system:serviceaccount:flyte:flyte-admin) to:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::YYYYYYYY:oidc-provider/oidc.eks.<region>.amazonaws.com/id/XXXXX"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "oidc.eks.<region>.amazonaws.com/id/XXXXX:sub": "system:serviceaccount:*:default",
          "oidc.eks.<region>.amazonaws.com/id/XXXXX:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
And then deployed. The 'flyte' namespace exists, as do the service accounts (kubectl get serviceaccounts --namespace flyte).

gorgeous-caravan-46442
01/28/2025, 8:04 PM
05-deploy-with-helm.md. Ideally I would use the CDK method cluster.addHelmChart, but there's a lot of YAML so I'll do a manual deployment for now. I edit the Helm values file and get the error:
Error: INSTALLATION FAILED: Unable to continue with install: ServiceAccount "flyte-backend-flyte-binary" in namespace "flyte" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "flyte-backend"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "flyte"
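(A common way past this "invalid ownership metadata" error is to give the pre-existing service account the ownership metadata Helm checks for, so the release can adopt it instead of failing. A sketch of that metadata, with the release name and namespace taken from the error text above; treat these as this release's specifics, not general defaults:)

```yaml
# Ownership metadata Helm expects before it will adopt an existing resource.
# The release name/namespace values come from the error message above.
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: flyte-backend
    meta.helm.sh/release-namespace: flyte
```

(Deleting the CDK-created service account before installing would also work, at the cost of losing the CDK-managed IRSA annotation.)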
gorgeous-caravan-46442
01/28/2025, 8:05 PM

gorgeous-caravan-46442
01/28/2025, 8:45 PM
serviceAccount:
  create: false
The pods fail to create. The log output is:
1.373ms] [rows:0] SELECT description FROM pg_catalog.pg_description WHERE objsubid = (SELECT ordinal_position FROM information_schema.columns WHERE table_schema = CURRENT_SCHEMA() AND table_name = 'reservations' AND column_name = 'serialized_metadata') AND objoid = (SELECT oid FROM pg_catalog.pg_class WHERE relname = 'reservations' AND relnamespace = (SELECT oid FROM pg_catalog.pg_namespace WHERE nspname = CURRENT_SCHEMA()))
{"json":{"src":"initialize.go:74"},"level":"info","msg":"Ran DB migration successfully.","ts":"2025-01-28T20:42:39Z"}
{"json":{"app_name":"datacatalog","src":"service.go:98"},"level":"info","msg":"Created data storage.","ts":"2025-01-28T20:42:39Z"}
{"json":{"app_name":"datacatalog","src":"service.go:109"},"level":"info","msg":"Created DB connection.","ts":"2025-01-28T20:42:39Z"}
{"json":{"src":"service.go:129"},"level":"info","msg":"Serving DataCatalog Insecure on port :8081","ts":"2025-01-28T20:42:39Z"}
{"json":{"src":"init_cert.go:63"},"level":"info","msg":"Creating secret [flyte-backend-flyte-binary-webhook-secret] in Namespace [flyte]","ts":"2025-01-28T20:42:46Z"}
{"json":{"src":"start.go:152"},"level":"error","msg":"Failed to initialize certificates for Secrets Webhook. client rate limiter Wait returned an error: context canceled","ts":"2025-01-28T20:42:46Z"}
{"json":{"src":"start.go:228"},"level":"panic","msg":"Failed to start Propeller, err: failed to create FlyteWorkflow CRD: customresourcedefinitions.apiextensions.k8s.io is forbidden: User \"system:serviceaccount:flyte:default\" cannot create resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope","ts":"2025-01-28T20:42:46Z"}
panic: (*logrus.Entry) 0xc000856620
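(The panic above shows the Pod acting as system:serviceaccount:flyte:default, while the chart's RBAC only grants its ClusterRole to the chart's own service account. A rough sketch of that binding, with resource names as the flyte-binary chart generates them for this release, not an exact copy of the chart template:)

```yaml
# Rough shape of the ClusterRoleBinding the flyte-binary chart creates.
# `default` is not among the subjects, so a Pod running as `default`
# cannot create the FlyteWorkflow CRD at cluster scope.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flyte-backend-flyte-binary-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flyte-backend-flyte-binary-cluster-role
subjects:
  - kind: ServiceAccount
    name: flyte-backend-flyte-binary
    namespace: flyte
```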
Seems to be an error linking to the service account, but unsure where to go from here.

average-finland-92144
01/29/2025, 4:00 AM
flyte-backend-flyte-binary SA

gorgeous-caravan-46442
01/29/2025, 6:32 AM
name: 'default' on the flyte-workers-role. Does it matter if there are other service accounts linked to the cluster? I also have these service accounts from when I set up the cluster, before trying to configure Flyte.
const serviceAccountManifest = cluster.addServiceAccount('eks-admin-service-account', {
  name: 'eks-admin',
  namespace: 'kube-system',
});
const clusterRoleBindingManifest = cluster.addManifest('eks-admin-cluster-role-binding', {
  apiVersion: 'rbac.authorization.k8s.io/v1', // native Kubernetes Role Based Access Control (RBAC)
  kind: 'ClusterRoleBinding',
  metadata: {
    name: 'eks-admin',
  },
  roleRef: {
    apiGroup: 'rbac.authorization.k8s.io',
    kind: 'ClusterRole',
    name: 'cluster-admin',
  },
  subjects: [
    {
      kind: 'ServiceAccount',
      name: 'eks-admin',
      namespace: 'kube-system',
    },
  ],
});
Just for some more debugging, this is the output of kubectl describe sa flyte-backend-flyte-binary --namespace flyte
Name:                flyte-backend-flyte-binary
Namespace:           flyte
Labels:              app.kubernetes.io/name=flyte-backend-flyte-binary
                     aws.cdk.eks/prune-XXXXXXXXX=
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXX:role/YYYYYYYY-EKSClusterflytesystemrole-ZZZZZZZ
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>
Seems like it isn't exactly like in the instructions. Missing labels:
app.kubernetes.io/instance=flyte-backend
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=flyte-binary
app.kubernetes.io/version=1.16.0
helm.sh/chart=flyte-binary-v1.3.0
and annotations:
meta.helm.sh/release-name: flyte-backend
meta.helm.sh/release-namespace: flyte
average-finland-92144
01/29/2025, 2:46 PM
kubectl get clusterrolebinding -n flyte
and then describe the rolebinding there.

average-finland-92144
01/29/2025, 2:47 PM
default service account to instantiate the CRD.
Please also do a describe on the flyte-binary Pod just to confirm which SA it is using.

gorgeous-caravan-46442
01/29/2025, 9:16 PM
kubectl get clusterrolebinding -n flyte
NAME ROLE AGE
aws-node ClusterRole/aws-node 5d23h
cluster-admin ClusterRole/cluster-admin 5d23h
eks-admin ClusterRole/cluster-admin 5d23h
eks:addon-cluster-admin ClusterRole/cluster-admin 5d23h
eks:addon-manager ClusterRole/eks:addon-manager 5d23h
eks:az-poller ClusterRole/eks:az-poller 5d23h
eks:certificate-controller ClusterRole/system:controller:certificate-controller 5d23h
eks:certificate-controller-approver ClusterRole/eks:certificate-controller-approver 5d23h
eks:certificate-controller-manager ClusterRole/eks:certificate-controller-manager 5d23h
eks:certificate-controller-signer ClusterRole/eks:certificate-controller-signer 5d23h
eks:cloud-controller-manager ClusterRole/eks:cloud-controller-manager 5d23h
eks:cloud-provider-extraction-migration ClusterRole/eks:cloud-provider-extraction-migration 5d23h
eks:cluster-event-watcher ClusterRole/eks:cluster-event-watcher 5d23h
eks:coredns-autoscaler ClusterRole/eks:coredns-autoscaler 5d23h
eks:extension-metrics-apiserver ClusterRole/eks:extension-metrics-apiserver 5d23h
eks:extension-metrics-apiserver-auth-delegator ClusterRole/system:auth-delegator 5d23h
eks:fargate-manager ClusterRole/eks:fargate-manager 5d23h
eks:fargate-scheduler ClusterRole/eks:fargate-scheduler 5d23h
eks:k8s-metrics ClusterRole/eks:k8s-metrics 5d23h
eks:kms-storage-migrator ClusterRole/eks:kms-storage-migrator 5d23h
eks:kube-proxy ClusterRole/system:node-proxier 5d23h
eks:kube-proxy-fargate ClusterRole/system:node-proxier 5d23h
eks:kube-proxy-windows ClusterRole/system:node-proxier 5d23h
eks:network-policy-controller ClusterRole/eks:network-policy-controller 5d23h
eks:network-webhooks ClusterRole/eks:network-webhooks 5d23h
eks:node-bootstrapper ClusterRole/eks:node-bootstrapper 5d23h
eks:node-manager ClusterRole/eks:node-manager 5d23h
eks:nodewatcher ClusterRole/eks:nodewatcher 5d23h
eks:pod-identity-mutating-webhook ClusterRole/eks:pod-identity-mutating-webhook 5d23h
eks:service-operations ClusterRole/eks:service-operations 5d23h
eks:tagging-controller ClusterRole/eks:tagging-controller 5d23h
flyte-backend-flyte-binary-cluster-role-binding ClusterRole/flyte-backend-flyte-binary-cluster-role 47h
kuberay-apiserver ClusterRole/kuberay-apiserver 5d23h
kuberay-operator ClusterRole/kuberay-operator 5d23h
system:basic-user ClusterRole/system:basic-user 5d23h
system:controller:attachdetach-controller ClusterRole/system:controller:attachdetach-controller 5d23h
system:controller:certificate-controller ClusterRole/system:controller:certificate-controller 5d23h
system:controller:clusterrole-aggregation-controller ClusterRole/system:controller:clusterrole-aggregation-controller 5d23h
system:controller:cronjob-controller ClusterRole/system:controller:cronjob-controller 5d23h
system:controller:daemon-set-controller ClusterRole/system:controller:daemon-set-controller 5d23h
system:controller:deployment-controller ClusterRole/system:controller:deployment-controller 5d23h
system:controller:disruption-controller ClusterRole/system:controller:disruption-controller 5d23h
system:controller:endpoint-controller ClusterRole/system:controller:endpoint-controller 5d23h
system:controller:endpointslice-controller ClusterRole/system:controller:endpointslice-controller 5d23h
system:controller:endpointslicemirroring-controller ClusterRole/system:controller:endpointslicemirroring-controller 5d23h
system:controller:ephemeral-volume-controller ClusterRole/system:controller:ephemeral-volume-controller 5d23h
system:controller:expand-controller ClusterRole/system:controller:expand-controller 5d23h
system:controller:generic-garbage-collector ClusterRole/system:controller:generic-garbage-collector 5d23h
system:controller:horizontal-pod-autoscaler ClusterRole/system:controller:horizontal-pod-autoscaler 5d23h
system:controller:job-controller ClusterRole/system:controller:job-controller 5d23h
system:controller:legacy-service-account-token-cleaner ClusterRole/system:controller:legacy-service-account-token-cleaner 5d23h
system:controller:namespace-controller ClusterRole/system:controller:namespace-controller 5d23h
system:controller:node-controller ClusterRole/system:controller:node-controller 5d23h
system:controller:persistent-volume-binder ClusterRole/system:controller:persistent-volume-binder 5d23h
system:controller:pod-garbage-collector ClusterRole/system:controller:pod-garbage-collector 5d23h
system:controller:pv-protection-controller ClusterRole/system:controller:pv-protection-controller 5d23h
system:controller:pvc-protection-controller ClusterRole/system:controller:pvc-protection-controller 5d23h
system:controller:replicaset-controller ClusterRole/system:controller:replicaset-controller 5d23h
system:controller:replication-controller ClusterRole/system:controller:replication-controller 5d23h
system:controller:resourcequota-controller ClusterRole/system:controller:resourcequota-controller 5d23h
system:controller:root-ca-cert-publisher ClusterRole/system:controller:root-ca-cert-publisher 5d23h
system:controller:route-controller ClusterRole/system:controller:route-controller 5d23h
system:controller:service-account-controller ClusterRole/system:controller:service-account-controller 5d23h
system:controller:service-controller ClusterRole/system:controller:service-controller 5d23h
system:controller:statefulset-controller ClusterRole/system:controller:statefulset-controller 5d23h
system:controller:ttl-after-finished-controller ClusterRole/system:controller:ttl-after-finished-controller 5d23h
system:controller:ttl-controller ClusterRole/system:controller:ttl-controller 5d23h
system:controller:validatingadmissionpolicy-status-controller ClusterRole/system:controller:validatingadmissionpolicy-status-controller 5d23h
system:coredns ClusterRole/system:coredns 5d23h
system:discovery ClusterRole/system:discovery 5d23h
system:kube-controller-manager ClusterRole/system:kube-controller-manager 5d23h
system:kube-dns ClusterRole/system:kube-dns 5d23h
system:kube-scheduler ClusterRole/system:kube-scheduler 5d23h
system:monitoring ClusterRole/system:monitoring 5d23h
system:node ClusterRole/system:node 5d23h
system:node-proxier ClusterRole/system:node-proxier 5d23h
system:public-info-viewer ClusterRole/system:public-info-viewer 5d23h
system:service-account-issuer-discovery ClusterRole/system:service-account-issuer-discovery 5d23h
system:volume-scheduler ClusterRole/system:volume-scheduler 5d23h
vpc-resource-controller-rolebinding ClusterRole/vpc-resource-controller-role 5d23h
gorgeous-caravan-46442
01/29/2025, 9:16 PM
kubectl describe clusterrolebinding flyte-backend-flyte-binary-cluster-role
Name:         flyte-backend-flyte-binary-cluster-role-binding
Labels:       app.kubernetes.io/instance=flyte-backend
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=flyte-binary
              app.kubernetes.io/version=1.16.0
              helm.sh/chart=flyte-binary-v1.14.1
Annotations:  meta.helm.sh/release-name: flyte-backend
              meta.helm.sh/release-namespace: flyte
Role:
Kind: ClusterRole
Name: flyte-backend-flyte-binary-cluster-role
Subjects:
Kind Name Namespace
---- ---- ---------
ServiceAccount flyte-backend-flyte-binary flyte
gorgeous-caravan-46442
01/29/2025, 9:22 PM
kubectl describe pod flyte-backend-flyte-binary-6f99bcdbb8-v29ml -n flyte
Name: flyte-backend-flyte-binary-6f99bcdbb8-v29ml
Namespace: flyte
Priority: 0
Service Account: default
Node: XXXXXX.ec2.internal/<IP>
Start Time: Tue, 28 Jan 2025 21:33:07 +0000
Labels: app.kubernetes.io/component=flyte-binary
app.kubernetes.io/instance=flyte-backend
app.kubernetes.io/name=flyte-binary
pod-template-hash=XXXXX
Annotations: checksum/cluster-resource-templates: XXXXX
checksum/configuration: XXXX
checksum/configuration-secret: XXXXX
Status: Running
IP: <IP>
IPs:
IP: <IP>
Controlled By: ReplicaSet/flyte-backend-flyte-binary-6f99bcdbb8
Init Containers:
wait-for-db:
Container ID: containerd://XXXXXXXX
Image: postgres:15-alpine
Image ID: docker.io/library/postgres@sha256:XXXXXXX
Port: <none>
Host Port: <none>
Command:
sh
-ec
Args:
until pg_isready \
-h flyteadmin.cluster-XXXXXX.<region>.rds.amazonaws.com \
-p 5432 \
-U flyteadmin
do
echo waiting for database
sleep 0.1
done
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 28 Jan 2025 21:33:08 +0000
Finished: Tue, 28 Jan 2025 21:33:08 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fxz8z (ro)
Containers:
flyte:
Container ID: containerd://XXXXXXX
Image: cr.flyte.org/flyteorg/flyte-binary-release:v1.14.1
Image ID: cr.flyte.org/flyteorg/flyte-binary-release@sha256:XXXXXXX
Ports: 8088/TCP, 8089/TCP, 9443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
start
--config
/etc/flyte/config.d/*.yaml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Wed, 29 Jan 2025 21:11:14 +0000
Finished: Wed, 29 Jan 2025 21:11:19 +0000
Ready: False
Restart Count: 278
Liveness: http-get http://:http/healthcheck delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/healthcheck delay=30s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: flyte-backend-flyte-binary-6f99bcdbb8-v29ml (v1:metadata.name)
POD_NAMESPACE: flyte (v1:metadata.namespace)
Mounts:
/etc/flyte/cluster-resource-templates from cluster-resource-templates (rw)
/etc/flyte/config.d from config (rw)
/var/run/flyte from state (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fxz8z (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
cluster-resource-templates:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: flyte-backend-flyte-binary-cluster-resource-templates
ConfigMapOptional: <nil>
config:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: flyte-backend-flyte-binary-config
ConfigMapOptional: <nil>
SecretName: flyte-backend-flyte-binary-config-secret
SecretOptionalName: <nil>
state:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-fxz8z:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 90s (x6618 over 23h) kubelet Back-off restarting failed container flyte in pod flyte-backend-flyte-binary-6f99bcdbb8-v29ml_flyte(1269290f-51b1-420a-8d5b-6326c72d25d5)
average-finland-92144
01/29/2025, 9:58 PM
default SA but the flyte-backend-flyte-binary

gorgeous-caravan-46442
01/29/2025, 10:07 PM
002_serviceaccount.yaml: |
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: default # <---- to flyte-backend-flyte-binary
    namespace: '{{ namespace }}'
    annotations:
      eks.amazonaws.com/role-arn: '{{ defaultIamRole }}'
average-finland-92144
01/29/2025, 10:07 PM

gorgeous-caravan-46442
01/29/2025, 10:20 PM

average-finland-92144
01/29/2025, 11:21 PM
flyte-backend-flyte-binary but let Helm do it? I think it's the way the rest of the Helm templates can plumb into, for example, the binary Pod (see https://github.com/flyteorg/flyte/blob/448aba97201ba42297282d859e6064b7f89537ae/charts/flyte-binary/templates/deployment.yaml#L62)

gorgeous-caravan-46442
01/29/2025, 11:38 PM
cluster.addServiceAccount()
I.e. in your docs, you say this:
"You won't create a Kubernetes service account at this point; it will be created by running the Helm chart at the end of the process"
but CDK will have already created the service account.
Looking at the eks-starter.yaml, we have
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: <flyte-system-role>
Maybe I can just manually create a role and use that arn and see if Helm does the rest?

gorgeous-caravan-46442
01/30/2025, 1:09 AM
serviceAccount:
  create: true
Helm created the service account:
kubectl get sa --namespace flyte
NAME                         SECRETS   AGE
default                      0         45h
flyte-admin                  0         45h
flyte-backend-flyte-binary   0         26m
kubectl describe sa --namespace flyte
Name:                flyte-backend-flyte-binary
Namespace:           flyte
Labels:              app.kubernetes.io/instance=flyte-backend
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=flyte-binary
                     app.kubernetes.io/version=1.16.0
                     helm.sh/chart=flyte-binary-v1.14.1
Annotations:         meta.helm.sh/release-name: flyte-backend
                     meta.helm.sh/release-namespace: flyte
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>
The pod is running and didn't crash. It is now using the right service account too
kubectl describe pod --namespace flyte flyte-backend-flyte-binary-8589d74cf6-5cf2s
Name: flyte-backend-flyte-binary-8589d74cf6-5cf2s
Namespace: flyte
Priority: 0
Service Account: flyte-backend-flyte-binary
Node: <>
Start Time: Thu, 30 Jan 2025 00:32:21 +0000
Labels: app.kubernetes.io/component=flyte-binary
app.kubernetes.io/instance=flyte-backend
app.kubernetes.io/name=flyte-binary
pod-template-hash=8589d74cf6
Annotations: checksum/cluster-resource-templates: XXXXX
checksum/configuration: XXXX
checksum/configuration-secret: XXX
Status: Running
IP: <>
IPs:
IP: <>
Controlled By: ReplicaSet/flyte-backend-flyte-binary-8589d74cf6
Init Containers:
wait-for-db:
Container ID: containerd://XXXXX
Image: postgres:15-alpine
Image ID: docker.io/library/postgres@sha256:XXX
Port: <none>
Host Port: <none>
Command:
sh
-ec
Args:
until pg_isready \
-h flyteadmin.cluster-XXX.XXXX.rds.amazonaws.com \
-p 5432 \
-U flyteadmin
do
echo waiting for database
sleep 0.1
done
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 30 Jan 2025 00:32:22 +0000
Finished: Thu, 30 Jan 2025 00:32:22 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-q5gjc (ro)
Containers:
flyte:
Container ID: containerd://XXXX
Image: cr.flyte.org/flyteorg/flyte-binary-release:v1.14.1
Image ID: cr.flyte.org/flyteorg/flyte-binary-release@sha256:XXXXX
Ports: 8088/TCP, 8089/TCP, 9443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
start
--config
/etc/flyte/config.d/*.yaml
State: Running
Started: Thu, 30 Jan 2025 00:32:23 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:http/healthcheck delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/healthcheck delay=30s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: flyte-backend-flyte-binary-8589d74cf6-5cf2s (v1:metadata.name)
POD_NAMESPACE: flyte (v1:metadata.namespace)
Mounts:
/etc/flyte/cluster-resource-templates from cluster-resource-templates (rw)
/etc/flyte/config.d from config (rw)
/var/run/flyte from state (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-q5gjc (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
cluster-resource-templates:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: flyte-backend-flyte-binary-cluster-resource-templates
ConfigMapOptional: <nil>
config:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: flyte-backend-flyte-binary-config
ConfigMapOptional: <nil>
SecretName: flyte-backend-flyte-binary-config-secret
SecretOptionalName: <nil>
state:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-q5gjc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 33m default-scheduler Successfully assigned flyte/flyte-backend-flyte-binary-8589d74cf6-5cf2s to ip-XXXX.ec2.internal
Normal Pulled 33m kubelet Container image "postgres:15-alpine" already present on machine
Normal Created 33m kubelet Created container wait-for-db
Normal Started 33m kubelet Started container wait-for-db
Normal Pulled 33m kubelet Container image "cr.flyte.org/flyteorg/flyte-binary-release:v1.14.1" already present on machine
Normal Created 33m kubelet Created container flyte
Normal Started 33m kubelet Started container flyte
gorgeous-caravan-46442
01/30/2025, 1:12 AM
kubectl -n flyte port-forward service/flyte-backend-flyte-binary 8088:8088 8089:8089
I get Error from server (NotFound): services "flyte-backend-flyte-binary" not found
The only services are flyte-backend-flyte-binary-grpc, flyte-backend-flyte-binary-http, and flyte-backend-flyte-binary-webhook, and trying to change the command to port-forward one of those services does not work.

gorgeous-caravan-46442
01/30/2025, 1:14 AM
2025/01/30 01:13:54 /flyteorg/build/flyteadmin/scheduler/repositories/gormimpl/schedulable_entity_repo.go:70
[2.313ms] [rows:0] SELECT * FROM "schedulable_entities"
{"json":{"src":"composite_workqueue.go:88"},"level":"debug","msg":"Subqueue handler batch round","ts":"2025-01-30T01:13:55Z"}
{"json":{"src":"composite_workqueue.go:98"},"level":"debug","msg":"Dynamically configured batch size [-1]","ts":"2025-01-30T01:13:55Z"}
{"json":{"src":"composite_workqueue.go:129"},"level":"debug","msg":"Exiting SubQueue handler batch round","ts":"2025-01-30T01:13:55Z"}
{"json":{"src":"composite_workqueue.go:88"},"level":"debug","msg":"Subqueue handler batch round","ts":"2025-01-30T01:13:56Z"}
{"json":{"src":"composite_workqueue.go:98"},"level":"debug","msg":"Dynamically configured batch size [-1]","ts":"2025-01-30T01:13:56Z"}
{"json":{"src":"composite_workqueue.go:129"},"level":"debug","msg":"Exiting SubQueue handler batch round","ts":"2025-01-30T01:13:56Z"}
average-finland-92144
01/30/2025, 1:15 AM

average-finland-92144
01/30/2025, 1:16 AM
kubectl -n flyte port-forward service/flyte-backend-flyte-binary-grpc 8089:8089

gorgeous-caravan-46442
01/30/2025, 1:18 AM

average-finland-92144
01/30/2025, 1:19 AM
kubectl -n flyte port-forward service/flyte-backend-flyte-binary-http 8088:8088
and go to localhost:8088/console

gorgeous-caravan-46442
01/30/2025, 1:21 AM

average-finland-92144
01/30/2025, 1:22 AM
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<aws-account-id>:role/flyte-system-role"
average-finland-92144
01/30/2025, 1:22 AM

gorgeous-caravan-46442
01/30/2025, 1:24 AM

average-finland-92144
01/30/2025, 1:29 AM

gorgeous-caravan-46442
01/30/2025, 1:59 AM
flyctl and I've cloned the example flytesnacks repo. pyflyte run basics/hello_world.py hello_world_wf runs fine locally, but with --remote I get:
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:30080: Failed to connect to remote host: connect: Connection
refused (111)"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2025-01-30T01:56:05.966009007+00:00", grpc_status:14, grpc_message:"failed
to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:30080: Failed to connect to remote host: connect: Connection refused (111)"}"
>
average-finland-92144
01/30/2025, 5:36 AM
flytectl demo start option before? Looks like your pyflyte client is pointing to that instance.

average-finland-92144
01/30/2025, 5:36 AM
export FLYTECTL_CONFIG=$HOME/.flyte/config.yaml

average-finland-92144
01/30/2025, 5:37 AM
admin:
  endpoint: localhost:8089
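(For reference, a minimal config file along the lines the messages above suggest; the dns:/// scheme and insecure: true are assumptions for a plaintext, port-forwarded gRPC endpoint rather than values confirmed in this thread:)

```yaml
# Hypothetical $HOME/.flyte/config.yaml for a port-forwarded cluster.
admin:
  # gRPC endpoint exposed by `kubectl -n flyte port-forward ... 8089:8089`
  endpoint: dns:///localhost:8089
  # assumed: no TLS on the port-forwarded connection
  insecure: true
```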
gorgeous-caravan-46442
01/30/2025, 5:30 PM
flyctl rather than flytectl ... confusing, both use the same colour of pink/purple)

average-finland-92144
01/30/2025, 5:37 PM

average-finland-92144
01/30/2025, 5:37 PM

gorgeous-caravan-46442
01/30/2025, 5:38 PM
flytectl demo start ran but I'm going to configure to try the remote cluster

gorgeous-caravan-46442
01/30/2025, 5:44 PM
flytectl demo start seems to have deleted the original Helm deployment pods
kubectl get services -n flyte
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
flyte-sandbox-docker-registry NodePort 10.43.149.169 <none> 5000:30000/TCP 10m
flyte-sandbox-grpc ClusterIP 10.43.42.250 <none> 8089/TCP 10m
flyte-sandbox-http ClusterIP 10.43.213.177 <none> 8088/TCP 10m
flyte-sandbox-kubernetes-dashboard ClusterIP 10.43.52.239 <none> 80/TCP 10m
flyte-sandbox-minio NodePort 10.43.206.188 <none> 9000:30002/TCP,9001:31243/TCP 10m
flyte-sandbox-postgresql NodePort 10.43.78.35 <none> 5432:30001/TCP 10m
flyte-sandbox-postgresql-hl ClusterIP None <none> 5432/TCP 10m
flyte-sandbox-proxy NodePort 10.43.78.59 <none> 8000:30080/TCP 10m
flyte-sandbox-webhook ClusterIP 10.43.229.159 <none> 443/TCP 10m
flyteagent ClusterIP 10.43.43.218 <none> 8000/TCP 10m
average-finland-92144
01/30/2025, 5:45 PM
flytectl demo start at all

gorgeous-caravan-46442
01/30/2025, 5:46 PM
flytectl demo start command before. I don't know why it was pointing to that instance. I'll delete those sandbox pods and redeploy with Helm now.

average-finland-92144
01/30/2025, 5:48 PM

gorgeous-caravan-46442
01/30/2025, 6:05 PM
> EKS cluster on your AWS account right
Yeah exactly. kubectl is currently pointing to the wrong cluster, so I think I just have to change that back to the AWS EKS cluster.
gorgeous-caravan-46442
01/30/2025, 6:06 PM
kubectl config current-context is pointing to the EKS cluster

gorgeous-caravan-46442
01/30/2025, 6:07 PM

gorgeous-caravan-46442
01/30/2025, 6:08 PM

gorgeous-caravan-46442
01/30/2025, 6:09 PM

gorgeous-caravan-46442
01/30/2025, 6:09 PM

gorgeous-caravan-46442
01/30/2025, 6:12 PM

average-finland-92144
01/30/2025, 6:16 PM
> it says IAM role default and service account default
Hm, we can confirm if it's a UI behavior or not. Do a
kubectl describe sa default -n flytesnacks-development
gorgeous-caravan-46442
01/30/2025, 6:21 PM
kubectl describe sa default -n flytesnacks-development
Name:                default
Namespace:           flytesnacks-development
Labels:              <none>
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::<>:role/<>
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>
So the annotation points to the flyte-workers-role.

average-finland-92144
01/30/2025, 6:22 PM
gorgeous-caravan-46442
01/30/2025, 6:24 PM
gorgeous-caravan-46442
01/30/2025, 6:25 PM
average-finland-92144
01/30/2025, 6:50 PM
gorgeous-caravan-46442
01/30/2025, 7:19 PM
gorgeous-caravan-46442
01/30/2025, 7:20 PM
gorgeous-caravan-46442
01/30/2025, 7:29 PM
kubectl describe sa --namespace flyte
Name: default
Namespace: flyte
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
Name: flyte-admin
Namespace: flyte
Labels:              app.kubernetes.io/name=flyte-admin
                     aws.cdk.eks/prune-XXXXX=
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::<>:role/<flyteworkersrol>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
Name: flyte-backend-flyte-binary
Namespace: flyte
Labels:              app.kubernetes.io/instance=flyte-backend
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=flyte-binary
                     app.kubernetes.io/version=1.16.0
                     helm.sh/chart=flyte-binary-v1.14.1
Annotations:         meta.helm.sh/release-name: flyte-backend
                     meta.helm.sh/release-namespace: flyte
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
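As an aside, the `eks.amazonaws.com/role-arn` annotation shown in the describe output above can also be applied declaratively rather than via the console or `kubectl annotate`. A minimal sketch of such a ServiceAccount manifest — the namespace, and the account/role placeholders in the ARN, are assumptions to substitute with values from the CDK stack:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: flytesnacks-development
  annotations:
    # Placeholder ARN: substitute the flyte-workers role created by the CDK stack
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<FLYTE_WORKERS_ROLE>
```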
average-finland-92144
01/30/2025, 8:19 PM
Flyte executions use the `default` SA by default
gorgeous-caravan-46442
01/30/2025, 9:39 PM
gorgeous-caravan-46442
01/30/2025, 9:40 PM
gorgeous-caravan-46442
01/30/2025, 10:07 PM
this.cluster = new Cluster(this, 'EKSCluster', {
...
albController: {
version: AlbControllerVersion.V2_8_2,
},
});
and then
kubectl get deployment -n kube-system aws-load-balancer-controller
NAME READY UP-TO-DATE AVAILABLE AGE
aws-load-balancer-controller 2/2 2 2 2m27s
pretty sweet
gorgeous-caravan-46442
01/30/2025, 10:08 PM
average-finland-92144
01/30/2025, 10:22 PM
> what are the other two SAs here doing?
sorry I don't follow, what other SAs?
average-finland-92144
01/30/2025, 10:23 PM
> once I've tidied up the CDK and if flyte still works, I'll post the code on your repo and we can go through it and discuss whether it's worth merging
that'd be great, I can imagine EKS users would benefit from this a lot
gorgeous-caravan-46442
01/30/2025, 10:37 PM
these are in the `flyte` namespace, and only default is being used?
Name: default
Namespace: flyte
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
Name: flyte-admin
Namespace: flyte
Labels:              app.kubernetes.io/name=flyte-admin
                     aws.cdk.eks/prune-XXXXX=
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::<>:role/<flyteworkersrol>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
Name: flyte-backend-flyte-binary
Namespace: flyte
Labels:              app.kubernetes.io/instance=flyte-backend
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=flyte-binary
                     app.kubernetes.io/version=1.16.0
                     helm.sh/chart=flyte-binary-v1.14.1
Annotations:         meta.helm.sh/release-name: flyte-backend
                     meta.helm.sh/release-namespace: flyte
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
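Since executions run under the `default` SA of each project-domain namespace, the annotation has to land on every one of those namespaces. One way to do that automatically is through the flyte-binary chart's cluster resource templates. A hedged sketch of the values fragment — the `clusterResourceTemplates.inline` key follows the flyte-binary chart's values layout, and the role ARN is a placeholder:

```yaml
# values.yaml fragment for the flyte-binary Helm chart (sketch)
clusterResourceTemplates:
  inline:
    # Rendered once per project-domain namespace; {{ namespace }} is filled in by Flyte
    001_namespace.yaml: |
      apiVersion: v1
      kind: Namespace
      metadata:
        name: '{{ namespace }}'
    002_serviceaccount.yaml: |
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: default
        namespace: '{{ namespace }}'
        annotations:
          # Placeholder: the flyte-workers role ARN from the CDK stack
          eks.amazonaws.com/role-arn: 'arn:aws:iam::<ACCOUNT_ID>:role/<FLYTE_WORKERS_ROLE>'
```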
average-finland-92144
01/30/2025, 10:38 PM
`default` comes with K8s.
`flyte-backend-flyte-binary` is what is used.
`flyte-admin`: pretty sure your single binary is not even using this
gorgeous-caravan-46442
01/30/2025, 10:40 PM
average-finland-92144
01/30/2025, 10:41 PM
> the eks iam role attached to it is being used to run flyte pods?
not sure how your Trust Relationship is configured but you can remove that one both from K8s and IAM and everything should still work
gorgeous-caravan-46442
01/30/2025, 10:41 PM
gorgeous-caravan-46442
03/05/2025, 11:58 PM
An error occurred (ValidationError) when calling the AssumeRoleWithWebIdentity operation: Request ARN is invalid
points towards the OIDC trust relationship issue. I don't yet understand well enough how the Flyte internals, roles, and service accounts work.
I have CDK to create roles. I have CDK to create service accounts. Currently, I create the flyte-backend-flyte-binary role by setting `create: true` in the yaml, because earlier in this thread I couldn't find a way to do that outside of the helm chart without the deployment breaking – although there might still be a way.
Let me know what to try. Hopefully it's an obvious error
gorgeous-caravan-46442
03/05/2025, 11:59 PM
gorgeous-caravan-46442
03/06/2025, 6:33 PM
(e.g. `StringLike`) and then we actually don't run the part where the service account is created using `KubernetesManifest`, i.e. these lines of CDK. Have I understood this correctly?
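For reference, a trust policy relaxed with a `StringLike` condition might look roughly like the sketch below. The account ID, region, and OIDC provider ID are placeholders, and the namespace wildcard is an assumption — it lets the `default` SA in any `flytesnacks-*` project-domain namespace assume the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:flytesnacks-*:default"
        }
      }
    }
  ]
}
```

Note that `StringLike` (rather than `StringEquals`) is what allows the `*` wildcard in the `:sub` condition value.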