flat-exabyte-79377 (08/15/2023, 5:52 PM)
flat-exabyte-79377 (08/15/2023, 5:53 PM)
flyteadmin and flytepropeller
freezing-airport-6809
faint-activity-87590 (08/16/2023, 8:40 AM)
flat-exabyte-79377 (08/16/2023, 9:23 AM)
faint-activity-87590 (08/16/2023, 9:25 AM)
freezing-airport-6809
flat-exabyte-79377 (08/16/2023, 1:24 PM)
calm-zoo-68637 (08/16/2023, 4:49 PM)
calm-zoo-68637 (08/16/2023, 4:49 PM)
calm-pilot-2010 (08/31/2023, 9:39 AM)
The sync-cluster-resources init container of the flyteadmin pod kept failing to connect to my data-plane-only cluster, but that is working now. Now the main flyteadmin container is failing with:
caught panic: entries is empty [goroutine 1 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/flyteorg/flyteadmin/pkg/rpc/adminservice.NewAdminServer.func1()
	/go/src/github.com/flyteorg/flyteadmin/pkg/rpc/adminservice/base.go:75 +0x88
panic({0x237d3c0, 0xc000b658c0})
	/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/flyteorg/flyteadmin/pkg/executioncluster/impl.GetExecutionCluster({0x2d78ac8?, 0xc0006062b0?}, {0x0, 0x0}, {0x0, 0x0}, {0x2d6af08, 0xc00034c0a0}, {0x2d6f108, 0xc001686600})
	/go/src/github.com/flyteorg/flyteadmin/pkg/executioncluster/impl/factory.go:28 +0x159
github.com/flyteorg/flyteadmin/pkg/rpc/adminservice.NewAdminServer({0x2d5efd0?, 0xc000114000}, 0xc0005a9950, {0x2d6af08, 0xc00034c0a0}, {0x0, 0x0}, {0x0, 0x0}, 0xc000593b00, ...)
	/go/src/github.com/flyteorg/flyteadmin/pkg/rpc/adminservice/base.go:89 +0x376
github.com/flyteorg/flyteadmin/pkg/server.newGRPCServer({0x2d5efd0, 0xc000114000}, 0xc0005a9950, 0xc00019c000, 0x0?, {0x0?, 0x0}, {0x2d78ac8, 0xc000ec1f00}, {0x0, ...})
	/go/src/github.com/flyteorg/flyteadmin/pkg/server/service.go:116 +0x6d9
github.com/flyteorg/flyteadmin/pkg/server.serveGatewayInsecure({0x2d5efd0?, 0xc000114000}, 0xc000ec1e10?, 0xc00019c000, 0xc0001c4680, 0x7fe3271ce108?, 0x9?, {0x2d78ac8, 0xc000ec1f00})
	/go/src/github.com/flyteorg/flyteadmin/pkg/server/service.go:319 +0x705
github.com/flyteorg/flyteadmin/pkg/server.Serve({0x2d5efd0, 0xc000114000}, 0x4?, 0x4?)
	/go/src/github.com/flyteorg/flyteadmin/pkg/server/service.go:59 +0x19f
github.com/flyteorg/flyteadmin/cmd/entrypoints.glob..func7(0x41e0f20?, {0x27d0389?, 0x2?, 0x2?})
	/go/src/github.com/flyteorg/flyteadmin/cmd/entrypoints/serve.go:39 +0x128
github.com/spf13/cobra.(*Command).execute(0x41e0f20, {0xc00058d5c0, 0x2, 0x2})
	/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0x41e20a0)
	/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902
github.com/flyteorg/flyteadmin/cmd/entrypoints.Execute(0x60?)
	/go/src/github.com/flyteorg/flyteadmin/cmd/entrypoints/root.go:50 +0x3a
main.main()
	/go/src/github.com/flyteorg/flyteadmin/cmd/main.go:11 +0x85
]
Has anyone seen this before in multi-cluster setups? I'm using the latest helm release (1.9.1) and flyteadmin 1.1.123.
calm-pilot-2010 (08/31/2023, 10:34 AM)
enabled: true is important. It might be nice to add this to the example helm values so that others don't make the same mistake as me.
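For anyone else hitting this, a minimal sketch of a data-plane entry with the flag set, following the layout in the multicluster docs (cluster name, endpoint, and credential paths are placeholders):
configmap:
  clusters:
    clusterConfigs:
    - name: "dataplane_1"
      endpoint: https://dataplane-1.example.com:443
      enabled: true   # without this, flyteadmin sees no execution clusters and panics with "entries is empty"
      auth:
        type: "file_path"
        tokenPath: "/var/run/credentials/dataplane_1_token"
        certPath: "/var/run/credentials/dataplane_1_cacert"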
calm-pilot-2010 (09/04/2023, 6:46 PM)
I've disabled the cluster_resource_manager (both the init container on flyteadmin and the separate deployment).
The multi-cluster setup requires mounting a couple of secrets for authenticating to the kube API of the other k8s clusters. The docs explain using additionalVolumes and additionalVolumeMounts: https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html#user-and-control-plane-deployment. However, these configs don't affect the resource sync deployment or the resource sync init container, so I got lots of errors about failing to find secrets on those components. Am I right in thinking that cluster_resource_manager is not supported on multi-cluster deployments?
As far as I can tell the cluster_resource_manager is just responsible for creating k8s namespaces and applying namespace resource quotas. If that's true, then personally I'm happy managing that myself in Terraform.
freezing-airport-6809
average-finland-92144 (09/04/2023, 9:24 PM)
calm-pilot-2010 (09/04/2023, 10:01 PM)
1. We've had to disable the cluster_resource_manager due to secret mounting issues (both the init container on flyteadmin and the separate deployment).
2. I think the service account bearer token and ca.crt it refers to are not created automatically since k8s v1.22 (https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#manual-secret-management-for-serviceaccounts). I had to create this secret manually; see the Secret sketch after this list.
3. You need to configure the flyteadmin endpoint on the data planes so they can communicate back to flyteadmin on the control plane. I did this with configmap.admin.admin.endpoint etc.; see the values sketch after this list.
4. enabled: true is missing from configmap.clusters.clusterConfigs in the example at https://github.com/flyteorg/flyte/blob/44f5f42b3e5a75747932a6d76e8fc8fef625f3d2/charts/flyte-core/values.yaml.
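For (2), a minimal sketch of the manually created token Secret, applied on each data-plane cluster (the name and namespace are illustrative; the annotation must name the service account your cluster config authenticates as):
apiVersion: v1
kind: Secret
metadata:
  name: flyteadmin-token        # illustrative name
  namespace: flyte              # illustrative namespace
  annotations:
    kubernetes.io/service-account.name: flyteadmin
type: kubernetes.io/service-account-token
Kubernetes' token controller then populates the token and ca.crt keys, which you can copy into the secret mounted on the control plane.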
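For (3), a sketch of the data-plane values I mean (the endpoint is a placeholder for wherever your control-plane flyteadmin gRPC service is exposed):
configmap:
  admin:
    admin:
      endpoint: flyteadmin.example.com:443   # placeholder control-plane address
      insecure: false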
flat-exabyte-79377 (09/05/2023, 12:17 PM)
> Am I right in thinking that cluster_resource_manager is not supported on multi-cluster deployments? As far as I can tell the cluster_resource_manager is just responsible for creating k8s namespaces and applying namespace resource quotas. If that's true then personally I'm happy managing that myself in Terraform.
We've had to disable the cluster_resource_manager and manage namespaces and service accounts manually via Terraform for the moment.
freezing-airport-6809
faint-activity-87590 (09/05/2023, 1:35 PM)
> The multi-cluster setup requires mounting a couple of secrets for authenticating to the kube API of the other k8s clusters. The docs explain using additionalVolumes and additionalVolumeMounts...
Yes, that's correct. We also configured additionalVolumes, additionalVolumeMounts, and initContainerClusterSyncAdditionalVolumeMounts on flyteadmin, like this:
additionalVolumes:
- name: cluster-credentials
  secret:
    secretName: cluster-credentials
additionalVolumeMounts:
- name: cluster-credentials
  mountPath: /etc/credentials
initContainerClusterSyncAdditionalVolumeMounts:
- name: cluster-credentials
  mountPath: /etc/credentials
Like Tom already mentioned, this alone does not fix the failing deployments, which is why we adjusted the deployments for flyteadmin and the cluster resource manager. After a helm install of the control plane, creating an empty secret named cluster-credentials should turn everything healthy.
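For reference, the placeholder can literally be an empty Secret (the namespace is illustrative; use wherever flyteadmin runs):
apiVersion: v1
kind: Secret
metadata:
  name: cluster-credentials
  namespace: flyte   # illustrative namespace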
I'll paste the adjusted deployments below ⬇️. Search for "Kineo Change".
faint-activity-87590 (09/05/2023, 1:39 PM)
average-finland-92144 (10/04/2023, 10:49 PM)
Regarding clusterconfigs: I find that the clusterResourceManager and secrets mounting work just fine with multicluster after this PR. Sometimes, though, flyteadmin doesn't reload the configmap after helm upgrade operations, forcing a rollout restart to make it load the new config. If you find this behavior consistently, please create an Issue so we can explore it further.
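If you do hit that stale-configmap behavior, a standard restart (assuming flyteadmin lives in the flyte namespace) forces the reload:
kubectl -n flyte rollout restart deployment/flyteadmin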
calm-pilot-2010 (10/05/2023, 8:35 AM)