# ask-the-community
g
Hi community, I'm getting an `UNKNOWN` status on every workflow that I submit to Flyte, and it just stays in that state (it never moves to a `RUNNING` state). Some background on the Flyte installation: I deployed Flyte on a local K8s cluster before deploying it on our real K8s environment (a sort of POC). I recently installed the MPI-Operator in order to parallelize an ML workflow. Since I couldn't upgrade the Helm chart, because it was throwing the following error
Error: UPGRADE FAILED: rendered manifests contain a resource that already exists. Unable to continue with update: Secret "kubernetes-dashboard-csrf" in namespace "flyte" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "flyte": current value is "flyte-deps"
I ended up modifying the flyte-core values file, adding the `enabled_plugins` property to the `ConfigMap` property, in accordance with the documentation. Question: what could be happening, and how can I check what's going on under the hood? BTW, from time to time it's not unusual to get a `503` error when navigating the console. Any help is greatly appreciated, thx!
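For reference, the Helm error above says the Secret is annotated as belonging to the `flyte-deps` release rather than `flyte`. A minimal sketch for reading that annotation directly, assuming `kubectl` points at the same cluster (the function name `secret_owner` is just an illustrative helper, not part of any tool):

```shell
# Sketch only: secret_owner is a hypothetical helper name.
# Prints the Helm release recorded in a Secret's
# meta.helm.sh/release-name annotation.
# Usage: secret_owner <namespace> <secret-name>
secret_owner() {
  # Dots inside the annotation key must be escaped in kubectl's jsonpath.
  kubectl -n "$1" get secret "$2" \
    -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}'
}
```

Calling `secret_owner flyte kubernetes-dashboard-csrf` should print the owning release name, which is the release that `helm upgrade` would need to target for that resource.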
j
check the flytepropeller log, there should be some info on why it's not going forward
g
Great, thx! I'll check that 😄
Hmm, is it normal for that to be empty? I'm checking the container logs of `k8s_POD_flytepropeller-5f9c6c6b77-p8btg_flyte_a591b5af-c747-4218-b868-bc553bfb75cf_0` from Docker Desktop.
y
it shouldn’t be empty.
can you get the statuses on all the pods?
kubectl -n flyte get pod
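A note on the empty logs: in Docker's container naming, the `k8s_POD_` prefix usually marks the pod's pause (sandbox) container, which carries no application output, so reading logs through `kubectl` tends to be more reliable. A rough sketch, assuming the namespace is `flyte` as in the command above (the function names are illustrative helpers only):

```shell
# Sketch only: assumes the Flyte namespace is "flyte"; override with NS=...
NS="${NS:-flyte}"

# Show each pod's phase and container restart counts to spot crash loops.
pod_status() {
  kubectl -n "$NS" get pods \
    -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,RESTARTS:.status.containerStatuses[*].restartCount'
}

# Print the logs of the previous (crashed) container instance of a pod.
# Usage: previous_logs <pod-name>
previous_logs() {
  kubectl -n "$NS" logs "$1" --previous
}
```

For a pod that keeps restarting, `previous_logs <pod-name>` shows the output of the last terminated container, which is typically where a startup panic ends up.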
g
Checking it right now
Oh god, I'm running late to a meeting. I'll check what's going on as soon as I get back and post my findings here! Thx! 🙌
Hi everyone! Sorry for the delay, the meeting ended too late. So, apparently there are two instances (?) of Flytepropeller. I'm guessing that one is the deployment (the one that has empty logs and is still "running") and the other is the pod that is trying to start. Anyhow, from the pod that is crashing I managed to retrieve the following logs:
{"json":{},"level":"panic","msg":"cannot set default plugin [agent-service] for task types [[bigquery_query_job_task]] when it is not configured to be an enabled plugin","ts":"2023-08-07T11:41:07Z"}
panic: (*logrus.Entry) 0x4000311b90

goroutine 536 [running]:
github.com/sirupsen/logrus.(*Entry).log(0x4000311b20, 0x0, {0x4000182cf0, 0x86})
        /go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:259 +0x470
github.com/sirupsen/logrus.(*Entry).Log(0x4000311b20, 0x0, {0x4000ee37b0?, 0x4000ee37b0?, 0x0?})
        /go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:293 +0x60
github.com/sirupsen/logrus.(*Entry).Panic(0x2663f60?, {0x4000ee37b0?, 0x4000ec3f01?, 0x4000ec3f01?})
        /go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:331 +0x30
github.com/flyteorg/flytestdlib/logger.Panic({0x2663f60?, 0x4000b82200?}, {0x4000ee37b0, 0x1, 0x1})
        /go/pkg/mod/github.com/flyteorg/flytestdlib@v1.0.19/logger/logger.go:143 +0x4c
github.com/flyteorg/flytepropeller/pkg/controller.(*Controller).onStartedLeading.func1()
        /go/src/github.com/flyteorg/flytepropeller/pkg/controller/controller.go:139 +0xa0
created by github.com/flyteorg/flytepropeller/pkg/controller.(*Controller).onStartedLeading
        /go/src/github.com/flyteorg/flytepropeller/pkg/controller/controller.go:137 +0xd4
I'm not quite sure why it has detected a big_query plugin, nor do I know how to enable it, in case that's something I should do. As always, more than glad to hear your thoughts about this, and thx in advance 🙌
Okay, so I managed to fix the above error by adding the agent service as an enabled plugin, but now I'm getting the following error:
failed to load plugin - mpi: [PluginInitializationFailed] Error getting informer for %!s(\u003cnil\u003e), caused by: no matches for kind \"MPIJob\" in version \"kubeflow.org/v1\"
I'll try searching the forums for a fix for this.
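The "no matches for kind" message generally means the API server has no `MPIJob` resource registered under `kubeflow.org/v1`, so it may be worth checking which versions the installed CRD actually defines. A minimal sketch, assuming the CRD uses the conventional name `mpijobs.kubeflow.org` (the function names are illustrative helpers only):

```shell
# Sketch only: "mpijobs.kubeflow.org" is an assumed CRD name; verify it
# against the operator's manifests.

# List the API versions defined in the MPIJob CRD, space-separated.
mpijob_versions() {
  kubectl get crd mpijobs.kubeflow.org \
    -o jsonpath='{.spec.versions[*].name}'
}

# Succeed only if "v1" is one of the versions defined in the CRD.
mpijob_has_v1() {
  mpijob_versions | tr ' ' '\n' | grep -qx 'v1'
}
```

If `mpijob_versions` prints nothing or fails, the CRD is probably not installed; if it prints versions other than `v1`, the installed operator and the version the plugin expects likely don't match.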