Choenden Kyirong
01/26/2023, 6:14 PM
generate_normal_df, is getting stuck on running and has been queued for a while now. This was done using the local demo Flyte Cluster.
Running it locally (pyflyte run example.py wf --n 500 --mean 42 --sigma 2) the workflow executes fine. However, running it on the Flyte Cluster (pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2) doesn’t seem to work properly on my end. Any ideas?
Niels Bantilan
01/26/2023, 6:28 PM
David Espejo (he/him)
01/26/2023, 6:32 PM
kubectl get node
and then
kubectl top node <insert-node-name>
and see if there's a bottleneck there
Choenden Kyirong
01/26/2023, 6:46 PM
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
15ca09abf4e8   260m         13%    1467Mi          37%
What would be the typical values that would be alarming and show a possible bottleneck?
David Espejo (he/him)
01/26/2023, 6:51 PM
kubectl get po --namespace=flyte -o wide
Choenden Kyirong
01/26/2023, 6:56 PM
David Espejo (he/him)
01/26/2023, 7:02 PM
kubectl logs sandbox-flyte-binary-7757889f4-ztwk4 --namespace=flyte
Choenden Kyirong
01/26/2023, 7:05 PM
David Espejo (he/him)
01/26/2023, 7:49 PM
Failed to fetch override values when assigning task resource default values for [resource_type:WORKFLOW project:\"flytesnacks\" domain:\"development\" name:\"example.wf\" version:\"3JYuYTF4Gw0iPU97G0_TiQ==\" ]: Resource [{Project:flytesnacks Domain:development Workflow:example.wf LaunchPlan: ResourceType:TASK_RESOURCE}] not found","ts":"2023-01-26T18:53:44Z"}
Can you please share the output of:
flytectl get launchplan -d development -p flytesnacks
In case you need them, here are the instructions to install flytectl
Choenden Kyirong
01/26/2023, 7:53 PM
example.wf
-> execution id of the running workflow.
Niels Bantilan
01/26/2023, 8:22 PM
David Espejo (he/him)
01/26/2023, 8:28 PM
Eduardo Apolinario (eapolinario)
01/26/2023, 8:30 PM
kubectl -n flytesnacks-development get pods
you should see a pod named ashbv8... (the execution id) and from there can you dump the contents of that pod using kubectl -n flytesnacks-development get pod -o yaml <podname>. This will allow us to confirm what's happening from the k8s perspective.
Choenden Kyirong
01/26/2023, 8:54 PM
flytectl demo teardown. However, now when I run flytectl demo start, I receive this error:
$ flytectl demo start
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
🧑🏭 Bootstrapping a brand new flyte cluster... 🔨 🔧
🐋 Going to use Flyte v1.3.0 release with image cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-f69fb09ca189e8bf57e1a6a12db168274f640d15
🐋 pulling docker image for release cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-f69fb09ca189e8bf57e1a6a12db168274f640d15
🧑🏭 booting Flyte-sandbox container
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
context modified for "flyte-sandbox" and switched over to it.
+-----------------------------------+---------------+-----------+
| SERVICE | STATUS | NAMESPACE |
+-----------------------------------+---------------+-----------+
| k8s: This might take a little bit | Bootstrapping | |
+-----------------------------------+---------------+-----------+
Error: Get "https://127.0.0.1:6443/api/v1/nodes": dial tcp 127.0.0.1:6443: connect: connection refused
{"json":{},"level":"error","msg":"Get \"https://127.0.0.1:6443/api/v1/nodes\": dial tcp 127.0.0.1:6443: connect: connection refused","ts":"2023-01-26T12:51:05-08:00"}
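A note on the error above: "connection refused" on 127.0.0.1:6443 means nothing was accepting TCP connections on the Kubernetes API port at the moment flytectl queried it. A minimal Python sketch for checking this by hand follows. The function name api_server_reachable and the 2-second timeout are illustrative assumptions, and the check only confirms that the port accepts a connection, not that the cluster is healthy.

```python
import socket


def api_server_reachable(host: str = "127.0.0.1", port: int = 6443,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it only tests that something is listening on
    the port, which is the condition the error message complains about.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port are taken from the error message in the log above.
    print(api_server_reachable())
```

If this prints False for a while after flytectl demo start, the API server inside the sandbox container is probably still starting or the container has exited; which of the two applies here cannot be told from the log alone.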
Eduardo Apolinario (eapolinario)
01/26/2023, 9:24 PM
Choenden Kyirong
01/26/2023, 9:36 PM
kubectl -n flytesnacks-development get pods:
NAME                        READY   STATUS    RESTARTS   AGE
f2009af361c7d4739b15-n0-0   0/1     Pending   0          4m22s
And I’ve directed the output of kubectl -n flytesnacks-development get pod -o yaml f2009af361c7d4739b15-n0-0 to the attached txt file.
Eduardo Apolinario (eapolinario)
01/26/2023, 9:37 PM
message: '0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are
available: 1 No preemption victims found for incoming pod.'
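A note on reading this scheduler message: kubectl top node reports actual usage (260m at 13% earlier in the thread, which implies roughly 2 cores on the node), whereas the scheduler compares a pod's CPU request with what remains after the requests of already-scheduled pods are subtracted from the node's allocatable CPU. A node can therefore look mostly idle and still report "Insufficient cpu". The Python sketch below illustrates that arithmetic. The function names and the 1500m, 1 and 500m figures are hypothetical and are not taken from this cluster; the real values would come from kubectl describe node.

```python
from decimal import Decimal


def parse_cpu(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity such as "260m", "2" or "0.5" to cores.

    Only the plain and millicore ("m") forms are handled in this sketch.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def fits(allocatable: str, already_requested: str, new_request: str) -> bool:
    """Return True if new_request fits into the CPU that is not yet requested."""
    free = parse_cpu(allocatable) - parse_cpu(already_requested)
    return parse_cpu(new_request) <= free


# Hypothetical figures: a 2-core node where other pods already request 1500m.
print(fits("2", "1500m", "1"))     # False: only 0.5 core is left to request
print(fits("2", "1500m", "500m"))  # True: a 500m request still fits
```

Under these assumed numbers a task requesting a full core stays Pending, while one requesting 500m would be scheduled.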
Choenden Kyirong
01/26/2023, 9:43 PM
Eduardo Apolinario (eapolinario)
01/26/2023, 9:43 PM
Choenden Kyirong
01/26/2023, 9:44 PM
Eduardo Apolinario (eapolinario)
01/26/2023, 9:50 PM
kubectl get nodes and then kubectl describe node <node-name>?
Choenden Kyirong
01/26/2023, 9:51 PM
Eduardo Apolinario (eapolinario)
01/26/2023, 10:23 PM
Choenden Kyirong
01/26/2023, 10:42 PM
Eduardo Apolinario (eapolinario)
01/27/2023, 12:25 AM
Choenden Kyirong
01/27/2023, 12:26 AM
generate_normal_df ran in 2m 23s.
I’m using an older laptop atm (MacBook Pro 2015) and won’t be doing so for this usually, but yeah.