fierce-answer-16379
01/26/2023, 6:14 PM
generate_normal_df, is getting stuck on running and has been queued for a while now. This was done using the local demo Flyte Cluster.
Running it locally (pyflyte run example.py wf --n 500 --mean 42 --sigma 2) the workflow executes fine. However, running it on the Flyte Cluster (pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2) doesn’t seem to work properly on my end. Any ideas?
broad-monitor-993
01/26/2023, 6:28 PM
average-finland-92144
01/26/2023, 6:32 PM
kubectl get node
and then
kubectl top node <insert-node-name>
and see if there's a bottleneck there
fierce-answer-16379
01/26/2023, 6:46 PM
fierce-answer-16379
01/26/2023, 6:50 PM
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
15ca09abf4e8   260m         13%    1467Mi          37%
What would be the typical values that would be alarming and show a possible bottleneck?
average-finland-92144
01/26/2023, 6:51 PM
average-finland-92144
01/26/2023, 6:52 PM
kubectl get po --namespace=flyte -o wide
fierce-answer-16379
01/26/2023, 6:56 PM
average-finland-92144
01/26/2023, 7:02 PM
kubectl logs sandbox-flyte-binary-7757889f4-ztwk4 --namespace=flyte
fierce-answer-16379
01/26/2023, 7:05 PM
fierce-answer-16379
01/26/2023, 7:05 PM
average-finland-92144
01/26/2023, 7:49 PM
Failed to fetch override values when assigning task resource default values for [resource_type:WORKFLOW project:\"flytesnacks\" domain:\"development\" name:\"example.wf\" version:\"3JYuYTF4Gw0iPU97G0_TiQ==\" ]: Resource [{Project:flytesnacks Domain:development Workflow:example.wf LaunchPlan: ResourceType:TASK_RESOURCE}] not found","ts":"2023-01-26T18:53:44Z"}
Can you please share the output of:
flytectl get launchplan -d development -p flytesnacks
In case you need them, here are the instructions to install flytectl
average-finland-92144
01/26/2023, 7:50 PM
fierce-answer-16379
01/26/2023, 7:53 PM
example.wf -> execution id of the running workflow.
broad-monitor-993
01/26/2023, 8:22 PM
broad-monitor-993
01/26/2023, 8:22 PM
average-finland-92144
01/26/2023, 8:28 PM
high-accountant-32689
01/26/2023, 8:30 PM
kubectl -n flytesnacks-development get pods
you should see a pod named ashbv8... (the execution id) and from there can you dump the contents of that pod using kubectl -n flytesnacks-development get pod -o yaml <podname>. This will allow us to confirm what's happening from the k8s perspective.
fierce-answer-16379
01/26/2023, 8:54 PM
flytectl demo teardown. However, now when I flytectl demo start, I receive this error:
$ flytectl demo start
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
🧑‍🏭 Bootstrapping a brand new flyte cluster... 🔨 🔧
🐋 Going to use Flyte v1.3.0 release with image cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-f69fb09ca189e8bf57e1a6a12db168274f640d15
🐋 pulling docker image for release cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-f69fb09ca189e8bf57e1a6a12db168274f640d15
🧑‍🏭 booting Flyte-sandbox container
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
context modified for "flyte-sandbox" and switched over to it.
+-----------------------------------+---------------+-----------+
|              SERVICE              |    STATUS     | NAMESPACE |
+-----------------------------------+---------------+-----------+
| k8s: This might take a little bit | Bootstrapping |           |
+-----------------------------------+---------------+-----------+
Error: Get "https://127.0.0.1:6443/api/v1/nodes": dial tcp 127.0.0.1:6443: connect: connection refused
{"json":{},"level":"error","msg":"Get \"https://127.0.0.1:6443/api/v1/nodes\": dial tcp 127.0.0.1:6443: connect: connection refused","ts":"2023-01-26T12:51:05-08:00"}
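A "connection refused" on 127.0.0.1:6443 typically means nothing is accepting connections on the Kubernetes API port yet. One way to check whether the port comes up later is to poll it. The following is a minimal Python sketch under that assumption; wait_for_port is a hypothetical helper, not part of flytectl, and the timeout values are arbitrary.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before timeout_s elapses.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Calling wait_for_port("127.0.0.1", 6443) would report whether the API server port from the error above ever starts accepting connections. It only checks TCP reachability, not whether the cluster is healthy.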
fierce-answer-16379
01/26/2023, 8:57 PM
high-accountant-32689
01/26/2023, 9:24 PM
fierce-answer-16379
01/26/2023, 9:36 PM
kubectl -n flytesnacks-development get pods:
NAME                        READY   STATUS    RESTARTS   AGE
f2009af361c7d4739b15-n0-0   0/1     Pending   0          4m22s
And I’ve directed the output of kubectl -n flytesnacks-development get pod -o yaml f2009af361c7d4739b15-n0-0 to the attached txt file.
high-accountant-32689
01/26/2023, 9:37 PM
message: '0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are
available: 1 No preemption victims found for incoming pod.'
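The scheduler message above says the pod's CPU request does not fit on the single node. Kubernetes schedules on requested CPU (the node's allocatable CPU minus the requests of pods already placed on it), not on live usage as shown by kubectl top node, so a node can look mostly idle and still reject a pod. A minimal Python sketch of that comparison follows; the function names are hypothetical and this simplifies what the real scheduler does.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "260m" or "2" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Whole or fractional cores, e.g. "2" or "0.5"
    return round(float(quantity) * 1000)


def fits(request: str, allocatable: str, already_requested: str) -> bool:
    """Return True if a pod's CPU request fits in the node's unrequested CPU.

    Simplified illustration: compares the request against allocatable CPU
    minus the CPU already requested by other pods on the node.
    """
    free = cpu_to_millicores(allocatable) - cpu_to_millicores(already_requested)
    return cpu_to_millicores(request) <= free
```

With made-up numbers, fits("2", "2", "500m") is False because only 1500m remains unrequested, while fits("500m", "2", "500m") is True. The real values for a node appear under Allocatable and Allocated resources in the output of kubectl describe node.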
fierce-answer-16379
01/26/2023, 9:43 PM
high-accountant-32689
01/26/2023, 9:43 PM
fierce-answer-16379
01/26/2023, 9:44 PM
high-accountant-32689
01/26/2023, 9:50 PM
kubectl get nodes and then kubectl describe node <node-name>?
fierce-answer-16379
01/26/2023, 9:51 PM
high-accountant-32689
01/26/2023, 10:23 PM
high-accountant-32689
01/26/2023, 10:25 PM
fierce-answer-16379
01/26/2023, 10:42 PM
fierce-answer-16379
01/27/2023, 12:24 AM
high-accountant-32689
01/27/2023, 12:25 AM
fierce-answer-16379
01/27/2023, 12:26 AM
generate_normal_df ran in 2m 23s.
I’m using an older laptop atm (MacBook Pro 2015) and won’t usually be doing so for this, but yeah.