#ask-the-community

Choenden Kyirong

01/26/2023, 6:14 PM
Hey y’all! I’m currently looking into using Flyte as a solution for our DS and DE teams. I’ve got a quick question about running the Getting Started demo. For some reason the first task, `generate_normal_df`, is getting stuck on `running` and has been `queued` for a while now. This was done using the local demo Flyte cluster. Running it locally (`pyflyte run example.py wf --n 500 --mean 42 --sigma 2`), the workflow executes fine. However, running it on the Flyte cluster (`pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2`) doesn’t seem to work properly on my end. Any ideas?

Niels Bantilan

01/26/2023, 6:28 PM
hi @Choenden Kyirong welcome and glad you’re checking Flyte out! How much memory is allocated on your Docker daemon? You can check on the desktop app under Settings > Resources. If it’s low can you try bumping it up to 16 or 32 GB?

David Espejo (he/him)

01/26/2023, 6:32 PM
@Choenden Kyirong welcome to the Flyte community! You can also check the current resource consumption of your local K8s node by running `kubectl get node` and then `kubectl top node <insert-node-name>` to see if there's a bottleneck there

Choenden Kyirong

01/26/2023, 6:46 PM
@Niels Bantilan Here’s my resource allocation. I just bumped it up to 4GB (I’m only on an 8GB RAM laptop atm). I restarted it and it’s still queued. Is it required to have at least 16GB RAM to run this locally?
@David Espejo (he/him) Here’s the output:
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
15ca09abf4e8   260m         13%    1467Mi          37%
What would be the typical values that would be alarming and show a possible bottleneck?

David Espejo (he/him)

01/26/2023, 6:51 PM
something closer to the assigned resources, but here it seems that it's not hitting the limits
are all the pods running just fine? e.g. `kubectl get po --namespace=flyte -o wide`
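
A minimal Python sketch of the check discussed above: it parses `kubectl top node` output like the sample in this thread and flags nodes whose CPU or memory usage is close to the node's capacity. The 80% threshold and the function names are assumptions made for illustration, not Flyte or Kubernetes defaults. Keep in mind that `kubectl top` reports live usage, while the scheduler places pods based on requested resources, so a quiet node can still be unable to schedule a pod.

```python
ALERT_PCT = 80  # hypothetical threshold, not a Kubernetes or Flyte default


def parse_top_node(text):
    """Parse `kubectl top node` table output into a list of dicts.

    Assumes CPU is reported in millicores ("260m") and memory in Mi ("1467Mi"),
    as in the sample below.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:  # skip the header row
        name, cpu, cpu_pct, mem, mem_pct = line.split()
        rows.append({
            "name": name,
            "cpu_millicores": int(cpu[:-1]),   # drop trailing "m"
            "cpu_pct": int(cpu_pct[:-1]),      # drop trailing "%"
            "memory_mib": int(mem[:-2]),       # drop trailing "Mi"
            "memory_pct": int(mem_pct[:-1]),   # drop trailing "%"
        })
    return rows


def near_limit(rows, threshold=ALERT_PCT):
    """Return the names of nodes whose CPU or memory usage is at or above the threshold."""
    return [r["name"] for r in rows
            if r["cpu_pct"] >= threshold or r["memory_pct"] >= threshold]


# Output copied from this thread.
SAMPLE = """\
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
15ca09abf4e8   260m         13%    1467Mi          37%
"""

rows = parse_top_node(SAMPLE)
print(near_limit(rows))  # prints [] because 13% and 37% are both below 80%
```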

Choenden Kyirong

01/26/2023, 6:56 PM
@David Espejo (he/him) They seem to be running okay?

David Espejo (he/him)

01/26/2023, 7:02 PM
right, can we see logs for the flyte-binary pod? that would be `kubectl logs sandbox-flyte-binary-7757889f4-ztwk4 --namespace=flyte`

Choenden Kyirong

01/26/2023, 7:05 PM
@David Espejo (he/him), It’s quite long so i directed the output to a text file:
(thank you very much for the help by the way! Super appreciated)

David Espejo (he/him)

01/26/2023, 7:49 PM
so far, the only thing that catches my attention in the logs (and one I haven't been able to reproduce in my demo environment) is this:
Failed to fetch override values when assigning task resource default values for [resource_type:WORKFLOW project:\"flytesnacks\" domain:\"development\" name:\"example.wf\" version:\"3JYuYTF4Gw0iPU97G0_TiQ==\" ]: Resource [{Project:flytesnacks Domain:development Workflow:example.wf LaunchPlan: ResourceType:TASK_RESOURCE}] not found","ts":"2023-01-26T18:53:44Z"}
Can you please share the output of `flytectl get launchplan -d development -p flytesnacks`? In case you need them, here are the instructions to install flytectl. Or just share a screenshot of the Launch Plans section in the UI.

Choenden Kyirong

01/26/2023, 7:53 PM
@David Espejo (he/him) This? I went into launch plans -> example.wf -> execution id of the running workflow.

Niels Bantilan

01/26/2023, 8:22 PM
@Eduardo Apolinario (eapolinario) @Kevin Su any ideas on why the Getting Started example might be hanging?

David Espejo (he/him)

01/26/2023, 8:28 PM
btw, following up with Choenden on DM, we added the config environment variable. @Choenden Kyirong just let us know if it works now

Eduardo Apolinario (eapolinario)

01/26/2023, 8:30 PM
@Choenden Kyirong, can you check the kubernetes logs for that task? If you do `kubectl -n flytesnacks-development get pods` you should see a pod named `ashbv8...` (the execution id). From there, can you dump the contents of that pod using `kubectl -n flytesnacks-development get pod -o yaml <podname>`? This will allow us to confirm what's happening from the k8s perspective.

Choenden Kyirong

01/26/2023, 8:54 PM
hmmm… after discussing with @David Espejo (he/him) and adding the config variable, I tore down the demo via `flytectl demo teardown`. However, now when I run `flytectl demo start`, I receive this error:
$ flytectl demo start
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
🧑‍🏭 Bootstrapping a brand new flyte cluster... 🔨 🔧
🐋 Going to use Flyte v1.3.0 release with image cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-f69fb09ca189e8bf57e1a6a12db168274f640d15
🐋 pulling docker image for release cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-f69fb09ca189e8bf57e1a6a12db168274f640d15
🧑‍🏭 booting Flyte-sandbox container
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
context modified for "flyte-sandbox" and switched over to it.
+-----------------------------------+---------------+-----------+
|              SERVICE              |    STATUS     | NAMESPACE |
+-----------------------------------+---------------+-----------+
| k8s: This might take a little bit | Bootstrapping |           |
+-----------------------------------+---------------+-----------+
Error: Get "https://127.0.0.1:6443/api/v1/nodes": dial tcp 127.0.0.1:6443: connect: connection refused
{"json":{},"level":"error","msg":"Get \"https://127.0.0.1:6443/api/v1/nodes\": dial tcp 127.0.0.1:6443: connect: connection refused","ts":"2023-01-26T12:51:05-08:00"}
@Eduardo Apolinario (eapolinario), I had already torn down the demo before you sent that message! Those commands can’t dump the contents of that pod anymore (I think?)
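
The `connection refused` error above means nothing was accepting connections on `127.0.0.1:6443` at that moment. A rough Python sketch for checking whether that port eventually opens is shown below. The function names, attempt count, and delay are illustrative assumptions, and this is not how `flytectl` itself waits for the cluster.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes "connection refused" and timeouts
        return False


def wait_for_port(host, port, attempts=30, delay=2.0):
    """Probe host:port up to `attempts` times, sleeping `delay` seconds after each failure.

    The defaults are hypothetical; tune them to how long the cluster
    usually takes to come up on your machine.
    """
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False


# Example call for the Kubernetes API port from the error message in this thread:
#   wait_for_port("127.0.0.1", 6443)
```

If the port never opens, the problem is likely in the sandbox container itself rather than in the workflow.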

Eduardo Apolinario (eapolinario)

01/26/2023, 9:24 PM
that's true, if you tore down the cluster they are gone. As for this error that you're seeing now... does it take long (as in multiple minutes) for it to show up?

Choenden Kyirong

01/26/2023, 9:36 PM
@Eduardo Apolinario (eapolinario) up and running now, still queued as before. Here is the output from `kubectl -n flytesnacks-development get pods`:
NAME                        READY   STATUS    RESTARTS   AGE
f2009af361c7d4739b15-n0-0   0/1     Pending   0          4m22s
And I’ve directed the output of `kubectl -n flytesnacks-development get pod -o yaml f2009af361c7d4739b15-n0-0` to the attached txt file.
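
When a pod dump is long, the part that explains a `Pending` status is usually the `PodScheduled` condition under `status.conditions`. The sketch below pulls that message out with Python's standard library. It assumes the pod was dumped with `-o json` instead of `-o yaml` so that no YAML parser is needed. The sample document is a heavily trimmed, hand-written illustration (not the attached file), and the function name is hypothetical.

```python
import json


def unschedulable_message(pod_json):
    """Return the scheduler's message if the pod has not been scheduled, else None."""
    pod = json.loads(pod_json)
    for cond in pod.get("status", {}).get("conditions", []):
        # A pod waiting for a node has PodScheduled set to "False".
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            return cond.get("message")
    return None


# Trimmed, illustrative pod document; a real dump contains many more fields.
SAMPLE_POD = json.dumps({
    "metadata": {"name": "f2009af361c7d4739b15-n0-0"},
    "status": {
        "phase": "Pending",
        "conditions": [{
            "type": "PodScheduled",
            "status": "False",
            "reason": "Unschedulable",
            "message": "0/1 nodes are available: 1 Insufficient cpu.",
        }],
    },
})

print(unschedulable_message(SAMPLE_POD))
```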

Eduardo Apolinario (eapolinario)

01/26/2023, 9:37 PM
very interesting. The important bit is:
message: '0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are
      available: 1 No preemption victims found for incoming pod.'
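
The message reads as "N nodes lack resource X". A small Python sketch that turns it into a dictionary is shown below. The regular expression is an assumption based only on the wording of the message in this thread, so other scheduler messages may need a different pattern.

```python
import re


def scheduling_reasons(message):
    """Return {reason: node_count} for "Insufficient <resource>" entries in a scheduler message."""
    # Only inspect the text before "preemption:", which repeats the node counts.
    head = message.split("preemption:")[0]
    pairs = re.findall(r"(\d+) (Insufficient [\w-]+)", head)
    return {reason: int(count) for count, reason in pairs}


# Message copied from this thread (joined onto one line).
MESSAGE = (
    "0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are "
    "available: 1 No preemption victims found for incoming pod."
)

print(scheduling_reasons(MESSAGE))  # prints {'Insufficient cpu': 1}
```

Here the single node in the demo cluster cannot satisfy the pod's CPU request, which points at CPU rather than memory.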

Choenden Kyirong

01/26/2023, 9:43 PM
hmm yeah. I don’t have much hands-on experience with k8s, just some of the basic knowledge. Trying to learn as I go atm with Flyte. Is this a resource issue due to my local setup?

Eduardo Apolinario (eapolinario)

01/26/2023, 9:43 PM
tell us a bit more about your local system. Are you running on a mac? Is it an M1? Linux?

Choenden Kyirong

01/26/2023, 9:44 PM
Mac, Intel, 8GB RAM

Eduardo Apolinario (eapolinario)

01/26/2023, 9:50 PM
ok, let's double-check resources on the node. Can you run `kubectl get nodes` and then `kubectl describe node <node-name>`?

Choenden Kyirong

01/26/2023, 9:51 PM

Eduardo Apolinario (eapolinario)

01/26/2023, 10:23 PM
yeah... 2 CPUs is not enough, since the node itself already uses ~1.5 CPUs
let me see if there are k3s logs that show this more clearly.
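
The arithmetic behind this can be sketched in a few lines of Python. A pod fits on a node only if its CPU request is no larger than the node's allocatable CPU minus what is already requested. The sketch treats the ~1.5 CPU figure above as already-requested CPU. The task's own request (`TASK_CPU_REQUEST`) is a hypothetical value chosen for illustration, since the thread does not state it.

```python
def to_millicores(quantity):
    """Convert a Kubernetes CPU quantity such as "2", "1.5" or "500m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def fits(allocatable, already_requested, pod_request):
    """Return True if the pod's CPU request fits in the node's remaining CPU."""
    free = to_millicores(allocatable) - to_millicores(already_requested)
    return to_millicores(pod_request) <= free


TASK_CPU_REQUEST = "1"  # hypothetical request; the real value is not given in the thread

# 2 CPUs allocated to Docker: 2000m - 1500m = 500m free, so a 1000m request does not fit.
print(fits("2", "1.5", TASK_CPU_REQUEST))  # False

# 4 CPUs allocated to Docker: 4000m - 1500m = 2500m free, so the same request fits.
print(fits("4", "1.5", TASK_CPU_REQUEST))  # True
```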

Choenden Kyirong

01/26/2023, 10:42 PM
@Eduardo Apolinario (eapolinario) I see….okay
@Eduardo Apolinario (eapolinario) @David Espejo (he/him) Increased the CPU to 4 and Memory to 3.00 GB in my Docker resource settings. The workflow succeeded.

Eduardo Apolinario (eapolinario)

01/27/2023, 12:25 AM
Alright. We should document this, @Niels Bantilan / @David Espejo (he/him).

Choenden Kyirong

01/27/2023, 12:26 AM
fyi the task `generate_normal_df` ran in 2m 23s. I’m using an older laptop atm (MacBook Pro 2015) and won’t usually be using it for this, but yeah.