Hey y’all! I’m currently looking into using Flyte as a soluti… · Flyte #flyte-support


fierce-answer-16379

01/26/2023, 6:14 PM

Hey y’all! I’m currently looking into using Flyte as a solution for our DS and DE team, and I’ve got a quick question about running the Getting Started demo. For some reason the first task, `generate_normal_df`, is getting stuck on `running` and has been `queued` for a while now. This was done using the local demo Flyte cluster. Running it locally (`pyflyte run example.py wf --n 500 --mean 42 --sigma 2`), the workflow executes fine. However, running it on the Flyte cluster (`pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2`) doesn’t seem to work properly on my end. Any ideas?

broad-monitor-993

01/26/2023, 6:28 PM

Hi @fierce-answer-16379, welcome, and glad you’re checking Flyte out! How much memory is allocated to your Docker daemon? You can check in the desktop app under Settings > Resources. If it’s low, can you try bumping it up to 16 or 32 GB?

average-finland-92144

01/26/2023, 6:32 PM

@fierce-answer-16379 welcome to the Flyte community! You can also check the current resource consumption of your local K8s node by running `kubectl get node` and then `kubectl top node <insert-node-name>` to see if there's a bottleneck there.
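A minimal sketch for reading that output programmatically, assuming the default `kubectl top node` columns (`NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%`). The helper names and the 80% threshold are hypothetical choices for illustration, not Kubernetes defaults:

```python
def parse_top_nodes(output: str) -> list[dict]:
    """Turn the text table printed by `kubectl top node` into row dicts."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def busy_nodes(output: str, threshold_pct: int = 80) -> list[str]:
    """Return names of nodes whose CPU% or MEMORY% exceeds threshold_pct."""
    flagged = []
    for row in parse_top_nodes(output):
        cpu = row["CPU%"].rstrip("%")
        mem = row["MEMORY%"].rstrip("%")
        if not (cpu.isdigit() and mem.isdigit()):
            continue  # e.g. "<unknown>" when metrics aren't available for a node
        if int(cpu) > threshold_pct or int(mem) > threshold_pct:
            flagged.append(row["NAME"])
    return flagged
```

Keep in mind that `kubectl top` reports live usage, which is not what the Kubernetes scheduler checks when deciding whether a pod fits on a node (it compares resource requests).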

fierce-answer-16379

01/26/2023, 6:46 PM

@broad-monitor-993 Here’s my resource allocation. I just bumped it up to 4 GB (I’m only on an 8 GB RAM laptop atm), restarted it, and it’s still queued. Is it required to have at least 16 GB of RAM to run this locally?

fierce-answer-16379

01/26/2023, 6:50 PM

@average-finland-92144 Here’s the output:


NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
15ca09abf4e8   260m         13%    1467Mi          37%

What would be the typical values that would be alarming and show a possible bottleneck?

average-finland-92144

01/26/2023, 6:51 PM

Something closer to the assigned resources, but here it doesn't seem to be hitting the limits.

average-finland-92144

01/26/2023, 6:52 PM

Are all the pods running fine? E.g. `kubectl get po --namespace=flyte -o wide`
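A rough sketch that flags pods that are not fully ready, assuming the default `kubectl get po` column order (`NAME READY STATUS ...`, which `-o wide` keeps at the front). `unhealthy_pods` is a hypothetical helper:

```python
def unhealthy_pods(output: str) -> list[str]:
    """Return pod names whose STATUS isn't Running/Completed or whose READY count is short."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    name_i, ready_i, status_i = header.index("NAME"), header.index("READY"), header.index("STATUS")
    bad = []
    for line in lines[1:]:
        cols = line.split()
        ready, total = cols[ready_i].split("/")
        status = cols[status_i]
        if status == "Completed":
            continue  # finished job pods are expected to show 0/1
        if status != "Running" or ready != total:
            bad.append(cols[name_i])
    return bad
```

Only the first three columns are used because later ones (for example RESTARTS in newer kubectl versions, or NOMINATED NODE with `-o wide`) can contain spaces.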

fierce-answer-16379

01/26/2023, 6:56 PM

@average-finland-92144 They seem to be running okay?

average-finland-92144

01/26/2023, 7:02 PM

Right, can we see the logs for the flyte-binary pod? That would be `kubectl logs sandbox-flyte-binary-7757889f4-ztwk4 --namespace=flyte`
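Since that log can get long, a small filter helps. This sketch assumes each log line is a JSON object with `level`, `msg` and `ts` keys, as the log excerpts later in this thread suggest; `log_entries` is a hypothetical helper name:

```python
import json


def log_entries(lines, levels=("error", "warning")):
    """Yield (ts, level, msg) for JSON log lines whose level is in `levels`."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as banners
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise malformed line
        if entry.get("level") in levels:
            yield entry.get("ts"), entry["level"], entry.get("msg", "")
```

Usage: `with open("local_k8_log.txt") as f:` then iterate over `log_entries(f)` and print each tuple.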

fierce-answer-16379

01/26/2023, 7:05 PM

@average-finland-92144 It’s quite long, so I redirected the output to a text file:

local_k8_log.txt


fierce-answer-16379

01/26/2023, 7:05 PM

(thank you very much for the help by the way! Super appreciated)

average-finland-92144

01/26/2023, 7:49 PM

So far, the only thing that catches my attention in the logs (and one I haven't been able to reproduce in my demo environment) is this:

Failed to fetch override values when assigning task resource default values for [resource_type:WORKFLOW project:\"flytesnacks\" domain:\"development\" name:\"example.wf\" version:\"3JYuYTF4Gw0iPU97G0_TiQ==\" ]: Resource [{Project:flytesnacks Domain:development Workflow:example.wf LaunchPlan: ResourceType:TASK_RESOURCE}] not found","ts":"2023-01-26T18:53:44Z"}

Can you please share the output of `flytectl get launchplan -d development -p flytesnacks`?

In case you need them, here are the instructions to install flytectl.

average-finland-92144

01/26/2023, 7:50 PM

Or just a screenshot of the Launch Plans section in the UI.

fierce-answer-16379

01/26/2023, 7:53 PM

@average-finland-92144 This? I went into Launch Plans -> `example.wf` -> the execution ID of the running workflow.

broad-monitor-993

01/26/2023, 8:22 PM

@high-accountant-32689 @glamorous-carpet-83516 any ideas on why the Getting Started example might be hanging?

broad-monitor-993

01/26/2023, 8:22 PM

(see the log file here: https://flyte-org.slack.com/archives/CP2HDHKE1/p1674759923023239?thread_ts=1674756878.569859&cid=CP2HDHKE1)

average-finland-92144

01/26/2023, 8:28 PM

BTW, following up with Choenden over DM: the config environment variable has been added. @fierce-answer-16379, just let us know if it works now.

high-accountant-32689

01/26/2023, 8:30 PM

@fierce-answer-16379, can you check the Kubernetes logs for that task? If you run `kubectl -n flytesnacks-development get pods`, you should see a pod named `ashbv8...` (the execution ID), and from there you can dump the contents of that pod using `kubectl -n flytesnacks-development get pod -o yaml <podname>`. This will allow us to confirm what's happening from the K8s perspective.
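Most of the useful information in that dump is under `status.conditions`. A minimal sketch, assuming you request JSON instead of YAML (`kubectl -n flytesnacks-development get pod -o json <podname>`) so the Python standard library can parse it; `scheduling_problem` is a hypothetical helper:

```python
import json


def scheduling_problem(pod_json: str):
    """Return (reason, message) if the pod's PodScheduled condition is False, else None."""
    pod = json.loads(pod_json)
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            return cond.get("reason"), cond.get("message")
    return None  # the pod was scheduled (or has no conditions yet)
```

A `None` result means the pod got onto a node, so the problem would be elsewhere (image pull, container startup, and so on).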

fierce-answer-16379

01/26/2023, 8:54 PM

Hmm… after discussing with @average-finland-92144 and adding the config variable, I tore down the demo via `flytectl demo teardown`. However, now when I run `flytectl demo start`, I receive this error:


$ flytectl demo start
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
🧑‍🏭 Bootstrapping a brand new flyte cluster... 🔨 🔧
🐋 Going to use Flyte v1.3.0 release with image cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-f69fb09ca189e8bf57e1a6a12db168274f640d15
🐋 pulling docker image for release cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-f69fb09ca189e8bf57e1a6a12db168274f640d15
🧑‍🏭 booting Flyte-sandbox container
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
context modified for "flyte-sandbox" and switched over to it.
+-----------------------------------+---------------+-----------+
|              SERVICE              |    STATUS     | NAMESPACE |
+-----------------------------------+---------------+-----------+
| k8s: This might take a little bit | Bootstrapping |           |
+-----------------------------------+---------------+-----------+
Error: Get "https://127.0.0.1:6443/api/v1/nodes": dial tcp 127.0.0.1:6443: connect: connection refused
{"json":{},"level":"error","msg":"Get \"https://127.0.0.1:6443/api/v1/nodes\": dial tcp 127.0.0.1:6443: connect: connection refused","ts":"2023-01-26T12:51:05-08:00"}
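"Connection refused" on 127.0.0.1:6443 means nothing was accepting connections on the Kubernetes API port at that moment. A minimal sketch of waiting for that port before retrying, assuming a plain TCP check is good enough (it only shows that something is listening, not that the API server is healthy); `wait_for_port` is a hypothetical helper:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True  # something accepted the connection
        except OSError:
            # Both "connection refused" and connect timeouts end up here.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("127.0.0.1", 6443, timeout_s=300)` returns `False` if the sandbox's API server never starts listening within five minutes.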

fierce-answer-16379

01/26/2023, 8:57 PM

@high-accountant-32689 I had already torn down the demo before you sent that message! Those commands can’t dump the contents of that pod anymore (I think?)

high-accountant-32689

01/26/2023, 9:24 PM

that's true, if you tore down the cluster they are gone. As for this error that you're seeing now... does it take long (as in multiple minutes) for it to show up?

fierce-answer-16379

01/26/2023, 9:36 PM

@high-accountant-32689 Up and running now, but still queued as before. Here is the output from `kubectl -n flytesnacks-development get pods`:


NAME                        READY   STATUS    RESTARTS   AGE
f2009af361c7d4739b15-n0-0   0/1     Pending   0          4m22s

And I’ve redirected the output of `kubectl -n flytesnacks-development get pod -o yaml f2009af361c7d4739b15-n0-0` to the attached txt file:

log.txt

high-accountant-32689

01/26/2023, 9:37 PM

Very interesting. The important bit here is:


message: '0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are
      available: 1 No preemption victims found for incoming pod.'
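This is the scheduler reporting that no node had enough unrequested CPU for the pod. A sketch that splits such a message into per-reason node counts; the message is human-readable scheduler output rather than a stable API, so the regex below is an assumption that may not match every Kubernetes version. `parse_unschedulable` is a hypothetical helper:

```python
import re


def parse_unschedulable(message: str):
    """Split '0/N nodes are available: k reason, ...' into (available, total, {reason: count}).

    Only the first sentence is parsed; trailing parts such as the 'preemption: ...'
    summary are ignored. Returns None if the message doesn't have this shape.
    """
    m = re.match(r"(\d+)/(\d+) nodes are available: (.+?)\.(?:\s|$)", message)
    if m is None:
        return None
    reasons = {}
    for part in m.group(3).split(", "):
        count, _, reason = part.partition(" ")
        if count.isdigit():
            reasons[reason] = int(count)
    return int(m.group(1)), int(m.group(2)), reasons
```

Applied to the message above, this yields 0 of 1 nodes available, with `Insufficient cpu` as the reason on that single node.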

fierce-answer-16379

01/26/2023, 9:43 PM

Hmm, yeah. I don’t have much hands-on experience with K8s, just some basic knowledge; I’m trying to learn as I go with Flyte atm. Is this a resource issue with my local machine?

high-accountant-32689

01/26/2023, 9:43 PM

tell us a bit more about your local system. Are you running on a mac? Is it an M1? Linux?

fierce-answer-16379

01/26/2023, 9:44 PM

Mac, Intel, 8GB RAM

high-accountant-32689

01/26/2023, 9:50 PM

OK, let's double-check resources on the node. Can you run `kubectl get nodes` and then `kubectl describe node <node-name>`?

fierce-answer-16379

01/26/2023, 9:51 PM

yup!

node_info.txt

high-accountant-32689

01/26/2023, 10:23 PM

Yeah... 2 CPUs is not enough, since the node already uses ~1.5 CPU.
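The arithmetic behind this: the Kubernetes scheduler compares a pod's CPU request against the node's allocatable CPU minus the requests of pods already placed there, not against live usage. That is why `kubectl top node` showed only 13% CPU while the pod stayed Pending. The ~1.5 CPU mentioned here most likely refers to the requests total in the "Allocated resources" section of `kubectl describe node`. A simplified sketch of that check, with hypothetical helpers that only understand plain and `m`-suffixed CPU quantities; the 1-CPU task request used in the example is an assumed figure, since the thread doesn't state what `generate_normal_df` requests:

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as '2', '1.5' or '250m' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits_on_node(allocatable: str, already_requested: str, pod_request: str) -> bool:
    """Simplified CPU fit check: the pod's request must fit in the unrequested CPU."""
    free = cpu_to_millicores(allocatable) - cpu_to_millicores(already_requested)
    return cpu_to_millicores(pod_request) <= free
```

With 2 CPUs, `fits_on_node("2", "1500m", "1")` is `False` (only 500m is left); with 4 CPUs and the same existing requests, `fits_on_node("4", "1500m", "1")` is `True`. That is consistent with the workflow succeeding once Docker got more CPUs, assuming the node's allocatable CPU follows the Docker setting.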

high-accountant-32689

01/26/2023, 10:25 PM

Let me see if there are k3s logs that show this more clearly.

fierce-answer-16379

01/26/2023, 10:42 PM

@high-accountant-32689 I see….okay

fierce-answer-16379

01/27/2023, 12:24 AM

@high-accountant-32689 @average-finland-92144 I increased the CPUs to 4 and the memory to 3.00 GB in my Docker resources, and the workflow succeeded.


high-accountant-32689

01/27/2023, 12:25 AM

Alright. We should document this, @broad-monitor-993 / @average-finland-92144.


fierce-answer-16379

01/27/2023, 12:26 AM

FYI, the task `generate_normal_df` ran in 2m 23s. I’m using an older laptop atm (MacBook Pro 2015) and won’t usually be running this on it, but yeah.
