Hello I ve been working with simple hello world pyflyte work Flyte #flyte-support

quaint-army-56981

03/09/2023, 11:21 PM

Hello, I've been working with simple "hello world" pyflyte workflow on a Kubernetes cluster and I've noticed that a task is taking over 90 seconds to run, which is much longer than expected. I'm using the default flyte image on an Intel Mac and running Kubernetes through Docker Desktop. Strangely, running the same flyte workflow locally (i.e., without Kubernetes) only takes a few seconds. Do you have any insights into what could be causing this delay?

thankful-minister-83577

03/10/2023, 12:57 AM

could you add FLYTE_SDK_LOGGING_LEVEL=10 as an environment variable to the task and re-run please?

thankful-minister-83577

03/10/2023, 12:59 AM

want some more timestamps

thankful-minister-83577

03/10/2023, 12:59 AM

that would be helpful for debugging

freezing-airport-6809

03/10/2023, 2:32 PM

Cc @hallowed-mouse-14616

hallowed-mouse-14616

03/10/2023, 3:59 PM

@quaint-army-56981 in addition to Yee's recommendation, which will highlight performance inside the container, it would be good to see the Pod breakdown. The output of kubectl -n <namespace> get pod <pod-name> -o yaml would be very helpful to break down pod scheduling, container pulling, etc. These are all very common performance issues.

quaint-army-56981

03/10/2023, 5:58 PM

Here are the logs and pod config:

ff7682813274646d3988-hello-sue-0.yaml ff7682813274646d3988-hello-sue-0.log

quaint-army-56981

03/10/2023, 6:00 PM

The task took 1m31sec to complete:

hallowed-mouse-14616

03/10/2023, 6:14 PM

So it looks like the Flyte orchestration overhead is relatively small (order of seconds), as we would expect. From the pod dump, the container itself is taking a long time (~1m23s):

state:
  terminated:
    containerID: docker://ab32398a800e951520d479878ccaca4a29782f4b116e5bdf17958cd2fb1bb67b
    exitCode: 0
    finishedAt: "2023-03-10T17:50:56Z"
    reason: Completed
    startedAt: "2023-03-10T17:49:33Z"
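
As a quick check on the ~1m23s figure, the container runtime can be derived from the startedAt and finishedAt fields in the pod status. This is a minimal sketch using only the two timestamps quoted above; the helper name parse_k8s_timestamp is made up for illustration and is not part of any Flyte or Kubernetes library.

```python
from datetime import datetime, timezone


def parse_k8s_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp as printed in `kubectl get pod -o yaml`."""
    # Hypothetical helper: assumes second precision and a trailing "Z".
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Values copied from the terminated container state in the pod dump.
started_at = parse_k8s_timestamp("2023-03-10T17:49:33Z")
finished_at = parse_k8s_timestamp("2023-03-10T17:50:56Z")

runtime = finished_at - started_at
minutes, seconds = divmod(int(runtime.total_seconds()), 60)
print(f"container runtime: {minutes}m{seconds}s")
```

Comparing this number with the total task duration reported by Flyte gives a rough idea of how much time is spent outside the container (scheduling, image pull, orchestration).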

This could be for a number of reasons. The first thing to do is increase the resource requests; currently it looks like the task only gets 100m CPU:

resources:
  limits:
    cpu: 100m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 1Gi
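
For context, 100m in Kubernetes notation means 100 millicores, i.e. a tenth of one CPU core. The sketch below converts such CPU quantities to cores; parse_cpu is a hypothetical helper written for this illustration and only handles the plain and "m"-suffixed forms that appear in this thread.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "100m" or "1" to cores."""
    # Hypothetical helper: the "m" suffix means millicores (1/1000 of a core).
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


# The request and limit shown in the pod spec above.
current_cores = parse_cpu("100m")
# A full core, written in the same millicore notation.
full_core = parse_cpu("1000m")

print(f"100m  = {current_cores} cores")
print(f"1000m = {full_core} cores")
```

Because the limit equals the request here, the container is capped at that tenth of a core for the whole run, including any startup work.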

I'm assuming this is using fast register (i.e. pyflyte run --remote ...), which requires downloading and decompressing the code at runtime. This can be a little bit more resource intensive.

hallowed-mouse-14616

03/10/2023, 6:15 PM

Increasing the CPU should speed things up quite a bit, but I suspect a further look into the flytekit logs will show some more possible improvements.

freezing-airport-6809

03/10/2023, 9:28 PM

thank you for the dig up

quaint-army-56981

03/11/2023, 12:13 AM

Seems like bumping the CPU request limit did the trick. I tried a couple different values and got the following:

quaint-army-56981

03/11/2023, 12:14 AM

Is there a way to set default limits? I'm currently having to set it per task. So something like:

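from flytekit import Resources, task

# Same CPU value for request and limit, so the task gets a full core.
cpu_request = "1000m"


@task(
    limits=Resources(mem="1500Mi", cpu=cpu_request),
    requests=Resources(cpu=cpu_request),
    # Debug-level flytekit logging, as suggested earlier in the thread.
    environment={"FLYTE_SDK_LOGGING_LEVEL": "10"},
)
def hello_world(name: str) -> str:
    print(f"Hello {name}")
    return name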

freezing-airport-6809

03/11/2023, 12:20 AM

ya you can update the global defaults in flyteadmin config

hallowed-mouse-14616

03/11/2023, 1:17 AM

@quaint-army-56981 so here is the flyteadmin configuration value for task defaults. An example of how this is set in the helm charts is provided here. Let us know if you run into any issues!
