# ask-the-community
j
Hello, I've been working with a simple "hello world" pyflyte workflow on a Kubernetes cluster, and I've noticed that a task is taking over 90 seconds to run, which is much longer than expected. I'm using the default Flyte image on an Intel Mac and running Kubernetes through Docker Desktop. Strangely, running the same Flyte workflow locally (i.e., without Kubernetes) only takes a few seconds. Do you have any insights into what could be causing this delay?
y
could you add `FLYTE_SDK_LOGGING_LEVEL=10` as an environment variable to the task and re-run please?
want some more timestamps
that would be helpful for debugging
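For context on the request above: Python's standard `logging` module uses 10 as the numeric value of `DEBUG`. Assuming flytekit reads `FLYTE_SDK_LOGGING_LEVEL` as a standard Python level number (the thread does not spell this out), a value of 10 turns on debug-level output. The stdlib-only sketch below shows that mechanism in isolation. `logger_from_env` is a hypothetical helper, not flytekit code, and the timestamp format is an arbitrary choice.

```python
import logging
import os


def logger_from_env(name: str = "hello_world",
                    var: str = "FLYTE_SDK_LOGGING_LEVEL",
                    default: int = logging.WARNING) -> logging.Logger:
    """Hypothetical helper (not part of flytekit): configure a logger whose
    level is read from an environment variable as a standard Python level number."""
    raw = os.environ.get(var)
    level = int(raw) if raw else default
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # don't stack handlers on repeated calls
        handler = logging.StreamHandler()
        # %(asctime)s prefixes every record with a timestamp.
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    os.environ["FLYTE_SDK_LOGGING_LEVEL"] = "10"  # 10 == logging.DEBUG
    log = logger_from_env()
    log.debug("visible because the level is DEBUG")
```

Without the variable, the sketch falls back to `WARNING` (30), so debug records are dropped.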
k
Cc @Dan Rammer (hamersaw)
d
@joe in addition to Yee's recommendation, which will highlight performance inside the container, it would be good to see the `Pod` breakdown. The output of `kubectl -n <namespace> get pod <pod-name> -o yaml` would be very helpful for breaking down pod scheduling, container pulling, etc. These are all very common sources of performance issues.
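One way to turn that pod dump into a timeline is to measure every status timestamp against the pod's `creationTimestamp`. The sketch below assumes the pod has been fetched as JSON (`-o json`) rather than YAML, purely so it can be parsed with the standard library; both formats carry the same fields. `pod_milestones` and `print_timeline` are hypothetical helpers, and init containers are ignored for brevity.

```python
from datetime import datetime, timezone


def parse_k8s_time(ts: str) -> datetime:
    # Pod status timestamps are second-precision RFC 3339 UTC, e.g. "2023-03-10T17:49:33Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pod_milestones(pod: dict) -> dict:
    """Hypothetical helper: seconds from pod creation to each timestamp in the pod status."""
    created = parse_k8s_time(pod["metadata"]["creationTimestamp"])
    status = pod.get("status", {})
    offsets = {}
    # Conditions (PodScheduled, Initialized, ContainersReady, Ready) store their
    # *last* transition time; on a completed pod, Ready reflects when it went False.
    for cond in status.get("conditions", []):
        ts = cond.get("lastTransitionTime")
        if ts:
            offsets[cond["type"]] = (parse_k8s_time(ts) - created).total_seconds()
    # Container states (waiting/running/terminated) may carry startedAt/finishedAt.
    # The gap from PodScheduled to a container's startedAt is mostly image
    # pulling and container creation.
    for cs in status.get("containerStatuses", []):
        for detail in cs.get("state", {}).values():
            for key in ("startedAt", "finishedAt"):
                if detail.get(key):
                    offsets[f"{cs['name']}.{key}"] = (
                        parse_k8s_time(detail[key]) - created).total_seconds()
    return offsets


def print_timeline(pod: dict) -> None:
    # Print milestones in chronological order.
    for label, secs in sorted(pod_milestones(pod).items(), key=lambda kv: kv[1]):
        print(f"{secs:8.1f}s  {label}")
```

To use it, save the output of `kubectl -n <namespace> get pod <pod-name> -o json` to a file, load it with `json.load`, and pass the resulting dict to `print_timeline`.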
j
The task took 1m31sec to complete:
d
So it looks like the Flyte orchestration overhead is relatively small (on the order of seconds), as we would expect. From the pod dump, the container itself is taking a long time (~1m23s):
```yaml
state:
  terminated:
    containerID: docker://ab32398a800e951520d479878ccaca4a29782f4b116e5bdf17958cd2fb1bb67b
    exitCode: 0
    finishedAt: "2023-03-10T17:50:56Z"
    reason: Completed
    startedAt: "2023-03-10T17:49:33Z"
```
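The ~1m23s figure follows from the `startedAt` and `finishedAt` timestamps in the dump, and subtracting it from the 1m31s total reported earlier gives a rough estimate of everything that happened outside the container. A minimal sketch of that arithmetic, assuming the 1m31s spans the whole task and that the two timings are comparable (they may come from different clocks, so treat the result as approximate):

```python
from datetime import datetime, timedelta


def k8s_time(ts: str) -> datetime:
    # Normalize the trailing "Z": datetime.fromisoformat() accepts it only from Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# Timestamps copied from the terminated container state in the pod dump.
container_runtime = k8s_time("2023-03-10T17:50:56Z") - k8s_time("2023-03-10T17:49:33Z")

# Total task duration reported for the same run (1m31s).
task_duration = timedelta(minutes=1, seconds=31)

# Roughly: scheduling, image pull, and orchestration outside the container.
overhead = task_duration - container_runtime

print(container_runtime, overhead)  # 0:01:23 0:00:08
```

About 83 of the 91 seconds are spent inside the container, which is why the investigation turns to the container's resources next.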
This could be for a number of reasons. The first thing to do is increase the resource requests; currently it looks like the task only requests `100m` CPU:
```yaml
resources:
  limits:
    cpu: 100m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 1Gi
```
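In Kubernetes quantities, `100m` means 100 millicores, i.e. one tenth of a CPU core, and `1Gi` is 2^30 bytes. With a CPU limit of `100m`, the container is throttled to a tenth of a core, so CPU-bound startup work stretches out accordingly. The hypothetical `parse_quantity` helper below (not the official Kubernetes client) converts the common suffixes to plain numbers: CPU values come out in cores, memory values in bytes.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes: binary (powers of 1024) and decimal (powers of 1000).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"m": Decimal("0.001"), "k": Decimal(10**3), "M": Decimal(10**6),
            "G": Decimal(10**9), "T": Decimal(10**12)}


def parse_quantity(q: str) -> Decimal:
    """Hypothetical helper: convert a quantity such as "100m" or "1Gi" to a plain number.
    Only the suffixes listed above are supported."""
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    for suffix, factor in _DECIMAL.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)  # bare number, e.g. "2" CPUs
```

For example, `parse_quantity("1000m") / parse_quantity("100m")` is 10, so raising the request to `1000m` gives the task ten times the CPU it had here.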
I'm assuming this is using fast register (i.e. `pyflyte run --remote ...`), which requires downloading and decompressing the code at runtime. This can be somewhat more resource-intensive.
This should speed things up quite a bit, but I suspect a closer look at the flytekit logs will reveal further improvements.
k
thanks for digging into this
j
Seems like bumping the CPU request limit did the trick. I tried a couple of different values and got the following:
Is there a way to set default limits? I'm currently having to set them per task, with something like:
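```python
from flytekit import Resources, task

# Request a full CPU core (1000m) instead of the 100m seen in the pod spec.
cpu_request = "1000m"

@task(
    limits=Resources(mem="1500Mi", cpu=cpu_request),
    requests=Resources(cpu=cpu_request),
    environment={"FLYTE_SDK_LOGGING_LEVEL": "10"},  # debug-level flytekit logs
)
def hello_world(name: str) -> str:
    print(f"Hello {name}")
    return name
```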
k
ya you can update the global defaults in flyteadmin config
d
@joe so here is the flyteadmin configuration value for task defaults. An example of how this is set in the helm charts is provided here. Let us know if you run into any issues!