Hey there! We are noting some relatively high late...
# ask-the-community
Hey there! We are noting some relatively high latency for simple task execution. We're using the flyte-binary deployment in GKE. Tasks start immediately and the pods show complete but take a couple minutes before formally completing in flyte (these are simple hello world tasks). Are there any knobs to turn to lower this execution latency. Some sort of heartbeat interval we can tighten? We've tried lowering both workflow-reeval-duration and downstream-eval-duration, but regardless a task, even running on its own, takes ~2 minutes. @Dan Rammer (hamersaw)
Hey @Taylor Stout - thanks for the tag! So first-off perf is something we're spending considerable time on ironing out here. I wrote an RFC on improving observability and made considerable progress here. I don't think it will be ready for 1.4 (end of February), but will certainly be in 1.5.
Currently, our docs have a brief optimizing performance section. That may offer some solution.
Unfortunately, perf tuning requires some level of esoteric knowledge on how propeller works - this will all be updated with the performance observability stuff, with better information about how to identify and mitigate overhead.
Lets dive into this though - can you provide some timestamps? Ex - pod start / end times, container start / end times? flyte task start / end times?
We expect some level of overhead for task executions, but this should be measureable in seconds - if it's much larger there can be issues with image pulls, k8s informers, or flyte queueing / reevaluations as you mentioned.
Hi Dan. I'm doing some more thorough digging into individual task executions. Appears the latency is coming from the Pod itself. We're using the ghcr.io/flyteorg/flytekit:py3.10-1.3.1 image and it is present on the machine. Event logs in GKE show the container pulled, scheduled, started in about 4 seconds. Start time for container was 103051 AM and I added a stdout print test msg for the task starting execution which comes through at 103215 AM. Total runtime for the task was 1m 41s. So appears most of the time is tied to some sort of "cold start" of the container image itself. There's no logs coming from the container that may clue me in to what it is doing.
So it sounds like the Flyte orchestration overhead is relatively small, ~17s which is still larger than we would expect but not the largest issue here. Is tested with fast registration? @Yee @Eduardo Apolinario (eapolinario) any ideas what would take 1m30s for flytekit to start running the task?
I believe i'm using "fast registration". Just calling pyflyte run --remote
Yeah, then you're using the fast registration process. To try and be brief here, this is designed to quickly deploy and iterate over workflow executions, however there is additional overhead. When you call
pyflyte run --remote
Flyte packages the python code and writes it to the blobstore, then when a task is executed it needs to download the code from the blobstore, decompress it, and then it can execute it. The alternative approach is to build a specific image that already contains the code. It is explained in more depth here. This way, executing a task just starts a container and runs the python fucntion (without needing to download the code from blobstore first). This is the preferred approach for productionized workflows because of the performance. I don't think it will have significant impact on the task execution time, but it may be worth trying. If the blobstore access if very slow fast registration will result in more overhead.
could you run with more logging?
set that as part of the environment variable in the @task decorator itself.
i think it’s
@task(environment={"FLYTE_SDK_LOGGING_LEVEL": "10"})
will make it a lot more verbose, should clue us in as to where the time is being spent.
Yes, now getting much more error logs. Appears to be many passes at "Setting protocol to file". Many many registering events. Here's a sampling:
Automatically registering file:// as file with <flytekit.types.structured.basic_dfs.PandasToParquetEncodingHandler object at 0x7f6a5a8593f0>
Registered <flytekit.types.structured.basic_dfs.PandasToParquetEncodingHandler object at 0x7f6a5a8593f0> as handler for <class 'pandas.core.frame.DataFrame'>, protocol gs, fmt parquet
Automatically registering file:// as file with <flytekit.types.structured.basic_dfs.ParquetToPandasDecodingHandler object at 0x7f6a5a7eb670>
are there time stamps
associated with those log lines?
the default logger should print things in json
yes. They start at 2023-02-16 194700 and last register event log entry is at 2023-02-16 194752 (52 seconds of streaming logs like the ones above). Then "Welcome to Flyte!" at 144814. Then task completed within 3 seconds.
a Snippit
thanks for the screenshot @Taylor Stout.
i haven’t looked into the cloudwatch log uploader in a while - qq for you - how accurate do you think those timestamps are?
the concerning bit is that it seems to have taken 30s from when python was loaded to when the download even started.
also, backing up a bit. you say this is on GKE? so you’ve deployed minio? (the paths are all s3:// paths, not gs:// paths)
Hi @Yee. The timestamps should be accurate. They align with the timestamps provided by the Json payload:
Yes, this is on GKE. We have Minio deployed in that environment and it is configured with the minio endpoint.
@Yee @Dan Rammer (hamersaw) I figured this out. It was due to low CPU limits set on the tasks. When configuring our cluster, we set a low default request of 0.1 cpu allocation for tasks. We tend to set requests and not limits on CPU usage so that our services can utilize the hardware in an eager manner. Appears Flyte defaults all k8s resource "limits" for both CPU and Memory to be based on "requested" resources. I was able to set the resource allocation explicitly via the task decorator.... requests=Resources(cpu="0.1"), limits=Resources(cpu="2")
It was just being throttled. Thanks for both of your times