Hey community this may be a k8s issue and not related to Fly Flyte #flyte-support

Hey community - this may be a k8s issue and not re...

microscopic-furniture-57275

02/16/2024, 4:01 PM

Hey community - this may be a k8s issue and not related to Flyte, but I'm at a loss at the moment and thought someone in the community may have seen this before... I'm seeing Flyte tasks/pods "stalling" with tiny cpu consumption, even though the cpu request is high (30), and there is plenty of availability on the node (according to my views via k9s and DataDog). This behavior is observed when several of these tasks are scheduled at the same time, on the same node, via simultaneous workflows operating on different data. E.g. 3 such tasks (30 cores each) get scheduled onto a single node with 96 or more cores. One or two of the tasks will show it is using the 30 cores it requested, and complete in the expected time. The other task (or sometimes it is two of them) never start using CPU, as if they are being throttled severely. But all metrics show that the node has CPU to spare (sometimes a little, sometimes a lot). If such a task is run by itself, it always works, using 30 cores. It is when several are scheduled together that I see this behavior, as if the sudden requests of 30 cpus by 3 pods causes the system to hold off on allocating one or two of them, and then the allocation never occurs... ? The tasks in question are executing a classifier on data in parallel batches via a ProcessPoolExecutor from concurrent.futures (standard python stuff). It's tempting to assume there is a problem in our parallel implementation, except that it always works by itself, and how could pods be interacting in this way? But it could very well be a situation in which a task tries to get some CPU, is denied, and never "asks" again (I'm not clear on how the negotiation for CPU proceeds). Presumably the ProcessPoolExecutor is trying to execute in 30 processes, and whatever CPUs are allocated will be used to schedule these processes -- but the CPU never shows up. Thanks for any experience/pointers you may have!

calm-zoo-68637

02/17/2024, 5:05 AM

Are you writing a lot of files to disk? We saw an issue recently where high disk I/O was being buffered to RAM which was causing some pods to be OOMKilled. If the pods are not even being scheduled though, you should be able to kubectl describe your pods to see why they are not getting scheduled. I notice you don’t mention how much RAM/storage you are requesting. Perhaps those are insufficient?

microscopic-furniture-57275

02/19/2024, 3:40 PM

Hey @calm-zoo-68637, thanks for your reply. The pods are being scheduled. The issue is that though they are scheduled, and describing the pods shows the cpu request (30) and ram request (150G) are as expected, some pods appear to not actually get that CPU. The pod is scheduled, as are one or two other pods with the same requests/limits. The problem is that at least one of the pods never gets handed the CPU that was requested -- per our DataDog integration, which seems to be reliable for metrics like this, we see one or two pods processing at full speed with the requested 30 cpus -- but the other one or two show less than 1core utilization. My intuition is that it is something about the ProcessPoolExecutor -- though we are telling it that it has 30 cpus to work with, the metrics for the pods that are getting CPU appear to fluctuate above and below 30 cpus -- so maybe it's the case that our implementation of the parallel execution via ProcessPoolExecutor does not actually really limit it to 30 cores -- that it goes above this (I'm not sure how sampling artifacts could show higher than what is actually being used in DataDog, but perhaps this is possible). In this case, perhaps the cpu scheduler does not think it has enough capacity to actually schedule the pod that is getting "left out", though from the node's perspective, all of the requests/limits fit into the capacity of the node.

8 Views

Open in Slack

Previous Next