Hi, I got the `OOMKilled` error, and I found many ...
# ask-the-community
r
Hi, I got the `OOMKilled` error, and I found many similar questions in this channel. According to the discussion, the cause is a resource shortage, so I set `@task(limits=Resources(storage="20Gi", ephemeral_storage="20Gi", mem="40Gi", cpu="6"))`, which is enough for the failing task. But I still get the same error. I do not use a GPU. Could there be any other reason for this?
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f52215af773e248b0af3-n2-0] terminated with exit code (1). Reason [OOMKilled]. Message: 
{"asctime": "2023-03-20 05:09:46,909", "name": "flytekit", "levelname": "WARNING", "message": "FlyteSchema is deprecated, use Structured Dataset instead."}
tar: Removing leading `/' from member names
{"asctime": "2023-03-20 05:09:48,309", "name": "flytekit", "levelname": "WARNING", "message": "FlyteSchema is deprecated, use Structured Dataset instead."}
...
s
Can you increase mem to an even bigger value and see if that's resolving the issue?
r
OK, I will set 64Gi, which is the maximum size.
That did not solve it.
How can I get more information about the error? Are there options such as verbose or v? It is hard to debug with only `OOMKilled`.
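One way to get more detail than the bare `OOMKilled` reason is to look at the pod's container statuses, for example in the output of `kubectl get pod <pod-name> -n <namespace> -o json` (`kubectl describe pod` and `kubectl get events` show similar information). Flyte task pods typically run in a project-domain namespace rather than in `flyte`, so the namespace may need adjusting. The following is a minimal sketch, assuming the pod JSON has already been loaded into a dictionary; the function name `termination_details`, the container name, and the sample document are hypothetical.

```python
from typing import Any


def termination_details(pod: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect termination info for each container in a pod JSON document."""
    details = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        # A restarted container keeps its previous exit under "lastState".
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated")
            if terminated:
                details.append(
                    {
                        "container": status.get("name"),
                        "source": key,
                        "reason": terminated.get("reason"),
                        "exit_code": terminated.get("exitCode"),
                    }
                )
    return details


# Hypothetical pod document, trimmed to the fields read above.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "task-container",
                "state": {"terminated": {"reason": "OOMKilled", "exitCode": 1}},
                "lastState": {},
            }
        ]
    }
}

for entry in termination_details(sample_pod):
    print(entry)
```

In practice the dictionary would come from `json.loads` applied to the `kubectl` output.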
b
Is there enough memory on the worker node? You could use a request instead of a limit; a request guarantees the allocation of those resources to the process.
r
I run it locally.
Yes, my machine has enough memory.
b
But you are deploying to a local k8s cluster?
r
Yes, I installed `minikube`. Does Flyte use it?
Do you know the namespace for Flyte? I do not know whether these are for Flyte.
ryo@ryo:~$ kubectl get po
NAME          READY   STATUS      RESTARTS   AGE
py39-cacher   0/1     Completed   0          25m
ryo@ryo:~$ kubectl get node
NAME           STATUS   ROLES                  AGE   VERSION
344a31d3f672   Ready    control-plane,master   26m   v1.24.4+k3s1
ryo@ryo:~$ kubectl get po -n flyte
NAME                                                  READY   STATUS    RESTARTS   AGE
flyte-sandbox-docker-registry-b5c57c55-rtc4q          1/1     Running   0          29m
flyte-sandbox-kubernetes-dashboard-6757db879c-4q6cm   1/1     Running   0          29m
flyte-sandbox-proxy-d95874857-8t8h7                   1/1     Running   0          29m
flyte-sandbox-postgresql-0                            1/1     Running   0          29m
flyte-sandbox-75c5d88454-tpm9c                        1/1     Running   0          29m
flyte-sandbox-minio-645c8ddf7c-9f9zg                  1/1     Running   0          29m
ryo@ryo:~$ kubectl get node -n flyte
NAME           STATUS   ROLES                  AGE   VERSION
344a31d3f672   Ready    control-plane,master   30m   v1.24.4+k3s1
@Björn My minikube cluster uses the default values... I will set larger ones. Thank you very much.
ryo@ryo:~$ kubectl top node -n flyte
W0320 20:15:45.557393  176553 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
344a31d3f672   185m         2%     1797Mi          2%
Does Flyte use minikube? When I tried to delete the minikube cluster, I got this error:
ryo@ryo:~$ minikube delete
🙄 "minikube" profile does not exist, trying anyways.
💀 Removed all traces of the "minikube" cluster.
b
what's the error here - that minikube profile doesn't exist? seems to work regardless?
If you use the demo cluster (`flytectl demo start`), flyte will set up a k8s environment for you... https://docs.flyte.org/projects/flytectl/en/latest/gen/flytectl_demo.html
r
But I can still see the Flyte node.
ryo@ryo:~$ kubectl top node -n flyte
W0320 20:26:20.436047  179361 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
344a31d3f672   167m         2%     1793Mi          2%
b
sorry, haven't used minikube 😞
r
No problem. I am also confused about minikube and flytectl 🙂
b
🙂 maybe list the profiles with `minikube profile list`?
r
> 🙂 maybe list the profiles with `minikube profile list`?
Yes, I already got this suggestion and tried it.
ryo@ryo:~$ minikube stop
🤷 Profile "minikube" not found. Run "minikube profile list" to view all profiles.
👉 To start a cluster, run: "minikube start"
ryo@ryo:~$ minikube profile list
🤹 Exiting due to MK_USAGE_NO_PROFILE: No minikube profile was found.
💡 Suggestion:
You can create one using 'minikube start'.
Is minikube related to this issue?
k
You are using limits. Limits are like optimistic allocation: k8s will schedule the pod and kill it if the memory is not available. Use requests instead. Requests are enforced at scheduling time, so if minikube does not have the memory, the pod will not get scheduled.
Cc @Samhita Alla ^ - increasing limits is futile
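The difference between requests and limits described above can be sketched with a small simulation. This is a simplified model, not the actual Kubernetes scheduler or kubelet logic: it assumes a single node, ignores other pods and eviction policy, and all names (`parse_memory`, `PodSpec`, `outcome`) and the node size and peak usage are hypothetical.

```python
from dataclasses import dataclass

# Kubernetes-style memory suffixes (binary and decimal).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '40Gi' into bytes."""
    # Check two-letter suffixes first so 'Gi' is not mistaken for 'G'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(quantity)


@dataclass
class PodSpec:
    request: int  # bytes reserved when the pod is scheduled
    limit: int    # bytes above which the container is killed


def outcome(pod: PodSpec, node_allocatable: int, peak_usage: int) -> str:
    """Return a coarse pod outcome under the simplified model."""
    # Scheduling looks only at the request, never at the limit.
    if pod.request > node_allocatable:
        return "Pending"
    # Exceeding the limit, or exhausting the node, gets the container killed.
    if peak_usage > pod.limit or peak_usage > node_allocatable:
        return "OOMKilled"
    return "Succeeded"


node = parse_memory("8Gi")   # hypothetical small node
peak = parse_memory("12Gi")  # hypothetical peak usage of the task

low_request = PodSpec(request=parse_memory("1Gi"), limit=parse_memory("40Gi"))
print(outcome(low_request, node, peak))

high_request = PodSpec(request=parse_memory("40Gi"), limit=parse_memory("40Gi"))
print(outcome(high_request, node, peak))
```

Under these assumptions, a high limit with a small request still lets the pod start on a node that cannot hold it, while a large request keeps it unscheduled instead of letting it start and be killed.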
s
> increasing limits is futile
My bad. Haven't closely looked at what's being assigned.
@Ryo M, let us know if you're able to resolve the issue.
r
I got `Request rejected by the API, due to Invalid input.` when I set `requests` instead of `limits`.
$ pyflyte run --remote --image gcr.io/myimage:mytag src/train.py training_workflow
{"asctime": "2023-03-21 08:22:47,049", "name": "flytekit", "levelname": "WARNING", "message": "FlyteSchema is deprecated, use Structured Dataset instead."}
Request rejected by the API, due to Invalid input.
        Reason: 
        Input Request: {
  "id": {
...
This is the same error message as in this thread, but the reason may be different: https://flyte-org.slack.com/archives/CP2HDHKE1/p1678462238468779
I realized a new version of flytekit was released. I will try it.
I get the same error even after upgrading flyte from 1.4.1 to 1.4.2.
Even with small requests, I get the same error.
I tried `minikube start --cpus 6 --memory 40G --disk-size 50G --kubernetes-version v1.21.14 --driver=kvm2`, but nothing changed.
s
@Ryo M, could you share your task decorator arguments?
r
For example, this one:
@task(requests=Resources(storage="20Gi",  mem="50Gi", cpu="6"))
s
Could you share the full error msg?
Are you sure you're able to run the workflow when you set `limits` instead of `requests`?
r
I am not able to run it even with both.
s
How have you set up your Flyte cluster?