Hi, I got the `OOMKilled` error, and I found many similar questions in this channel. According to the discussion, the cause is a resource shortage, so I set
@task(limits=Resources(storage="20Gi", ephemeral_storage="20Gi", mem="40Gi", cpu="6"))
which should be enough for the failing task. But I still get the same error. I do not use a GPU. Could there be any other reason for this?
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f52215af773e248b0af3-n2-0] terminated with exit code (1). Reason [OOMKilled]. Message: 
{"asctime": "2023-03-20 05:09:46,909", "name": "flytekit", "levelname": "WARNING", "message": "FlyteSchema is deprecated, use Structured Dataset instead."}
tar: Removing leading `/' from member names
{"asctime": "2023-03-20 05:09:48,309", "name": "flytekit", "levelname": "WARNING", "message": "FlyteSchema is deprecated, use Structured Dataset instead."}
Can you increase mem to an even bigger value and see if that resolves the issue?
OK, I will set 64Gi, which is the maximum size.
That did not solve it.
How can I get more information about the error? Are there options such as verbose or v? It is hard to debug only with
Is there enough memory on the worker node? You could use requests instead of limits, which guarantees the allocation of those resources to the process.
I run it locally.
Yes, my machine has enough memory.
But you are deploying to a local k8s cluster?
Yes, I installed
. Does flyte use it?
Do you know the namespace for flyte? I do not know whether these are for flyte.
ryo@ryo:~$ kubectl get po
py39-cacher   0/1     Completed   0          25m
ryo@ryo:~$ kubectl get node
NAME           STATUS   ROLES                  AGE   VERSION
344a31d3f672   Ready    control-plane,master   26m   v1.24.4+k3s1
ryo@ryo:~$ kubectl get po -n flyte
NAME                                                  READY   STATUS    RESTARTS   AGE
flyte-sandbox-docker-registry-b5c57c55-rtc4q          1/1     Running   0          29m
flyte-sandbox-kubernetes-dashboard-6757db879c-4q6cm   1/1     Running   0          29m
flyte-sandbox-proxy-d95874857-8t8h7                   1/1     Running   0          29m
flyte-sandbox-postgresql-0                            1/1     Running   0          29m
flyte-sandbox-75c5d88454-tpm9c                        1/1     Running   0          29m
flyte-sandbox-minio-645c8ddf7c-9f9zg                  1/1     Running   0          29m
ryo@ryo:~$ kubectl get node -n flyte
NAME           STATUS   ROLES                  AGE   VERSION
344a31d3f672   Ready    control-plane,master   30m   v1.24.4+k3s1
@Björn My minikube cluster uses the default values... I will set larger ones. Thank you very much.
ryo@ryo:~$ kubectl top node -n flyte
W0320 20:15:45.557393  176553 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
344a31d3f672   185m         2%     1797Mi          2%
Does flyte use minikube? When I tried to delete the minikube cluster, I got this error.
ryo@ryo:~$ minikube delete
🙄 "minikube" profile does not exist, trying anyways.
💀 Removed all traces of the "minikube" cluster.
what's the error here - that minikube profile doesn't exist? seems to work regardless?
If you use the demo cluster:
flytectl demo start
flyte will set up a k8s environment for you... https://docs.flyte.org/projects/flytectl/en/latest/gen/flytectl_demo.html
But I can still see the flyte node.
ryo@ryo:~$ kubectl top node -n flyte
W0320 20:26:20.436047  179361 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
344a31d3f672   167m         2%     1793Mi          2%
sorry, haven't used minikube 😞
No problem. I am also confused about minikube and flytectl 🙂
maybe list the profiles with
minikube profile list
yes, I already got this suggestion and tried it.
ryo@ryo:~$ minikube stop
Profile "minikube" not found. Run "minikube profile list" to view all profiles.
👉 To start a cluster, run: "minikube start"
ryo@ryo:~$ minikube profile list
Exiting due to MK_USAGE_NO_PROFILE: No minikube profile was found.
Suggestion:
You can create one using 'minikube start'.
Is minikube related to this issue?
You are using limits. Limits are like optimistic allocation: k8s will schedule the pod and kill it if the memory is not available. Use requests instead; those are guaranteed. So if minikube does not have the memory, the pod will not get scheduled.
Cc @Samhita Alla ^ - increasing limits is futile
My bad. Haven't closely looked at what's being assigned.
@Ryo M, let us know if you're able to resolve the issue.
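A rough way to sanity-check the requests-versus-capacity point above is to compare the memory request against what the node can still hand out. The sketch below is only an illustration: `parse_quantity` and `request_fits` are hypothetical helper names (not part of flytekit or Kubernetes), it handles only the common Kubernetes-style suffixes, and the numbers in the usage lines are made-up examples rather than values measured on this cluster.

```python
import re

# Multipliers for Kubernetes-style quantity suffixes (binary and decimal).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "": 1,
}


def parse_quantity(text: str) -> int:
    """Convert a quantity string such as "40Gi" or "1797Mi" to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", text.strip())
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {text!r}")
    number, suffix = match.groups()
    return int(float(number) * _SUFFIXES[suffix])


def request_fits(request: str, allocatable: str, already_requested: str = "0") -> bool:
    """Return True if `request` fits into the node's remaining allocatable memory.

    Hypothetical helper: it mirrors the idea that a pod is only scheduled when
    its request fits into allocatable memory minus what other pods requested.
    """
    free = parse_quantity(allocatable) - parse_quantity(already_requested)
    return parse_quantity(request) <= free


# Example values only (assumptions, not taken from the cluster in this thread).
print(request_fits("50Gi", "40Gi"))                            # False
print(request_fits("20Gi", "40Gi", already_requested="4Gi"))   # True
```

If the first case applies, the pod would be expected to stay unscheduled rather than start and get `OOMKilled` later, which is the difference between a request and a limit described above.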
I got
Request rejected by the API, due to Invalid input.
when I set
instead of
$ pyflyte run --remote --image gcr.io/myimage:mytag src/train.py training_workflow
{"asctime": "2023-03-21 08:22:47,049", "name": "flytekit", "levelname": "WARNING", "message": "FlyteSchema is deprecated, use Structured Dataset instead."}
Request rejected by the API, due to Invalid input.
        Input Request: {
  "id": {
This is the same error message. But the reason may be different. https://flyte-org.slack.com/archives/CP2HDHKE1/p1678462238468779
I noticed that a new version of flytekit was released. I will try it.
The same error occurs even after upgrading flyte from 1.4.1 to 1.4.2.
Even with small requests, I get the same error.
I tried
minikube start --cpus 6 --memory 40G --disk-size 50G --kubernetes-version v1.21.14 --driver=kvm2
but nothing changed.
@Ryo M, could you share your task decorator arguments?
For example, this one
@task(requests=Resources(storage="20Gi",  mem="50Gi", cpu="6"))
Could you share the full error msg?
Are you sure you're able to run the workflow when you set
instead of
I am not able to run it with either one.
How have you set up your Flyte cluster?