Miha Garafolj
10/10/2022, 2:30 PM
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Example tolerations on another pod (also created by Flyte):
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nvidia.com/gpu=present:NoSchedule
nvidia.com/gpu:NoSchedule op=Exists
purpose=compute:NoSchedule
We already tried restarting the node pool. Is this somehow expected behavior?
Flytekit version: 1.1.0
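For reference, the difference between the two toleration lists above can be checked mechanically. This is only a minimal sketch: it assumes the text format shown in the messages (one toleration per line, key and effect as the first token), and `parse_tolerations` is a hypothetical helper, not part of Flyte or kubectl.

```python
def parse_tolerations(text: str) -> set[str]:
    """Collect the first token (key[=value]:effect) of each toleration line.

    Assumes the describe-style layout pasted in the messages above.
    """
    keys = set()
    for line in text.strip().splitlines():
        line = line.strip()
        # Drop the leading field label on the first line, if present.
        if line.startswith("Tolerations:"):
            line = line[len("Tolerations:"):].strip()
        if line:
            keys.add(line.split()[0])
    return keys


# Tolerations copied from the two pods in the messages above.
first_pod = """
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
"""

second_pod = """
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nvidia.com/gpu=present:NoSchedule
nvidia.com/gpu:NoSchedule op=Exists
purpose=compute:NoSchedule
"""

# Tolerations present on the second pod but absent from the first.
missing = sorted(parse_tolerations(second_pod) - parse_tolerations(first_pod))
for toleration in missing:
    print(toleration)
```

With the two lists as pasted, the first pod lacks the two `nvidia.com/gpu` tolerations and `purpose=compute:NoSchedule`.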
Dan Rammer (hamersaw)
10/10/2022, 2:31 PM
Miha Garafolj
10/10/2022, 2:34 PM
Ketan (kumare3)
Emirhan Karagül
10/10/2022, 2:48 PM
Ketan (kumare3)
Dan Rammer (hamersaw)
10/10/2022, 2:56 PM