Rupsha Chaudhuri
10/24/2023, 5:42 PM
Ketan (kumare3)
Rupsha Chaudhuri
10/24/2023, 6:53 PM
Dan Rammer (hamersaw)
10/25/2023, 3:04 PM
node-execution-deadline and/or node-active-deadline in the propeller configuration. However, we updated this so that they default to 0, or unlimited. Can you check your configuration on this? Hopefully we didn't miss anything.
Rupsha Chaudhuri
10/25/2023, 3:09 PM
Dan Rammer (hamersaw)
10/25/2023, 3:12 PM
0, ultimately this is the code that determines node timeouts.
Rupsha Chaudhuri
10/25/2023, 4:18 PM
node-execution-deadline or node-active-deadline anywhere
Dan Rammer (hamersaw)
10/25/2023, 4:18 PM
0 for all of the deadlines.
Rupsha Chaudhuri
10/25/2023, 4:22 PM
Dan Rammer (hamersaw)
10/25/2023, 4:23 PM
Rupsha Chaudhuri
10/25/2023, 4:23 PM
Dan Rammer (hamersaw)
10/25/2023, 7:58 PM
Lee Ning Jie Leon
10/26/2023, 4:50 AM
inject-finalizer a month ago.
Usually there is an underlying problem that causes the tasks to run for a very long time and hit this. It happens mostly to new users who are onboarding and developing their code. The state goes to unknown and they can't access the logging URL.
We don't have node-execution-deadline or node-active-deadline either. Did I configure it wrongly? The default-env-vars are working for us.
configmap:
  k8s:
    plugins:
      k8s:
        default-env-vars:
          ....
        inject-finalizer: true
I attached a screenshot of an occurrence from less than 24 hours ago. For this user, retries is set to 0 with a 60-minute task timeout, but it still gets a node timeout.
@task(
    retries=0,
    timeout=timedelta(minutes=60))
cc: @Zi Yi Ewe @Krithika Sundararajan
Dan Rammer (hamersaw)
10/26/2023, 12:59 PM
timeout configuration in the task decorator will make the node timeout. Is this not expected?
Lee Ning Jie Leon
10/26/2023, 2:20 PM
Timeout in node, the task state goes from running to unknown, becomes unclickable, and its logs can no longer be accessed from the UI. The execution duration continues to run indefinitely in the UI. The task timeout will end in a failed state, with logs and the timeout duration stated.
Checking further, I think the user removed the timeout in the recent version or reran an old workflow 🤔. Nevertheless, we already have inject-finalizer enabled and don't expect the unknown state or the inaccessible logs. From a user's point of view, the user has no idea what went wrong and it's hard to debug. Some users thought the pod was running indefinitely without ever timing out.
That said, I'll ask the user to retry with a new version and see if it still happens.
[1/1] currentAttempt done. Last Error: USER::task execution timeout [1h0m0s] expired
Timeout in node
Dan Rammer (hamersaw)
10/30/2023, 1:15 PM
Rupsha Chaudhuri
10/30/2023, 1:48 PM
Ketan (kumare3)
Rupsha Chaudhuri
10/30/2023, 2:31 PM
Ketan (kumare3)
Rupsha Chaudhuri
10/30/2023, 2:34 PM
Lee Ning Jie Leon
10/31/2023, 4:23 AM
ABORTED but the UI is showing unknown. Might just be a UI bug after all 😅
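
For anyone reading this thread later: the deadline semantics discussed above (node-execution-deadline and node-active-deadline defaulting to 0, which the thread treats as unlimited) can be illustrated with a small Python sketch. This is not the propeller code. The helper names effective_deadline and node_timed_out are hypothetical, and the sketch assumes that a deadline of 0 simply disables the check.

```python
from datetime import datetime, timedelta
from typing import Optional


def effective_deadline(configured: timedelta) -> Optional[timedelta]:
    """Hypothetical helper: treat a zero (or negative) deadline as 'no limit'."""
    return configured if configured > timedelta(0) else None


def node_timed_out(started_at: datetime, now: datetime, configured: timedelta) -> bool:
    """Hypothetical helper: True only when a positive deadline is set and has elapsed."""
    deadline = effective_deadline(configured)
    if deadline is None:
        # Assumption: a deadline of 0 means the node is never timed out by this check.
        return False
    return now - started_at >= deadline


start = datetime(2023, 10, 25, 15, 0)
two_hours_later = start + timedelta(hours=2)

# Deadline left at the default of 0: no node timeout from this setting.
print(node_timed_out(start, two_hours_later, timedelta(0)))

# A 60-minute limit, like the task decorator's timeout=timedelta(minutes=60).
print(node_timed_out(start, two_hours_later, timedelta(minutes=60)))
```

Under that assumption, a node that still times out while both deadlines are 0 points to another limit, such as the timeout set in the task decorator.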