little-cricket-84530
10/24/2023, 5:42 PM
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
little-cricket-84530
10/24/2023, 6:53 PM
little-cricket-84530
10/24/2023, 7:10 PM
hallowed-mouse-14616
10/25/2023, 3:04 PM
node-execution-deadline and/or node-active-deadline in the propeller configuration. However, we updated this so that they default to 0, or unlimited. Can you check your configuration on this? Hopefully we didn't miss anything.
little-cricket-84530
10/25/2023, 3:09 PM
hallowed-mouse-14616
10/25/2023, 3:12 PM
0, ultimately this is the code that determines node timeouts.
little-cricket-84530
10/25/2023, 4:18 PM
node-execution-deadline or node-active-deadline anywhere
hallowed-mouse-14616
10/25/2023, 4:18 PM
hallowed-mouse-14616
10/25/2023, 4:20 PM
0 for all of the deadlines.
little-cricket-84530
10/25/2023, 4:22 PM
hallowed-mouse-14616
10/25/2023, 4:23 PM
little-cricket-84530
10/25/2023, 4:23 PM
little-cricket-84530
10/25/2023, 7:19 PM
hallowed-mouse-14616
10/25/2023, 7:58 PM
broad-train-34581
10/26/2023, 4:50 AM
inject-finalizer a month ago.
Usually there is an underlying problem that causes the tasks to run for a very long time and run into this, and it happens mostly to new users who are onboarding and developing their code. The state goes to unknown and they can't access the logging URL.
We don't have node-execution-deadline or node-active-deadline either. Did I configure it wrongly? The default-env-vars are working for us.
configmap:
  k8s:
    plugins:
      k8s:
        default-env-vars:
          ....
        inject-finalizer: true
I attached a screenshot of a case that happened <24 hours ago. For this user, retries is set to 0 with a 60-minute task timeout, but the task still gets a node timeout.
@task(
    retries=0,
    timeout=timedelta(minutes=60),
)
cc: @powerful-animal-86823 @best-actor-6858
hallowed-mouse-14616
10/26/2023, 12:59 PM
timeout configuration in the task decorator will make the node time out. Is this not expected?
broad-train-34581
10/26/2023, 2:20 PM
Timeout in node, the task state goes from running to unknown, and the task becomes unclickable, so logs can't be accessed from the UI. The execution duration continues to run indefinitely in the UI. A task timeout ends in a failed state, with logs and the timeout duration stated.
Checking further, I think the user removed the timeout in the recent version or reran an old workflow 🤔. Nevertheless, we already have inject-finalizer set and don't expect the unknown state or the inaccessible logs. From a user's POV, they have no idea what went wrong and it's hard to debug. Some users thought the pod was running indefinitely without ever timing out.
That said, I'll ask the user to retry with a new version and see if it still happens.
[1/1] currentAttempt done. Last Error: USER::task execution timeout [1h0m0s] expired
Timeout in node
hallowed-mouse-14616
10/30/2023, 1:15 PM
little-cricket-84530
10/30/2023, 1:48 PM
freezing-airport-6809
little-cricket-84530
10/30/2023, 2:31 PM
freezing-airport-6809
little-cricket-84530
10/30/2023, 2:34 PM
broad-train-34581
10/31/2023, 4:23 AM
ABORTED but the UI is showing unknown. Might just be a UI bug after all 😅