# flyte-support
r
Does the task timeout include any time the task might be queued by k8s due to there not being enough CPUs / GPUs available? Or does the clock start ticking only when Flyte gets to the "Execute user level code." stage in the container?
f
It only starts ticking once the container starts.
c
I believe it starts ticking when the task is marked as “running”. For task plugins this can be a little different. I think it also covers the duration of retries, which can be confusing.
r
OK, I hope it's when the container starts, because the web UI usually shows the task as "running" even when it's queued or waiting on k8s and not actually running the container yet.
c
It's when the node is considered "running" in the propeller state machine. It's the execution deadline: https://github.com/flyteorg/flyte/blob/5f2646126bbb0f3e67869b82d63c3c8785e8732c/flytepropeller/pkg/controller/nodes/executor.go#L842-L873
It seems like what you want is for the pod spec's activeDeadlineSeconds to be set to the task timeout, but that doesn't happen. We do this in our fork, but I'd imagine the propeller execution timeout would fire before the pod's activeDeadlineSeconds anyway, since propeller always records an earlier start timestamp than the pod's start.
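To make the two clocks concrete, here is a toy model (not Flyte code) of the behaviour described above. It assumes propeller measures the task timeout from the moment the node is marked running, so time spent waiting for CPUs/GPUs counts, while Kubernetes measures activeDeadlineSeconds from the pod's start time. `TaskTimeline`, `propeller_timed_out` and `pod_deadline_exceeded` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TaskTimeline:
    """Hypothetical timestamps (in seconds) for a single task attempt."""
    node_running_at: float  # propeller marks the node as "running"
    pod_started_at: float   # the pod's start time, set once a kubelet accepts it
    now: float


def propeller_timed_out(t: TaskTimeline, timeout_s: float) -> bool:
    # Assumed propeller behaviour: the clock starts when the node enters
    # "running", so time spent queued in the cluster counts against the timeout.
    return t.now - t.node_running_at > timeout_s


def pod_deadline_exceeded(t: TaskTimeline, active_deadline_s: float) -> bool:
    # activeDeadlineSeconds is counted relative to the pod's start time,
    # so waiting to be scheduled does not count.
    return t.now - t.pod_started_at > active_deadline_s


# A task that waited 8 minutes to be scheduled and has then run for 3 minutes.
timeline = TaskTimeline(node_running_at=0, pod_started_at=480, now=660)
print(propeller_timed_out(timeline, timeout_s=600))            # True: 660 s > 600 s
print(pod_deadline_exceeded(timeline, active_deadline_s=600))  # False: 180 s <= 600 s
```

With the same 10-minute limit, the propeller-style clock has already expired while the pod-level clock has used less than a third of its budget, which is why the propeller timeout would normally fire first.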
r
@clean-glass-36808 thank you for the code cite!! I guess there are arguments for both:
• I have N tasks and I want them to complete by a deadline, no matter the cluster state, so I want a failure if we miss this deadline.
• I have N tasks / Python function invocations and I think each will take T seconds to run on one CPU once it starts, and I just want a failure if the Python function itself takes longer than T seconds.
Maybe there's an argument for the Flyte Task API to distinguish between a "deadline" and a "timeout." Today the docs say timeout helps protect against "system-level issues," but that conflates cluster-is-overloaded with task-itself-is-slow.
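The deadline/timeout split proposed above could look something like the sketch below. This is purely hypothetical: flytekit has no separate `deadline` parameter today, and `TaskLimits` and `check_limits` are illustrative names. The deadline is measured from submission regardless of cluster state; the timeout only from when the user code starts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskLimits:
    """Hypothetical split between a wall-clock deadline and a run-time timeout."""
    deadline_s: Optional[float] = None  # measured from submission, includes queueing
    timeout_s: Optional[float] = None   # measured from when user code starts running


def check_limits(limits: TaskLimits, submitted_at: float,
                 code_started_at: Optional[float], now: float) -> Optional[str]:
    """Return which limit was violated, or None if the task may keep going."""
    if limits.deadline_s is not None and now - submitted_at > limits.deadline_s:
        # Too late overall, whether the cluster was busy or the task was slow.
        return "deadline"
    if (limits.timeout_s is not None and code_started_at is not None
            and now - code_started_at > limits.timeout_s):
        # The function itself ran longer than expected.
        return "timeout"
    return None


limits = TaskLimits(deadline_s=3600, timeout_s=300)
# Queued for 50 minutes, then ran for 4 minutes: within both limits.
print(check_limits(limits, submitted_at=0, code_started_at=3000, now=3240))  # None
# Started promptly but ran for 6 minutes: the run-time timeout fires.
print(check_limits(limits, submitted_at=0, code_started_at=10, now=370))     # timeout
# Queued for over an hour: the deadline fires even though the code never ran.
print(check_limits(limits, submitted_at=0, code_started_at=None, now=3700))  # deadline
```

Checking the deadline first means a task that never got scheduled still fails with a clear "deadline" reason rather than being reported as slow code.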
c
@gentle-tomato-480
g
Thanks for the tag @clean-glass-36808! So, just like you mentioned in the other thread, a way to set a timeout the way I'd expect it (i.e. a pod can run for only x minutes) would be to use a pod template with activeDeadlineSeconds, right?
c
Yes
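A rough sketch of the pod-template approach: the helper below builds the relevant fragment of a Kubernetes pod spec as a plain dictionary, using the Kubernetes API field names activeDeadlineSeconds and containers. In flytekit this would typically be expressed as a `PodTemplate(pod_spec=V1PodSpec(..., active_deadline_seconds=...))` passed to `@task(pod_template=...)`; check the flytekit docs for your version. The helper name `pod_spec_with_deadline` is hypothetical, and "primary" is assumed to be the primary container name (flytekit's default).

```python
import json


def pod_spec_with_deadline(active_deadline_s: int, primary_container: str = "primary") -> dict:
    """Build a minimal pod spec fragment that caps how long the pod may run.

    Hypothetical helper: field names follow the Kubernetes PodSpec API.
    """
    if active_deadline_s <= 0:
        # Kubernetes requires activeDeadlineSeconds to be a positive integer.
        raise ValueError("activeDeadlineSeconds must be a positive integer")
    return {
        # Kubernetes marks the pod as failed once it has been active this long,
        # counted from the pod's start time, so scheduling delays are excluded.
        "activeDeadlineSeconds": active_deadline_s,
        # Assumed: Flyte fills in the task's image and command for the
        # container whose name matches the primary container name.
        "containers": [{"name": primary_container}],
    }


print(json.dumps(pod_spec_with_deadline(15 * 60), indent=2))
```

Note that, per the discussion above, the task's own timeout (if set) still runs on propeller's clock and may fire before the pod deadline.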