Mücahit 07/18/2023, 1:22 PM
but the task that fails gets marked as Unknown, so we can't click into it to look at the logs etc., and Flyte (1.8) keeps tracking the task
Timeout in node
Mücahit 07/19/2023, 7:39 AM
to true in the flyte propeller config? More info available in the attached thread: https://discuss.flyte.org/t/2743027/hi-we-re-doing-some-performance-testing-and-when-we-start-a-#2409e7df-fda2-475e-a184-49e1be51e3ff
Mücahit 07/19/2023, 9:02 AM
Mücahit 07/19/2023, 9:20 AM
will make Flyte mark the Spark task as failed instead of it getting stuck at
Mücahit 07/20/2023, 6:08 AM
should be enabled by default
Mücahit 07/20/2023, 10:18 AM
Mücahit 07/21/2023, 12:43 PM
k8s.yaml: |
  plugins:
    k8s:
      inject-finalizers: true
Dan Rammer (hamersaw) 07/24/2023, 1:48 PM
, some for
task, but you have mentioned spark tasks - are there issues with all of these? Is the timeout not working for everything?
Mücahit 07/26/2023, 10:17 AM
as you can see in the screenshot. The workflow gets marked as failed with the Timeout in node error message, but the task that times out
one (spark-task) is not marked as failed but stays as
Dan Rammer (hamersaw) 07/27/2023, 10:38 PM
leads me to believe that the task never began executing, and that the increasing timestamp in the UI does not accurately reflect what is actually happening. If Flyte reports that a node started but never reports it ending, this duration will continue to tick in the UI. There may be a bug where, on timeout, Flyte misses a report for the task ending, but there is actually nothing executing.
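To illustrate why a missing end report makes the duration keep growing, here is a minimal sketch. It is not Flyte's UI code; `displayed_duration` is a hypothetical helper, and the fallback to the current time is an assumption about how such a display is typically computed.

```python
from datetime import datetime, timedelta
from typing import Optional


def displayed_duration(started_at: datetime,
                       ended_at: Optional[datetime],
                       now: datetime) -> timedelta:
    """Duration a UI might show for a node (hypothetical, not Flyte code)."""
    # Assumption: with no end event recorded, the display falls back to the
    # current time, so the value grows on every refresh.
    return (ended_at or now) - started_at


start = datetime(2023, 7, 26, 10, 0, 0)

# End event never reported: the result depends only on "now".
print(displayed_duration(start, None, datetime(2023, 7, 26, 10, 5, 0)))
print(displayed_duration(start, None, datetime(2023, 7, 26, 11, 0, 0)))

# End event reported: the result is fixed regardless of "now".
print(displayed_duration(start, datetime(2023, 7, 26, 10, 2, 0),
                         datetime(2023, 7, 26, 11, 0, 0)))
```

In this sketch a ticking duration says nothing about whether anything is still executing; it only shows that no end timestamp was recorded.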
Lee Ning Jie Leon 08/03/2023, 5:50 AM
From the propeller logs (screenshot attached):
Change in node state detected from [Running] -> [NodePhaseTimingOut], (handler phase [Timedout]) Recording NodeEvent [node_id:"n1" execution_id:<project:"..." domain:"production" name:"f4fb0fb246163cd71000" > ] phase[UNDEFINED]
After this, it triggers deletion and marks the workflow as failed.
Concerns
• Zero logs of what is happening, or why Timeout in node is happening, are propagated to the UI
• Metrics don't show that executions failed at all. Metrics show all executions of this workflow as successful.
It seems that
flag might resolve the issue above. Is there a reason why this is not enabled by default? I want to understand whether there are any downsides before I switch it on.
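Since the timeout is not surfaced in the UI or metrics here, one stopgap is to scan the propeller logs for the transition directly. The following is a rough sketch only: the regular expression is derived from the single log line quoted in this message, and `node_transition` and `is_timeout` are hypothetical helper names, not part of Flyte.

```python
import re
from typing import Optional, Tuple

# Assumption: pattern is based only on the one log line quoted in this thread;
# other propeller versions or log formats may differ.
TRANSITION = re.compile(
    r"Change in node state detected from \[(\w+)\] -> \[(\w+)\]"
)


def node_transition(line: str) -> Optional[Tuple[str, str]]:
    """Return (from_phase, to_phase) if the line records a node phase change."""
    match = TRANSITION.search(line)
    return (match.group(1), match.group(2)) if match else None


def is_timeout(line: str) -> bool:
    """True if the line records a node moving into NodePhaseTimingOut."""
    transition = node_transition(line)
    return transition is not None and transition[1] == "NodePhaseTimingOut"


sample = ("Change in node state detected from [Running] -> "
          "[NodePhaseTimingOut], (handler phase [Timedout])")
print(node_transition(sample))
print(is_timeout(sample))
```

Feeding the propeller log stream through `is_timeout` line by line would flag timed-out nodes even when the UI and metrics report nothing.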