# ask-the-community

Sujith Samuel

11/13/2023, 9:46 AM
#ask-the-community Does Flyte time out tasks after 24 hours? I have a task defined in Flyte which times out after 24 h. Please see the screenshot below. Kindly let me know how to increase this timeout value.

Dan Rammer (hamersaw)

11/13/2023, 1:53 PM
This value should be in the propeller configuration. We have updated the default to `0`, which indicates unlimited, but this must be an older deployment. It should be located at:
```yaml
node-config:
  default-deadlines:
    node-execution-deadline: 0h
```

Sujith Samuel

11/13/2023, 2:04 PM
The values for node-config in my case: `node-config: default-deadlines: node-active-deadline: 168h node-execution-deadline: 168h workflow-active-deadline: 168h`. One thing is that I don't have a workflow, only a task, so when the user comes in, he doesn't trigger a workflow, only a task. Will this behaviour make any difference in how the execution deadline works?

Dan Rammer (hamersaw)

11/13/2023, 2:39 PM
It shouldn't matter that the user triggers a task; underneath, flyteadmin actually wraps it in a workflow to execute. You can hit the `<flytepropeller-ip>:10254/config` endpoint with a simple curl and it will dump the actual parsed config as JSON. I'm wondering if something is not being set at the correct level, maybe?
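The curl described here can also be scripted. A minimal stdlib-only sketch, assuming the propeller admin port is reachable at a hypothetical `http://flytepropeller:10254/config` URL (the `node-config` / `default-deadlines` keys mirror the dump pasted later in this thread):

```python
import json
import urllib.request


def extract_deadlines(config_json: str) -> dict:
    """Pull the default-deadlines block out of a propeller /config dump."""
    config = json.loads(config_json)
    return config["node-config"]["default-deadlines"]


def fetch_deadlines(url: str = "http://flytepropeller:10254/config") -> dict:
    # Hypothetical in-cluster URL; adjust host/port for your deployment.
    with urllib.request.urlopen(url) as resp:
        return extract_deadlines(resp.read().decode())


if __name__ == "__main__":
    # Sample payload in the shape of the dump shown in this thread.
    sample = (
        '{"node-config":{"default-deadlines":'
        '{"node-active-deadline":"168h0m0s",'
        '"node-execution-deadline":"168h0m0s",'
        '"workflow-active-deadline":"168h0m0s"}}}'
    )
    print(extract_deadlines(sample))
```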

Sujith Samuel

11/13/2023, 4:36 PM
`"node-config":{"default-deadlines":{"node-active-deadline":"168h0m0s","node-execution-deadline":"168h0m0s","workflow-active-deadline":"168h0m0s"}}` is what comes out when I dump with the above command,
which fits in nicely with the values declared in the configmap.
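The deadlines in that dump are printed as Go-style duration strings ("168h0m0s"). A small, hypothetical stdlib helper for sanity-checking such dumps, converting them to hours for comparison against the configmap values:

```python
import re

# Matches Go-style duration strings such as "168h0m0s" or "24h".
_GO_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def go_duration_to_hours(value: str) -> float:
    """Convert a Go duration string like '168h0m0s' to a number of hours."""
    match = _GO_DURATION.fullmatch(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {value!r}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return hours + minutes / 60 + seconds / 3600


print(go_duration_to_hours("168h0m0s"))  # 168.0
```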

Dan Rammer (hamersaw)

11/13/2023, 4:39 PM
Have you restarted propeller since the config was updated? This option does not hot-swap. Can you try restarting and checking again?
Also, I just want to make sure there is no timeout explicitly defined on the task.

Sujith Samuel

11/20/2023, 8:14 AM
Hello Team, we restarted the flyte propeller just in case the configmap was not applied. Also, there is no timeout defined in the task, but the task still aborts after 24 hours.
2023-11-18T055404.867156891Z {"json":{"exec_id":"afmgst9tltzgdhtx9tbg","node":"nmlpntasutgeneratortaskentrytrainingtask","ns":"felixd-nmlp-ntas-ut-generator-development","res_ver":"438992659","routine":"worker-2","src":"task_event_recorder.go:27","wf":"felixd-nmlp-ntas-ut-generatordevelopment.flytegen.nmlp_ntas_ut_generator.task_entry.training_task"},"level":"warning","msg":"Failed to record taskEvent, error [EventAlreadyInTerminalStateError: conflicting events; destination: ABORTED, caused by [rpc error: code = FailedPrecondition desc = invalid phase change from FAILED to ABORTED for task execution {resource_type:TASK project:\"felixd-nmlp-ntas-ut-generator\" domain:\"development\" name:\"nmlp_ntas_ut_generator.task_entry.training_task\" version:\"20231116-065732\" node_id:\"fiujrgwa\" execution_id\u003cproject\"felixd-nmlp-ntas-ut-generator\" domain:\"development\" name:\"afmgst9tltzgdhtx9tbg\" \u003e 1 {} [] 0}]]. Trying to record state: ABORTED. Ignoring this error!","ts":"2023-11-18T055404Z"} This is the error I get in the propeller logs. Is there anything more I can do to get detailed information on this?
`Nmlpntasutgeneratortaskentrytrainingtask: Active Deadline: 48h0m0s Execution Deadline: 24h0m0s` is what it says when I describe the workflow. Why would it say this when I have set 168h as the workflow active deadline?
My node config says `"node-config":{"default-deadlines":{"node-active-deadline":"168h0m0s","node-execution-deadline":"168h0m0s","workflow-active-deadline":"168h0m0s"}}`, but when I describe my workflow, why does it say `Nmlpntasutgeneratortaskentrytrainingtask: Active Deadline: 48h0m0s Execution Deadline: 24h0m0s Id: nmlpntasutgeneratortaskentrytrainingtask`?
Is this because this was started just as a task and not a workflow? I don't see this field populated for other tasks which are started under workflows.
#ask-the-community please help
@Samhita Alla Kindly help

Samhita Alla

11/21/2023, 5:02 AM
@Sujith Samuel do you see 48h or 24h defined anywhere in your config? Also, have you tried triggering a workflow instead of a task, just to check whether the values you set are being picked up?

Sujith Samuel

11/21/2023, 5:26 AM
Yes. I am now trying two things: 1. encapsulating this task in a workflow and triggering the workflow, and 2. running the task in parallel with a timeout specified as 72h.
Hopefully things work out.
I have not defined 48h or 24h anywhere in my config, by the way.
@Samhita Alla, it did not work. We added this to the task: `@task(cache_version=get_hash_for_file(Path(file)), cache=1.0, timeout=timedelta(days=7), limits=Resources(cpu="5000m", mem="80000Mi"), environment={"aaa":"bbb"})`. But the task aborted after 24 hours. Can you please let me know if the above is correct, or is there another way to declare a timeout? With or without the timeout specified, the task is aborted automatically after 24 hours.
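For what it's worth, the `timeout=timedelta(days=7)` in that decorator does correspond to the 168h deadlines configured in propeller; a quick stdlib check (the decorator arguments themselves come from the message above):

```python
from datetime import timedelta

# The timeout passed to @task in the snippet above.
timeout = timedelta(days=7)

# Express it in hours to compare against propeller's "168h0m0s" deadlines.
timeout_hours = timeout.total_seconds() / 3600
print(timeout_hours)  # 168.0
```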

Samhita Alla

11/25/2023, 1:17 PM
oh, have you tried triggering a workflow instead of a task?

Sujith Samuel

11/25/2023, 3:26 PM
@Samhita Alla, yes, we triggered a workflow which executes the same task, and it seems it doesn't get cancelled; it's still running.
So is it to be assumed that tasks will be timed out after 24 hours if not encapsulated within workflows?

jeev

11/25/2023, 6:04 PM
Hmm, there was an issue for this, I believe. Let me dig it up.
What version of Flyte are you running, @Sujith Samuel?
This should be fixed already.

Sujith Samuel

11/26/2023, 9:23 AM
My flyte propeller version is `flytepropeller:v1.1.15`.

Mateusz Kwasniak

11/26/2023, 1:30 PM
Hey @Sujith Samuel, sorry for plugging into this discussion with a different question, but how do you enable these Kubernetes logs? In my deployment it says "No logs found" for all of the tasks, and I think Kubernetes logs are enabled in the configmap by default.

jeev

11/27/2023, 7:38 AM
@Sujith Samuel: are you sure your propeller version is 1.1.15? That's from July 2022, which definitely does not have the above fix.
That would explain why your standalone tasks are timing out after 24h.