# ask-the-community
s
#ask-the-community Does Flyte time out tasks after 24 hours? I have a task defined in Flyte that times out after 24 hours. Please see the screenshot below. Kindly let me know how to increase this timeout value.
d
This value should be in the propeller configuration. We have since changed the default to `0`, which means unlimited, so this must be an older deployment. It should be located at:
```yaml
node-config:
  default-deadlines:
    node-execution-deadline: 0h
```
s
The values for node-config in my case are `node-active-deadline: 168h`, `node-execution-deadline: 168h`, and `workflow-active-deadline: 168h`, all under `default-deadlines`. One thing to note is that I don't have a workflow, only a task: when the user comes in, they don't trigger a workflow, they trigger only a task. Will this cause any difference in how the execution deadline works?
d
It shouldn't matter that the user triggers a task; underneath, flyteadmin actually wraps it in a workflow to execute. You can hit the `<flytepropeller-ip>:10254/config` endpoint with a simple curl, and it will dump the actual parsed config as JSON. I'm wondering whether something is not being set at the correct level?
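If you want to script that check instead of reading the raw JSON by eye, a minimal sketch is below. It assumes the endpoint is served over plain `http://` and that the dump contains a `"node-config"` key somewhere (the exact nesting is not shown in this thread, so the helper searches for it recursively). The function names are hypothetical, and the duration parser only handles the `h`/`m`/`s` units seen in these values.

```python
import json
import re
import urllib.request
from datetime import timedelta

# One "<number><unit>" component of a Go duration string such as "168h0m0s".
_PART = re.compile(r"(\d+(?:\.\d+)?)([hms])")
# Whole string made only of h/m/s components (ms/us/ns are not handled here).
_FULL = re.compile(r"(?:\d+(?:\.\d+)?[hms])+")
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_go_duration(text: str) -> timedelta:
    """Convert a Go-style duration such as "168h0m0s" into a timedelta."""
    if text == "0":  # accept a bare zero as well as "0s"
        return timedelta(0)
    if not _FULL.fullmatch(text):
        raise ValueError(f"unsupported duration: {text!r}")
    seconds = sum(float(value) * _UNIT_SECONDS[unit] for value, unit in _PART.findall(text))
    return timedelta(seconds=seconds)


def find_key(obj, key):
    """Depth-first search for the first occurrence of `key` in nested dicts/lists."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def default_deadlines(config: dict) -> dict:
    """Return the node-config default deadlines from a config dump as timedeltas."""
    node_config = find_key(config, "node-config")
    if node_config is None:
        raise KeyError("node-config not found in config dump")
    raw = node_config.get("default-deadlines", {})
    return {name: parse_go_duration(value) for name, value in raw.items()}


def fetch_config(host: str, port: int = 10254) -> dict:
    """Download the parsed propeller config (assumes plain HTTP on the debug port)."""
    with urllib.request.urlopen(f"http://{host}:{port}/config", timeout=10) as resp:
        return json.load(resp)
```

For example, `default_deadlines(fetch_config("<flytepropeller-ip>"))` should report seven days for each key when the dump contains the `168h0m0s` values posted in this thread.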
s
This is what I get when I dump the config with the above command: `"node-config":{"default-deadlines":{"node-active-deadline":"168h0m0s","node-execution-deadline":"168h0m0s","workflow-active-deadline":"168h0m0s"}`, which matches the values declared in the configmap.
d
Have you restarted propeller since the config was updated? This option is not hot-reloaded, so can you try restarting and checking again?
Also, I just want to make sure there is no timeout explicitly defined on the task.
s
Hello team, we restarted flytepropeller in case the configmap had not been applied. There is also no timeout defined on the task, but the task still aborts after 24 hours.
`2023-11-18T05:54:04.867156891Z {"json":{"exec_id":"afmgst9tltzgdhtx9tbg","node":"nmlpntasutgeneratortaskentrytrainingtask","ns":"felixd-nmlp-ntas-ut-generator-development","res_ver":"438992659","routine":"worker-2","src":"task_event_recorder.go:27","wf":"felixd-nmlp-ntas-ut-generatordevelopment.flytegen.nmlp_ntas_ut_generator.task_entry.training_task"},"level":"warning","msg":"Failed to record taskEvent, error [EventAlreadyInTerminalStateError: conflicting events; destination: ABORTED, caused by [rpc error: code = FailedPrecondition desc = invalid phase change from FAILED to ABORTED for task execution {resource_type:TASK project:\"felixd-nmlp-ntas-ut-generator\" domain:\"development\" name:\"nmlp_ntas_ut_generator.task_entry.training_task\" version:\"20231116-065732\" node_id:\"fiujrgwa\" execution_id\u003cproject\"felixd-nmlp-ntas-ut-generator\" domain:\"development\" name:\"afmgst9tltzgdhtx9tbg\" \u003e 1 {} [] 0}]]. Trying to record state: ABORTED. Ignoring this error!","ts":"2023-11-18T05:54:04Z"}`
This is the error I get in the propeller logs. Is there anything more I can do to get detailed information on this?
When I describe the workflow, it says `Nmlpntasutgeneratortaskentrytrainingtask: Active Deadline: 48h0m0s Execution Deadline: 24h0m0s`. Why would it say this when I have set 168h as the workflow active deadline?
My node config says `"node-config":{"default-deadlines":{"node-active-deadline":"168h0m0s","node-execution-deadline":"168h0m0s","workflow-active-deadline":"168h0m0s"}`, but when I describe my workflow, why does it say `Nmlpntasutgeneratortaskentrytrainingtask: Active Deadline: 48h0m0s Execution Deadline: 24h0m0s Id: nmlpntasutgeneratortaskentrytrainingtask`?
Is this because it was started as a standalone task and not as a workflow? I don't see this field populated for other tasks that are started under workflows.
#ask-the-community, please help.
@Samhita Alla, kindly help.
s
@Sujith Samuel, do you see 48h or 24h being defined anywhere in your config? Also, have you tried triggering a workflow instead of a task, just to check whether the values you set are being considered?
s
Yes. I am now trying two things:
1. Encapsulating this task in a workflow and then triggering the workflow.
2. Running the task in parallel with a timeout of 72h.
Hopefully things work out.
By the way, I have not defined 48h or 24h anywhere in my config.
@Samhita Alla, it did not work. We added this to the task:
`@task(cache_version=get_hash_for_file(Path(__file__)), cache=1.0, timeout=timedelta(days=7), limits=Resources(cpu="5000m", mem="80000Mi"), environment={"aaa": "bbb"})`
But the task still aborted after 24 hours. Can you please let me know whether the above is correct, or whether there is another way to declare a timeout? With or without a timeout specified, the task is aborted automatically after 24 hours.
s
Oh, have you tried triggering a workflow instead of a task?
s
@Samhita Alla, yes, we triggered a workflow that executes the same task, and it doesn't seem to get cancelled; it's still running.
So should we assume that tasks will time out after 24 hours if they are not encapsulated within workflows?
j
Hmm, I believe there was an issue for this. Let me dig it up.
What version of Flyte are you running, @Sujith Samuel?
This should be fixed already.
s
My FlytePropeller version is `flytepropeller:v1.1.15`.
m
Hey @Sujith Samuel, sorry for jumping into this discussion with a different question, but how do you enable these Kubernetes logs? In my deployment it says "No logs found" for all of the tasks, and I think Kubernetes logs are enabled in the configmap by default.
j
@Sujith Samuel, are you sure your propeller version is 1.1.15? That's from July 2022, and it definitely does not have the above fix.
That would explain why your standalone tasks are timing out after 24h.