How do retries and timeouts interact with each oth...
# flyte-support
g
How do retries and timeouts interact with each other? I have a task for which I've set a 5 minute timeout and 3 retries. In the most recent runs from last night, I see that due to lack of resources (it requires a GPU, but we scale our GPU node pool down to 0 overnight) it times out after 10 minutes and only got retried twice. I would expect it to run for at most 15 minutes (3x5min) instead of 10.
d
I don't think we will retry if there's a timeout error
can you provide your code?
g
Can I just give the @task decorator? As the actual task doesn't get run, and when it does run, it normally doesn't time out.
d
sure
in my experience, if you hit a timeout error, we will not retry
g
@task(container_image=image_spec, requests=Resources(cpu="1", mem="2G"), accelerator=GPUAccelerator("nvidia-l4"), limits=Resources(cpu="4", mem="7G", gpu="1"), timeout=timedelta(minutes=5), retries=3)
Okay, is that because the timeout is a system error?
d
yes, I think it is
maybe you could provide a screenshot of your console and more logs from propeller
g
Okay
d
so that it will be easier to help us figure out what happened
tyty
g
The task at hand is the download_raw task. As you can see, it has 2 retries and ran/the pod lived (but not necessarily scheduled and run on a node) for 10 minutes. Lmk what else you'd need
d
busy on other priority
will come back if I have time today
sry
g
No worries 👍 Good luck
d
and the file is not found
I can't open it
g
Yeah, forgot to edit one more sensitive thing out. One sec
Here you go. It's not high priority, just curious about if this is expected behaviour and the relationship between timeouts and retries so I can tune my workflows better.
https://flyte--6382.org.readthedocs.build/en/6382/user_guide/flyte_fundamentals/optimizing_tasks.html#configuring-retries Ah, if I read this correctly, it might be that somewhere in my flytepropeller config the default number of system-error-based retries is 1 or 2. (Is attempts == retries or attempts = 1 + retries?)
c
@gentle-tomato-480 from that line it seems like it's attempts = 1 + retries. Have you been able to test this?
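For the arithmetic, here is a rough sketch in plain Python (no Flyte calls). It assumes attempts = 1 + retries and that every attempt runs into the full timeout; both are readings from this thread, not something verified against propeller. The function name max_total_runtime is made up for illustration.

```python
from datetime import timedelta


def max_total_runtime(timeout: timedelta, retries: int) -> timedelta:
    """Upper bound on wall-clock time if every attempt hits the timeout.

    Assumption (not confirmed in this thread): attempts = 1 + retries,
    i.e. the first run does not count as a retry.
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    attempts = 1 + retries
    return attempts * timeout


# Values from the decorator above: timeout=5 minutes, retries=3.
print(max_total_runtime(timedelta(minutes=5), retries=3))
# A limit of 1 retry would give two attempts in total.
print(max_total_runtime(timedelta(minutes=5), retries=1))
```

Under that assumption, retries=3 would allow up to 20 minutes, while the observed 10 minutes corresponds to two attempts in total, which would fit a lower system-retry limit in the propeller config rather than the retries value on the task.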
g
For our needs, 2 turned out to be enough. So I haven't touched/changed it in config