I would also have a question with regards to spot ...
# ask-the-community
b
I also have a question regarding spot instances. We have a subworkflow (a workflow within a workflow) that starts tasks on spot nodes (`interruptible=True`). The task then fails with:
Last Error: USER::Pod was terminated in response to imminent node shutdown.
The last log from the node this task ran on is:
Deleting node <node-id> because it does not exist in the cloud provider
So we assume the node was reclaimed by GCP. Does anyone happen to know whether interruptible task failures are handled differently in subworkflows?
k
What do you mean? Do you think the retries were not done?
Can you please explain?
b
The workflow failed without retries, yes
k
Did you have at least one retry set on the task?
Adding interruptible will not automatically retry the task, as some tasks are not retryable.
b
Oh, that's good to know - we did not have that 👍
Should I edit the docs accordingly?
k
Cc @Dan Rammer (hamersaw) can you confirm that at least 1 retry needs to be set for an interrupted task to be retried?
d
Yes, setting a task as `interruptible` just sets combinations of `NodeSelectors`, `Affinities`, and `Tolerations`, depending on the configuration. It doesn't affect the number of retries available to a task.
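To make the point above concrete, here is a minimal toy model in plain Python. It is not Flyte's implementation: `TaskConfig`, `run`, `NodePreempted`, and `preempted_once` are hypothetical names made up for illustration. It only sketches the idea that the interruptible flag and the retry budget are independent settings. In flytekit, the equivalent would be passing `retries=1` alongside `interruptible=True` to the `@task` decorator.

```python
from dataclasses import dataclass


class NodePreempted(Exception):
    """Stands in for a pod being terminated because its spot node was reclaimed."""


@dataclass
class TaskConfig:
    # Hypothetical stand-in for the two task settings discussed above.
    interruptible: bool = False  # scheduling hint only: the task may land on spot nodes
    retries: int = 0             # separate budget of extra attempts


def run(config, attempt_fn):
    """Call attempt_fn up to 1 + config.retries times; return (result, attempts used).

    Note that config.interruptible is never consulted here: in this toy model,
    as in the explanation above, it has no influence on the retry budget.
    """
    max_attempts = 1 + config.retries
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_fn(attempt), attempt
        except NodePreempted:
            if attempt == max_attempts:
                raise  # budget exhausted: the failure surfaces to the workflow


def preempted_once(attempt):
    # Simulates a spot node being reclaimed during the first attempt only.
    if attempt == 1:
        raise NodePreempted("Pod was terminated in response to imminent node shutdown.")
    return "ok"
```

With `TaskConfig(interruptible=True, retries=0)` the simulated preemption propagates as a failure, which matches the behaviour reported at the start of the thread. With `retries=1` the second attempt gets a chance to succeed.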
b
Here’s the related docs PR