# announcements
f
Hi! We added a timeout for one node in our workflow. This node is handled by a backend plugin (bigqueryjob). We see in the logs that the node times out as expected:
"Current execution for the node timed out; timeout configured: 3h0m0s"
However, the Delete method of the plugin (https://github.com/flyteorg/flyteplugins/blob/master/go/tasks/plugins/webapi/bigquery/plugin.go#L226) is never called after the timeout, so the job is not cancelled. Is this expected? How can I tell propeller to abort the node after a timeout?
We can see in the internal docs that:
// Delete the object in the remote service using the resource key. Flyte will call this API at least once. If the
	// resource has already been deleted, the API should not fail.
https://pkg.go.dev/github.com/lyft/flyteplugins@v0.5.26/go/tasks/pluginmachinery/webapi#AsyncPlugin
So we would’ve expected propeller to call this method after a timeout.
d
Hi @Fernando Diaz, I think this may be a bug. It looks like propeller handles execution timeouts as retryable failures. However, as you noted, there is no attempted cleanup of the timed-out resource. Do you mind filing an issue for this? I think a best-effort abort on the resource (which will call the Delete API for web plugins, as you have shown) might be the best way to handle it.
f
d
Perfect, thanks so much!