• Fernando Diaz

    4 months ago
    Hi! We added a timeout for one node in our workflow. This node is handled by a backend plugin (`bigqueryjob`). We see in the logs that the node times out as expected:
    "Current execution for the node timed out; timeout configured: 3h0m0s"
    However, the `Delete` method of the plugin (https://github.com/flyteorg/flyteplugins/blob/master/go/tasks/plugins/webapi/bigquery/plugin.go#L226) is never called after the timeout, so the job is not cancelled. Is this expected? How can I tell propeller to abort the node after a timeout?
  • We can see in the internal docs that:
    // Delete the object in the remote service using the resource key. Flyte will call this API at least once. If the
    // resource has already been deleted, the API should not fail.
    https://pkg.go.dev/github.com/lyft/flyteplugins@v0.5.26/go/tasks/pluginmachinery/webapi#AsyncPlugin
    So we would've expected propeller to call this method after a timeout.
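The contract quoted above (Delete is called at least once and must not fail if the resource is already gone) can be illustrated with a minimal sketch. Everything here is hypothetical and is not the flyteplugins API: `fakeJobService`, `errJobNotFound`, `deleteJob`, and `deleteTwice` are stand-ins invented only to show the idempotency pattern.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errJobNotFound is a hypothetical sentinel meaning the remote job no longer exists.
var errJobNotFound = errors.New("job not found")

// fakeJobService is a hypothetical in-memory stand-in for a remote service.
type fakeJobService struct{ running map[string]bool }

// cancel stops a running job, or reports errJobNotFound if it is already gone.
func (s *fakeJobService) cancel(_ context.Context, jobID string) error {
	if !s.running[jobID] {
		return errJobNotFound
	}
	delete(s.running, jobID)
	return nil
}

// deleteJob is idempotent: an already-deleted job counts as success,
// so repeated calls for the same resource key do not fail.
func deleteJob(ctx context.Context, svc *fakeJobService, jobID string) error {
	err := svc.cancel(ctx, jobID)
	if errors.Is(err, errJobNotFound) {
		return nil
	}
	return err
}

// deleteTwice simulates "called at least once" by issuing the same delete twice.
func deleteTwice(svc *fakeJobService, jobID string) (first, second error) {
	ctx := context.Background()
	first = deleteJob(ctx, svc, jobID)
	second = deleteJob(ctx, svc, jobID)
	return first, second
}

func main() {
	svc := &fakeJobService{running: map[string]bool{"job-1": true}}
	first, second := deleteTwice(svc, "job-1")
	fmt.Println(first, second)
}
```

Because the second call is treated as a no-op, a caller that cannot tell whether an earlier delete went through can safely retry it.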
  • Dan Rammer (hamersaw)

    4 months ago
    Hi @Fernando Diaz, I think this may be a bug. It looks like propeller handles execution timeouts as retryable failures. However, as you noted, there is no attempted cleanup of the timed-out resource. Do you mind filing an issue for this? I think a best-effort abort on the resource (which will call the Delete API for web plugins, as you have shown) might be the best way to handle it.
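As a rough illustration of the best-effort abort suggested above, the sketch below waits for a job and, if the timeout expires first, makes one cleanup call before reporting the timeout. It is not propeller's implementation: `resourceDeleter`, `recordingDeleter`, `waitOrAbort`, `errNodeTimeout`, and `demoTimeout` are all hypothetical names, and the 5-second cleanup budget is an arbitrary assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNodeTimeout is a hypothetical sentinel for an expired node timeout.
var errNodeTimeout = errors.New("node execution timed out")

// resourceDeleter is a hypothetical, reduced stand-in for a plugin's delete hook.
type resourceDeleter interface {
	Delete(ctx context.Context, resourceKey string) error
}

// recordingDeleter is a fake that records which resource keys were deleted.
type recordingDeleter struct{ deleted []string }

func (d *recordingDeleter) Delete(_ context.Context, key string) error {
	d.deleted = append(d.deleted, key)
	return nil
}

// waitOrAbort waits for done. If the timeout elapses first, it makes one
// best-effort Delete call and then reports the timeout to the caller.
func waitOrAbort(d resourceDeleter, key string, timeout time.Duration, done <-chan struct{}) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-done:
		return nil
	case <-timer.C:
		// Use a fresh context with its own (assumed) budget for the cleanup call.
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		if err := d.Delete(ctx, key); err != nil {
			// Best effort: log the cleanup failure and still report the timeout.
			fmt.Printf("cleanup of %s failed: %v\n", key, err)
		}
		return fmt.Errorf("%w; timeout configured: %s", errNodeTimeout, timeout)
	}
}

// demoTimeout keeps the example fast; the thread's real timeout was 3h.
const demoTimeout = 20 * time.Millisecond

func main() {
	d := &recordingDeleter{}
	// A nil channel never becomes ready, which simulates a job that never finishes.
	err := waitOrAbort(d, "bq-job-123", demoTimeout, nil)
	fmt.Println(err)
	fmt.Println(d.deleted)
}
```

The cleanup error is only logged, so a failing delete never hides the original timeout; whether the timeout is then retried remains the caller's decision.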
  • Fernando Diaz

    4 months ago
  • Dan Rammer (hamersaw)

    4 months ago
    Perfect, thanks so much!