Jonathan Lamiel
06/05/2022, 2:42 PM
[1/1] currentAttempt done. Last Error: USER::task execution timeout [5m0s] expired error. The reason is that the pod was trying to mount a secret volume that didn’t exist (there was a typo in the secret name).
• The problem is that the K8s deployments for those tasks were still around after those 5 minutes, and stayed that way for hours, keeping the 3 GB for themselves and making the other tasks wait.
• Ultimately, after those deployments were “removed”, the other tasks were picked up.
I would expect Flyte to terminate the deployments right after the first error, no? Freeing the resources for other tasks?
Any clue?
Ketan (kumare3)
Jonathan Lamiel
06/05/2022, 3:06 PM
Unable to attach or mount volumes: unmounted volumes=[onxg542gnrqwwzk6], unattached volumes=[kube-api-access-8cmfw onxg542gnrqwwzk6 aws-iam-token]: timed out waiting for the condition
MountVolume.SetUp failed for volume "onxg542gnrqwwzk6" : references non-existent secret key: password
Ketan (kumare3)
Jonathan Lamiel
06/05/2022, 3:11 PM
Ketan (kumare3)
Smriti Satyan
06/06/2022, 4:45 AM
Dan Rammer (hamersaw)
06/06/2022, 8:23 AM
delete-resource-on-finalize configuration works, but I think it would be nice if that option worked orthogonally. You shouldn't need to update configuration to fix this, right?
Jonathan Lamiel
06/06/2022, 11:32 AM