Jonathan Lamiel
06/05/2022, 2:42 PM
[1/1] currentAttempt done. Last Error: USER::task execution timeout [5m0s] expired
error. The reason is that the pod was trying to mount a secret volume that didn’t exist (there was a typo in the secret name)
• The problem is that the K8s deployments for those tasks were still around after those 5 minutes, and stayed that way for hours, holding on to their 3 GB and leaving the other tasks waiting.
• Ultimately, after those deployments were “removed”, the other tasks were picked up.
I would expect Flyte to terminate the deployments right after the first error, no? Freeing the resources for other tasks?
Any clue?
Ketan (kumare3)
Jonathan Lamiel
06/05/2022, 3:06 PM
Unable to attach or mount volumes: unmounted volumes=[onxg542gnrqwwzk6], unattached volumes=[kube-api-access-8cmfw onxg542gnrqwwzk6 aws-iam-token]: timed out waiting for the condition
MountVolume.SetUp failed for volume "onxg542gnrqwwzk6" : references non-existent secret key: password
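For context, a rough sketch of the kind of pod volume that produces the kubelet events above: a secret volume that projects a specific key, where the referenced Secret has no such key. The volume name and the key `password` come from the events; the Secret name `my-secret` is hypothetical, and the exact spec Flyte generated is an assumption.

```yaml
# Illustrative pod spec fragment, not the actual spec from this incident.
volumes:
  - name: onxg542gnrqwwzk6      # volume name taken from the event message
    secret:
      secretName: my-secret     # hypothetical Secret name
      items:
        - key: password         # key that the Secret does not contain
          path: password
```

With a volume like this, the pod stays in ContainerCreating while the kubelet keeps retrying the mount, so it continues to hold its requested resources until something deletes it.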
Ketan (kumare3)
Jonathan Lamiel
06/05/2022, 3:11 PM
Ketan (kumare3)
Smriti Satyan
06/06/2022, 4:45 AM
Dan Rammer (hamersaw)
06/06/2022, 8:23 AM
delete-resource-on-finalize configuration works, but I think it would be nice if that option worked orthogonally. You shouldn't need to update configuration to fix this, right?
Jonathan Lamiel
06/06/2022, 11:32 AM
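For reference, a minimal sketch of where the delete-resource-on-finalize option discussed above would be set, assuming it sits under the K8s plugin section of the FlytePropeller configuration. The surrounding nesting depends on how Flyte is deployed (Helm values versus a raw config file), so treat the layout as an assumption to check against your installation.

```yaml
# Sketch of a FlytePropeller config fragment; the nesting is assumed.
plugins:
  k8s:
    # Delete the underlying K8s resource once the task reaches a terminal
    # state, instead of leaving it in the cluster.
    delete-resource-on-finalize: true
```

With this enabled, a pod stuck on a failed volume mount should be deleted once the task is finalized, for example after the 5m0s execution timeout, rather than continuing to hold its resources.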