Hi, @Jacob Wang - can you provide a little more context? What does the error message say? Can you use the ID to check if the job is still actually running?
10/05/2022, 2:45 PM
Hey, the error is just an AWS Batch job runtime error, and the job goes to the FAILED state, so it is not running. But the Flyte console still shows the task as running, so I can't tell it failed until I click into the task details in the console.
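For reference, a minimal sketch of checking the job state by ID outside the Flyte console. It assumes boto3 is installed and AWS credentials are configured; `summarize_batch_job` is a hypothetical helper name, not part of Flyte or boto3. Only the `describe_jobs` call and the response fields it reads come from the AWS Batch API.

```python
from typing import Any, Dict

# AWS Batch job states that will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def summarize_batch_job(batch_client: Any, job_id: str) -> Dict[str, Any]:
    """Return the status details AWS Batch reports for one job ID.

    `batch_client` is expected to behave like boto3.client("batch").
    """
    response = batch_client.describe_jobs(jobs=[job_id])
    jobs = response.get("jobs", [])
    if not jobs:
        raise LookupError(f"no AWS Batch job found for id {job_id!r}")

    job = jobs[0]
    container = job.get("container", {})
    return {
        "status": job["status"],
        "is_terminal": job["status"] in TERMINAL_STATES,
        "status_reason": job.get("statusReason"),  # may be absent
        "exit_code": container.get("exitCode"),    # absent until the container exits
    }


# Usage (requires boto3 and AWS credentials; the job ID is a placeholder):
#   import boto3
#   print(summarize_batch_job(boto3.client("batch"), "<job-id>"))
```

Comparing this output with what the Flyte console shows makes it easier to tell whether the mismatch is on the Batch side or the Flyte side.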
Dan Rammer (hamersaw)
10/05/2022, 2:50 PM
Interesting, it seems that Flyte is failing to report the error. Do you mind filing an issue on GitHub? It helps to be as thorough as possible in the description (i.e. exact error message, etc.). We will pick this up ASAP.
cc @Kevin Su I know you've done some work on AWS Batch - do you have any initial ideas?
10/05/2022, 2:57 PM
Does this task rerun again? (The task will rerun 30 times by default if Propeller gets a retryable failure.) If the task is rerunning, the status will be “running”.
10/06/2022, 7:56 AM
Hey, no, the task didn’t rerun. OK, I will file this issue on GitHub.
Sorry for the late reply, we are in different time zones.
OK, it seems that the AWS Batch job is actually in the SUCCEEDED state, but there’s an exception captured by Flyte and logged in CloudWatch, and the Flyte console also knows the task failed. So I guess Flyte caught the exception, which means no exception was raised in the AWS Batch job’s container runtime, but Flyte still knows there was an exception…
@Dan Rammer (hamersaw) @Kevin Su
Do you think this is a bug?
If so, it would be good if you could send me a guideline for filing an issue in Flyte’s GitHub repo, if there is one.
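A small sketch of the guess above: AWS Batch marks a job FAILED when the container exits with a non-zero code, so a wrapper that catches the exception and still exits 0 would leave the job SUCCEEDED even though the error is in the logs. Both wrapper functions below are hypothetical illustrations, not Flyte's actual entrypoint.

```python
import sys
import traceback
from typing import Callable


def run_swallowing_errors(task: Callable[[], None]) -> int:
    """Hypothetical wrapper: logs the exception but reports a clean exit."""
    try:
        task()
    except Exception:
        traceback.print_exc(file=sys.stderr)  # visible in the container logs
        return 0  # exit code 0, so the job would be reported as SUCCEEDED
    return 0


def run_propagating_errors(task: Callable[[], None]) -> int:
    """Hypothetical wrapper: logs the exception and reports a failing exit."""
    try:
        task()
    except Exception:
        traceback.print_exc(file=sys.stderr)
        return 1  # non-zero exit code, so the job would be reported as FAILED
    return 0


def failing_task() -> None:
    # Stand-in for user code that raises at runtime.
    raise RuntimeError("user code failed")


# A container entrypoint would pass the result to sys.exit(), e.g.:
#   sys.exit(run_propagating_errors(failing_task))
```

If the first pattern is what is happening here, the Batch state alone cannot show the failure, and the error has to come from whatever the wrapper recorded.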
10/06/2022, 6:10 PM
Just file a bug here. By the way, would you mind sharing the code snippet? That will help with debugging.