# flyte-support
p
Hi all -- I ran into an issue this morning in our Flyte instance where an underlying (individual) task in a `map_task` failed out with a very long error message (it hit a SQL error on a large insert), but the containing `map_task` (`Array Node` on the UI) and the overarching workflow were just stuck in a `Running` state until I manually went in and terminated the workflow. I'm guessing the cause of this issue was the size of the error message, but is there something I should be looking for in the logs to confirm this? Should Flyte be handling this sort of situation better (assuming it is the issue that I'm guessing)?
This is on `flytekit==1.13.5`.
a
Hey Joe, could you share a snippet of the error message?
p
Hey -- I don't have the error message anymore. I only found the underlying task error by rerunning the task locally (it was a SQL insert error, something along the lines of a duplicate PK error). The error never appeared on the Flyte UI; what we got on Flyte instead was:
[0]: [3/3] currentAttempt done. Last Error: SYSTEM::resource not found, name [[project-name]-production/f139292485c6b2d39000-fhg3lf3i-0-n0-3]. reason: pods "f139292485c6b2d39000-fhg3lf3i-0-n0-3" not found
which is because the underlying task failed out but Flyte wasn't aware of it, I'm guessing
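If the message size does turn out to be the cause (still only a guess at this point), one possible stopgap is to cap the exception text inside the mapped task before it propagates. The sketch below is plain Python with no Flyte calls; `cap_error_message`, `MAX_ERROR_CHARS`, and `insert_rows` are hypothetical names for illustration, and the 1000-character limit is arbitrary, not a Flyte setting.

```python
import functools

MAX_ERROR_CHARS = 1000  # hypothetical limit chosen for illustration, not a Flyte setting


def cap_error_message(fn, limit=MAX_ERROR_CHARS):
    """Wrap fn so any exception it raises is re-raised with a bounded message."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            msg = str(exc)
            if len(msg) > limit:
                dropped = len(msg) - limit
                msg = msg[:limit] + f"... [truncated {dropped} chars]"
            # `from None` suppresses chaining so the long original text is not
            # carried along with the new exception.
            raise RuntimeError(f"{type(exc).__name__}: {msg}") from None

    return wrapper


def insert_rows(rows):
    # Stand-in for the real SQL insert; always fails with a very long message.
    raise ValueError("duplicate key: " + ", ".join(map(str, rows)))


safe_insert = cap_error_message(insert_rows, limit=50)
```

In a real workflow the same try/except would sit inside the body of the task function passed to `map_task`. Whether that actually avoids the stuck `Running` state is untested here.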