Hi all I ran into an issue this morning in our Flyte instanc Flyte #flyte-support

proud-glass-36655

12/03/2024, 6:37 PM

Hi all -- I ran into an issue this morning in our Flyte instance where an underlying (individual) task in a `map_task` failed out with a very long error message (it hit a SQL error on a large insert), but the containing `map_task` (`Array Node` on the UI) and the overarching workflow were just stuck in a `Running` state until I manually went in and terminated the workflow. I'm guessing the cause of this issue was the size of the error message, but is there something I should be looking for in the logs to confirm this? Should Flyte be handling this sort of situation better (assuming it is the issue that I'm guessing)?

proud-glass-36655

12/03/2024, 6:38 PM

This is on `flytekit==1.13.5`

average-finland-92144

12/04/2024, 10:21 AM

Hey Joe, could you share a snippet of the error message?

proud-glass-36655

12/04/2024, 2:45 PM

Hey -- I don't have the error message anymore; I only found the underlying task error by rerunning the task locally (it was a SQL insert error, something along the lines of a duplicate PK error). The error never appeared on the Flyte UI. What we got on Flyte instead was:

[0]: [3/3] currentAttempt done. Last Error: SYSTEM::resource not found, name [[project-name]-production/f139292485c6b2d39000-fhg3lf3i-0-n0-3]. reason: pods "f139292485c6b2d39000-fhg3lf3i-0-n0-3" not found

which is because the underlying task failed out but Flyte wasn't aware of it, I'm guessing
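
If the size of the error message really is the trigger (still only a guess at this point), one way to test that would be to cap the message length inside the task before the exception propagates. The sketch below is plain Python, not a flytekit API: `truncate_message`, `with_truncated_errors`, `insert_rows`, and the 1000-character limit are all hypothetical names and values chosen for illustration.

```python
import functools

MAX_ERROR_CHARS = 1000  # arbitrary illustrative limit, not a Flyte setting


def truncate_message(message: str, limit: int = MAX_ERROR_CHARS) -> str:
    """Return the message unchanged if short, else its head plus a note on what was cut."""
    if len(message) <= limit:
        return message
    omitted = len(message) - limit
    return f"{message[:limit]}... [{omitted} more characters truncated]"


def with_truncated_errors(func):
    """Hypothetical decorator: re-raise any exception as a RuntimeError with a bounded message."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            short = truncate_message(f"{type(exc).__name__}: {exc}")
            # "from None" keeps the oversized original out of the printed traceback.
            raise RuntimeError(short) from None

    return wrapper


@with_truncated_errors
def insert_rows(rows):
    # Stand-in for the real SQL insert; simulates a very long driver error.
    raise ValueError("duplicate key value violates unique constraint: " + "x" * 50_000)
```

In a real workflow the wrapper would presumably sit underneath the task decorator on the function passed to `map_task`. If the workflow then fails cleanly with the shortened message instead of hanging in `Running`, that would support the error-size guess.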
