
Bernhard Stadlbauer

02/07/2023, 1:17 PM
I’m seeing a strange error whilst executing a fairly nested workflow:
Workflow[<workflow-id>] failed. RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: Failed to check Catalog for previous results: DataCatalog failed to get dataset for ID resource_type:TASK name:"<task-id>" version:"<task-version>" : rpc error: code = InvalidArgument desc = missing project
This seems to come from here, but I’m not quite sure why project would be empty at this point.
We’re running Flyte 1.3.0
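For readers unfamiliar with the catalog lookup: the error means the dataset ID that propeller sent to DataCatalog had an empty project field, and DataCatalog rejected it before doing any lookup. The sketch below illustrates that kind of validation. It assumes a simplified, hypothetical `DatasetID` type, `InvalidArgument` error and `validate_dataset_id` helper; these are not the real datacatalog (Go) types, only an illustration of why an empty project yields "missing project".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetID:
    """Hypothetical, simplified stand-in for a catalog dataset identifier."""
    project: str
    domain: str
    name: str
    version: str


class InvalidArgument(ValueError):
    """Hypothetical stand-in for a gRPC InvalidArgument status."""


def validate_dataset_id(dataset_id: DatasetID) -> None:
    # Reject the identifier as soon as a required field is empty,
    # producing a message of the form "missing <field>".
    for field in ("project", "domain", "name", "version"):
        if not getattr(dataset_id, field):
            raise InvalidArgument(f"missing {field}")
```

Because the request is rejected as invalid input rather than "not found", the catalog check itself fails, which is consistent with the node being retried until the system retry budget ([51/50]) is exhausted.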

Dan Rammer (hamersaw)

02/07/2023, 11:16 PM
This is very strange. There should be no situation where this lookup does not have an associated project. Do you have any more information? E.g. is it always the same task? Is it a reference task? Does this only happen with multiple layers of nested workflows? Is there a reproducible minimal example?

Bernhard Stadlbauer

02/08/2023, 6:24 AM
Good morning @Dan Rammer (hamersaw). I’ll try to produce a minimal example today (and maybe even debug propeller in case I find one). It’s slightly tricky to reproduce, as this uses everything Flyte has to offer: heavy subworkflow nesting, some reference tasks, dynamic tasks, etc. Will report back once I know more.

Dan Rammer (hamersaw)

02/08/2023, 10:18 AM
Ok, keep me posted! I'm wondering if it may be related to reference tasks, there was an issue filed regarding project declaration issues with reference launch plans. I may have time today to try and quickly repro this.

Bernhard Stadlbauer

02/08/2023, 10:20 AM
I found a minimal example; it turns out that it has nothing to do with reference launch plans. You can find it here. I'm currently debugging propeller to check where this occurs.
@Dan Rammer (hamersaw) Could it be that dynamic workflows are generally broken at the moment? The following already breaks for me:
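from flytekit import workflow, task, dynamic


# A plain task with no inputs or outputs.
@task
def some_task():
    print("Do something")


# A dynamic task whose generated sub-workflow calls some_task at run time.
@dynamic
def dynamic_task():
    some_task()


# Top-level workflow: launching this from the UI reproduces the failure.
@workflow
def my_workflow():
    dynamic_task()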
Then triggering my_workflow() from the UI already fails.
I think the offending PR is this one. Without this change, things work for me.

Dan Rammer (hamersaw)

02/08/2023, 1:55 PM
OK, thank you so much for looking into this. I pinged the Union team to get as many eyes on this as possible ASAP. We'll need to do a retrospective on this; regressions like this should not occur.

Bernhard Stadlbauer

02/08/2023, 2:21 PM
Thank you! For anyone needing a quick fix I’ve ported this to my fork:
pip install git+https://github.com/bstadlbauer/flytekit.git@fix-dynamic-serialization
In case you want to be safe against later modifications to the branch, you can also pin the commit directly:
pip install git+https://github.com/bstadlbauer/flytekit.git@5e0a270
I’ve also created a corresponding issue

Dan Rammer (hamersaw)

02/09/2023, 1:01 PM
There is a brief comment on the issue, but to add more clarity: the flytekit PR broke project/domain settings on dynamic tasks; this should have been caught by our e2e tests (which is being addressed). We still need to get this change in because of the way reference entities are executed across projects/domains. Looking through propeller, I think there is a way we can correctly set project/domain on dynamic tasks through the compiler. Labeling this as a "bug" is not quite accurate; rather, this is a backend change we need to make to support the flytekit PR. Hopefully this resolves the issue in the near term and we can get it patched up transparently. Thanks so much for digging through the code and debugging this! All your effort is what makes this community so great!
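To make the proposed backend change more concrete, here is a minimal sketch of "setting project/domain on dynamic tasks through the compiler". It assumes a simplified, hypothetical `Identifier` type and `fill_project_domain` helper; propeller is written in Go and its real compiler API differs. The idea sketched: when the dynamic task's generated workflow is compiled, task references serialized without a project or domain inherit them from the parent execution, while references that already carry them (assumed here to include reference tasks pointing at another project) are left untouched.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Identifier:
    """Hypothetical, simplified task identifier."""
    project: str
    domain: str
    name: str
    version: str


def fill_project_domain(
    task_ids: List[Identifier], parent_project: str, parent_domain: str
) -> List[Identifier]:
    """Return copies of task_ids with empty project/domain taken from the parent.

    Each field is filled independently; a non-empty value (e.g. on a
    reference task targeting another project) is always kept as-is.
    """
    filled = []
    for task_id in task_ids:
        filled.append(
            replace(
                task_id,
                project=task_id.project or parent_project,
                domain=task_id.domain or parent_domain,
            )
        )
    return filled
```

Filling the values at compile time, rather than hardcoding them during serialization, is what would let the flytekit change stay in place while still giving the catalog lookup a non-empty project.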

Bernhard Stadlbauer

02/13/2023, 10:53 AM
Thanks @Dan Rammer (hamersaw) for the clarification here 👍