# ask-the-community
b
I’m seeing a strange error whilst executing a fairly nested workflow:
Workflow[<workflow-id>] failed. RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: Failed to check Catalog for previous results: DataCatalog failed to get dataset for ID resource_type:TASK name:"<task-id>" version:"<task-version>" : rpc error: code = InvalidArgument desc = missing project
This seems to come from here, but I’m not quite sure why `project` would be empty here.
We’re running Flyte 1.3.0
d
This is very strange. There should be no situation where this lookup does not have an associated project. Do you have any more information? E.g., is it always the same task? Is it a reference task? Does this only happen with multiple layers of nested workflows? Is there a minimal reproducible example?
b
Good morning @Dan Rammer (hamersaw). I’ll try to produce a minimal example today (and maybe even debug propeller in case I find one). It’s slightly tricky to reproduce, as this uses everything Flyte has to offer: heavy subworkflow nesting, some reference tasks, dynamic tasks, etc. Will report back once I know more.
d
Ok, keep me posted! I'm wondering if it may be related to reference tasks; there was an issue filed regarding project declaration issues with reference launch plans. I may have time today to try and quickly repro this.
b
I found a minimal example; it turns out that it has nothing to do with reference launch plans. You can find it here. Currently debugging in propeller to check where this occurs.
@Dan Rammer (hamersaw) Could it be that dynamic workflows are generally broken at the moment? The following already breaks for me:
from flytekit import workflow, task, dynamic


@task
def some_task():
    print("Do something")


@dynamic
def dynamic_task():
    some_task()


@workflow
def my_workflow():
    dynamic_task()
Then triggering `my_workflow()` from the UI already fails.
I think the offending PR is this one. Without this change, things work for me
d
OK, thank you so much for looking into this. I pinged the Union team to get as many eyes on this ASAP. We'll need to do a retrospective on this; regressions like this should not occur.
b
Thank you! For anyone needing a quick fix I’ve ported this to my fork:
pip install git+https://github.com/bstadlbauer/flytekit.git@fix-dynamic-serialization
In case you wanna be safe against modification, you can also point directly to the commit:
pip install git+https://github.com/bstadlbauer/flytekit.git@5e0a270
I’ve also created a corresponding issue
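To double-check which flytekit build an environment actually picked up after installing from a fork, something like the following could be used. It is only a small sketch that relies on the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "flytekit" is the distribution name being installed by pip in this thread.
    print("flytekit:", installed_version("flytekit"))
```

The exact version string reported for a git-based install depends on how the package derives its version, so treat the output as a hint, not as proof that a specific commit is installed.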
d
I think there is a brief comment on the issue, but for more clarity: the flytekit PR broke project / domain settings on dynamic tasks. This should have been caught in our e2e tests (that is being addressed). We still need to get this change in because of the way reference entities are executed across projects / domains. In taking a look through propeller, I think there is a way we can correctly set project / domain on dynamic tasks through the compiler. Labeling this as a "bug" is incorrect; rather, this is a backend change we need to make to support the flytekit PR. Hopefully this resolves the issue in the near term and we can transparently get this patched up. Thanks so much for diving through the code and debugging this! All your effort is what makes this community so great!
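As a rough illustration of the idea of setting project / domain on dynamic tasks when they arrive empty, here is a minimal sketch. It is only one reading of the description above and not propeller's actual compiler code: `TaskId` and `with_execution_scope` are hypothetical names, and the rule "fill in only what is empty" is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaskId:
    # Hypothetical stand-in for a task identifier; not Flyte's actual type.
    project: str
    domain: str
    name: str
    version: str


def with_execution_scope(task_id: TaskId, project: str, domain: str) -> TaskId:
    """Fill an empty project/domain from the parent execution; keep explicit values."""
    return replace(
        task_id,
        project=task_id.project or project,
        domain=task_id.domain or domain,
    )


if __name__ == "__main__":
    # A task produced by a dynamic workflow, serialized without project/domain.
    dynamic_child = TaskId(project="", domain="", name="some_task", version="v1")
    print(with_execution_scope(dynamic_child, "my-project", "development"))
```

Under this assumption, an identifier that already names a project and domain (for example a reference entity living elsewhere) is left untouched, while an identifier with empty fields inherits them from the execution it runs in.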
b
Thanks @Dan Rammer (hamersaw) for the clarification here 👍