# ask-the-community
b
Hey friends, we use single task executions as integration tests for some platform tasks our team built. We've been seeing "flakiness" in these tests and I looked a bit more into it today. It looks like the TaskIdentifier is used to create the single task execution, which means there will be a conflict if two executions of the same task are launched: https://github.com/flyteorg/flyteadmin/blob/master/pkg/manager/impl/execution_manager.go#L572 Does that sound right? We usually launch one task that will succeed and one that will fail, and collect the executions to validate the behaviour.
k
Cc @katrina
@Babis Kiosidis subsequent executions should just skip creating the launchplans
Can you talk more about the observed flakiness?
Do you mean launches at the same time?
And if so, yeah, we should skip the AlreadyExists error 🤯
b
Yeah we trigger everything async, here is an example
    async def test_generate_uri_persisted(self) -> None:
        partition = two_weeks_ago().strftime("%Y-%m-%d")
        execution = SdkRemoteHadesTaskReference.generate_uri(
            endpoint=TestHadesTasks.ENDPOINT_DAILY,
            partition=partition,
            uri_prefix="file://test",
            overwrite=False,
        )

        assert execution.error is not None
        assert execution.error.code == "USER:Persisted"
        assert execution.closure.phase == TestHadesTasks.FAILED


    async def test_generate_uri_overwrite(self) -> None:
        partition = two_weeks_ago().strftime("%Y-%m-%d")
        execution = SdkRemoteHadesTaskReference.generate_uri(
            endpoint=TestHadesTasks.ENDPOINT_DAILY,
            partition=partition,
            uri_prefix="file://test",
            overwrite=True,
        )

        assert execution.error is None
        assert execution.closure.phase == TestHadesTasks.SUCCEEDED
        assert execution.outputs.get("uri").startswith(
            "file://test/di.golden.path.EndContentFactXT2/{}/".format(partition)
        )
I imagine that this could be a race condition on flyteadmin when 2 single task executions arrive for the same task?
k
Yup
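For anyone reading along, here is a rough sketch of the kind of check-then-create race being described. It is not FlyteAdmin code (FlyteAdmin is written in Go); `FakeStore`, `AlreadyExistsError`, `launch_single_task` and `run_race` are all hypothetical names, and the barrier is only there to force the unlucky interleaving every time.

```python
import threading


class AlreadyExistsError(Exception):
    """Hypothetical error raised when a row with the same key already exists."""


class FakeStore:
    """In-memory stand-in for a table with a unique key (an assumption, not FlyteAdmin's DB layer)."""

    def __init__(self):
        self._rows = set()
        self._lock = threading.Lock()

    def exists(self, key):
        with self._lock:
            return key in self._rows

    def create(self, key):
        with self._lock:
            if key in self._rows:
                raise AlreadyExistsError(key)
            self._rows.add(key)


def launch_single_task(store, task_id, barrier, results, index):
    """Check-then-create: the gap between the check and the create is where the race lives."""
    try:
        if not store.exists(task_id):
            # Force both callers to finish the existence check before either creates.
            barrier.wait()
            store.create(task_id)
        results[index] = "ok"
    except AlreadyExistsError:
        results[index] = "conflict"


def run_race():
    store = FakeStore()
    barrier = threading.Barrier(2)
    results = [None, None]
    threads = [
        threading.Thread(
            target=launch_single_task,
            args=(store, "task-a", barrier, results, i),
        )
        for i in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Both callers see "does not exist", both try to create, and exactly one of them gets the conflict. In real life the interleaving is not forced, so the failure only shows up intermittently, which would look like flaky tests.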
b
We can solve this by combining these individual tests for now, not sure how easy it could be to deal with this on flyteadmin
@Pablo Casares Crespo and I are looking at this together
k
It should be very easy to deal with on FlyteAdmin
It’s a db call, we should just skip if it exists
b
Ah makes sense
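A minimal sketch of the "skip if it exists" idea, using Python's built-in `sqlite3` purely for illustration. The `launch_plans` table, its columns and the `create_if_absent` helper are assumptions made up for this example; the actual FlyteAdmin change would live in its Go repository code and may look quite different.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical launch_plans table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE launch_plans (
            project TEXT NOT NULL,
            domain  TEXT NOT NULL,
            name    TEXT NOT NULL,
            version TEXT NOT NULL,
            PRIMARY KEY (project, domain, name, version)
        )
        """
    )
    return conn


def create_if_absent(conn, project, domain, name, version):
    """Insert the row; treat a uniqueness violation as 'already created' and carry on.

    Returns True if this call created the row, False if it already existed.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO launch_plans (project, domain, name, version) "
                "VALUES (?, ?, ?, ?)",
                (project, domain, name, version),
            )
        return True
    except sqlite3.IntegrityError:
        # Another launch got there first; the row we need exists, so skip.
        return False
```

The point of letting the database's uniqueness constraint decide, rather than checking first and then inserting, is that the second caller can safely proceed with its execution instead of failing.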