Hi Community! I'm experimenting with map_task() (...
# flyte-support
m
Hi Community! I'm experimenting with map_task() (using ArrayNodeMapTask) and specifically the min_successes parameter. I am running Flyte 1.13.3. I have a workflow that used a dynamic task at the end to execute/export a number of reports (Jupyter notebooks). Sometimes the computation of this workflow is considerable, and I don't want the workflow to "fail" just because a report OOMs and the task for that report fails, causing the whole workflow to fail. So I implemented the execution of reports via map_task() and initially set min_successes to 0, because this is the logic I want, but I saw in the flytekit code that this is not supported (the value is not used if it is 0). So I implemented a silly hack to add a dummy task that always succeeds, so that I can then use min_successes=1. But when I run this, even though one task succeeds and one task fails, the map_task and the total workflow are still reported as failing. Am I misinterpreting the intention of this feature? Is there a better/different way to indicate that some task or tasks are allowed to fail, and not cause the workflow to fail? In this case, the tasks are all "leaf nodes" in the DAG, so there are no downstream dependencies and failure of these tasks does not impact anything else about the workflow. Thanks!
I note that these docs do not mention min_successes and instead only min_success_ratio - and that the text indicates that the map_task would terminate once this ratio was achieved, which is also not what I want. I want to attempt every element in the map, and it's ok if some or all of them fail.
t
will try to repro this in a bit. for now, note that there is a failure policy on the workflow which will allow running tasks to complete (but will still leave the workflow in a failed state).
if you have a small bit of code to repro, feel free to paste it here as well please?
b
Did you try setting min_success_ratio?
m
@thankful-minister-83577 @best-oil-18906 - In reviewing more code in the flytekit repository, I don't think min_successes or min_success_ratio is what I'm looking for -- primarily because they each purport to end the map task as soon as the specified number of sub-tasks have completed successfully (e.g. if min_successes is 3, it will end after 3 succeed, even if there were 20 more that have not been executed). I'm looking for a way to execute as many as possible, allowing that some (or perhaps even all) might "fail", but still allow the workflow to proceed and be marked as successful. A concrete example, in case it helps in understanding why this could be useful, is numerical fitting of data, in which you resample the data e.g. 100 times, and fit these resampled sets to arrive at a distribution of fitted parameters. For some types of fitting algorithms, memory can grow depending on the "difficulty" of the fit. In this example, any number of completed fits is helpful information -- and we'd prefer the whole workflow not fail just because some of the fits were OOM-killed. We'd also not like to set the memory resources extremely high just to accommodate pathological cases. So a min_successes or min_success_ratio that did not cause the map task to "early out" when the min count was met would be useful in this case.
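To make the requested behaviour concrete, here is a rough plain-Python sketch of "attempt every element, tolerate any number of failures". It does not use flytekit at all; `attempt_all` and `fit_resample` are hypothetical names invented for illustration, and the simulated `MemoryError` only stands in for an OOM-killed sub-task.

```python
from typing import Callable, Iterable, List, Optional


def attempt_all(fn: Callable[[int], float], items: Iterable[int]) -> List[Optional[float]]:
    """Run fn on every item; record None for any item that raises."""
    results: List[Optional[float]] = []
    for item in items:
        try:
            results.append(fn(item))
        except Exception:
            # A failure is recorded, but the remaining items are still attempted.
            results.append(None)
    return results


def fit_resample(seed: int) -> float:
    """Hypothetical fit: every fourth resample simulates running out of memory."""
    if seed % 4 == 0:
        raise MemoryError(f"simulated OOM for resample {seed}")
    return 1.0 + 0.01 * seed


fits = attempt_all(fit_resample, range(8))
completed = [f for f in fits if f is not None]
print(f"{len(completed)} of {len(fits)} fits completed")
```

Every element is attempted exactly once and the result list keeps its original length, so a caller can still tell which resamples failed.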
b
The thing is that a failed map_task node returns None if min_success_ratio<1. The rest of the workflow should expect an Optional[input_type] instead of input_type, and the workflow can continue. min_success_ratio should not early out.
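As a rough illustration of what "expect an Optional" means for downstream code, the sketch below summarizes a list in which failed elements arrive as None. It is plain Python with no flytekit calls; `summarize_fits` is a hypothetical helper name, and how the list is actually produced by the map task is left out.

```python
from statistics import mean
from typing import List, Optional, Tuple


def summarize_fits(results: List[Optional[float]]) -> Tuple[int, int, Optional[float]]:
    """Return (completed count, failed count, mean of completed values or None)."""
    completed = [r for r in results if r is not None]
    failed = len(results) - len(completed)
    # With zero completed fits there is nothing to average, so report None.
    average = mean(completed) if completed else None
    return len(completed), failed, average


print(summarize_fits([1.0, None, 3.0]))
print(summarize_fits([None, None]))
```

Handling the all-failed case explicitly matters here, since the original question allows for every element failing.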
m
@best-oil-18906 I see, perhaps you are right. I saw this and though this is from the legacy model for Map task, it aligns with the documentation here which says:
min_success_ratio determines the minimum fraction of total jobs that must complete successfully before terminating the map task and marking it as successful.
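One way to read that sentence is as a pass/fail threshold evaluated over all elements rather than an early exit. The sketch below works through the arithmetic under that reading. The rounding rule (ceiling of ratio times total) is an assumption made for illustration, not something confirmed in this thread, and `map_succeeds` is a hypothetical name rather than a flytekit function.

```python
import math


def map_succeeds(total: int, failed: int, min_success_ratio: float = 1.0) -> bool:
    """Decide success from final counts, assuming the threshold is ceil(ratio * total)."""
    required = math.ceil(min_success_ratio * total)  # assumed rounding rule
    return (total - failed) >= required


# 100 resampled fits, half of them must succeed:
print(map_succeeds(total=100, failed=30, min_success_ratio=0.5))
print(map_succeeds(total=100, failed=60, min_success_ratio=0.5))
```

Under this reading, the "some or all may fail" case from the original question would correspond to a ratio at or near 0, though whether the backend accepts such a value is not established here.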
b
yep
we are using map tasks like that all the time
m
@best-oil-18906 Ok thanks, I'll give it a try.