# ask-the-community
f
For distributed pytorch (or tf, …) tasks, the return value of which worker is passed along to subsequent tasks? Is this random/a race condition? When creating a Pytorch task, the
args
of both pods specify the same values for:
```
# pod 1
- --output-prefix
- gs://.../metadata/propeller/sandbox-development-f6695ca08aa47490c859/n0/data/0
- --raw-output-data-prefix
- gs://.../xq/f6695ca08aa47490c859-n0-0

# pod 2
- --output-prefix
- gs://...metadata/propeller/sandbox-development-f6695ca08aa47490c859/n0/data/0
- --raw-output-data-prefix
- gs://.../xq/f6695ca08aa47490c859-n0-0
```
In case both return the same value (as is presumably often the case), it shouldn’t matter which one writes. But if I want to return a metric that might differ slightly between workers, is it random which one I get?
k
No — you can control this. In all non-rank-0 worker processes you can raise an `IgnoreOutputs` exception and let rank 0 return the output.
f
I’ll just add an
`if os.environ.get("RANK") != "0": raise IgnoreOutputs` (comparing against the string `"0"`, since environment variables are strings)
then? But otherwise it is random? (Just to make sure I fully understand)
Thanks Ketan 🙂
k
this is a docs issue
f
I can document it, where should it go?
k
where would you put it, from a user’s point of view?
i think this is in the mpioperator docs?
but it should be somewhere more discoverable
But other users might come from the tf or mpi page
So I think all of these pages should link to one place where it is documented.
# Control which rank returns its value
In distributed training, the return values from different workers might differ. If you want to control which worker passes its return value to subsequent tasks in the workflow, raise an `IgnoreOutputs` exception in all other ranks.
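A minimal sketch of the pattern in plain Python. The `IgnoreOutputs` class below is a stand-in for flytekit's exception of the same name, and `RANK` is assumed to be the environment variable set by the PyTorch distributed launcher — both are assumptions about the setup, not verbatim from the docs:

```python
import os


class IgnoreOutputs(Exception):
    """Stand-in for flytekit's IgnoreOutputs exception (hypothetical placeholder)."""


def train() -> float:
    # Env vars are strings, so convert before comparing ranks.
    rank = int(os.environ.get("RANK", "0"))

    # Per-worker metric that may differ slightly between ranks.
    metric = 0.9 + 0.01 * rank

    # Only rank 0 returns its output; all other workers signal
    # that their outputs should be discarded.
    if rank != 0:
        raise IgnoreOutputs("only rank 0 returns outputs")
    return metric
```

With this in place it is deterministic, not a race: rank 0's return value is the one passed along, and every other rank's output is ignored.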
If this had been written in the pytorch plugin docs page, I would have immediately known what I need to do. We could include this sentence here, here, and here (even though it gives me some sadness that it would be replicated).
Wdyt? Shall I add it?
k
sure
cc @Niels Bantilan
f