Fabio Grätz 01/09/2023, 4:00 PM
The args of both pods specify the same values for:
In case both return the same value (as ist assumably often the case), it shouldn’t matter if both write. But in case I want to return a metric which might be slightly different for each worker, is it random which one I get?
- --output-prefix
- gs://.../metadata/propeller/sandbox-development-f6695ca08aa47490c859/n0/data/0
- --raw-output-data-prefix
- gs://.../xq/f6695ca08aa47490c859-n0-0
- --output-prefix
- gs://.../metadata/propeller/sandbox-development-f6695ca08aa47490c859/n0/data/0
- --raw-output-data-prefix
- gs://.../xq/f6695ca08aa47490c859-n0-0
Fabio Grätz 01/09/2023, 4:04 PM
So I could do

if os.environ.get("RANK") != "0": raise IgnoreOutputs()

then? But otherwise it is random? (Just to make sure I fully understand)
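A minimal runnable sketch of this rank-gating pattern, using plain Python with a stand-in `IgnoreOutputs` exception in place of flytekit's (the `RANK` environment variable is the one torch's distributed launchers set, and `report_metric` is a hypothetical task body):

```python
import os


class IgnoreOutputs(Exception):
    """Stand-in for flytekit's IgnoreOutputs: signals that this
    worker's return value should be discarded."""


def report_metric(metric: float) -> float:
    # RANK is set as a *string* by distributed launchers, so compare
    # against "0" (or cast to int) rather than the integer 0.
    if os.environ.get("RANK", "0") != "0":
        raise IgnoreOutputs("non-zero rank: dropping outputs")
    return metric
```

With this, only the rank-0 worker's metric propagates; every other rank raises, so which worker "wins" is no longer arbitrary.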
Fabio Grätz 01/09/2023, 5:04 PM
Fabio Grätz 01/09/2023, 5:05 PM
# Control which rank returns its value
In distributed training, the return values from different workers might differ. If you want to control which worker returns its value to subsequent tasks in the workflow, you can raise an IgnoreOutputs exception on all other ranks.

If this had been written in the pytorch plugin docs page, I would have immediately known what I need to do. We could include this sentence here, here, and here (even though it gives me some sadness that it would be replicated).