Fabio Grätz 01/09/2023, 4:00 PM
The args of both pods specify the same values for:
In case both return the same value (as ist assumably often the case), it shouldn’t matter if both write. But in case I want to return a metric which might be slightly different for each worker, is it random which one I get?
- --output-prefix
- gs://.../metadata/propeller/sandbox-development-f6695ca08aa47490c859/n0/data/0
- --raw-output-data-prefix
- gs://.../xq/f6695ca08aa47490c859-n0-0
- --output-prefix
- gs://.../metadata/propeller/sandbox-development-f6695ca08aa47490c859/n0/data/0
- --raw-output-data-prefix
- gs://.../xq/f6695ca08aa47490c859-n0-0
Fabio Grätz 01/09/2023, 4:04 PM
So I could do

if os.environ.get("RANK") != "0": raise IgnoreOutputs()

then? But otherwise it is random? (Just to make sure I fully understand)
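A minimal runnable sketch of this rank-gating pattern, using plain Python with a stand-in `IgnoreOutputs` exception in place of flytekit's (the `RANK` environment variable is the one torch's distributed launchers set, and `report_metric` is a hypothetical task body):

```python
import os


class IgnoreOutputs(Exception):
    """Stand-in for flytekit's IgnoreOutputs: signals that this
    worker's return value should be discarded."""


def report_metric(metric: float) -> float:
    # RANK is set as a *string* by distributed launchers, so compare
    # against "0" (or cast to int) rather than the integer 0.
    if os.environ.get("RANK", "0") != "0":
        raise IgnoreOutputs("non-zero rank: dropping outputs")
    return metric
```

With this, only the rank-0 worker's metric propagates; every other rank raises, so which worker "wins" is no longer arbitrary.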
Fabio Grätz 01/09/2023, 5:04 PM
Fabio Grätz 01/09/2023, 5:05 PM
# Control which rank returns its value
In distributed training, the return values from different workers might differ. If you want to control which worker returns its value to subsequent tasks in the workflow, you can raise an IgnoreOutputs exception on all other ranks.

If this had been written in the pytorch plugin docs page, I would have immediately known what I need to do. We could include this sentence here, here, and here (even though it gives me some sadness that it would be replicated).