cool-lifeguard-49380 (01/09/2023, 4:00 PM):
The args of both pods specify the same values for --output-prefix and --raw-output-data-prefix:

Pod 1:
- --output-prefix
- gs://.../metadata/propeller/sandbox-development-f6695ca08aa47490c859/n0/data/0
- --raw-output-data-prefix
- gs://.../xq/f6695ca08aa47490c859-n0-0

Pod 2:
- --output-prefix
- gs://.../metadata/propeller/sandbox-development-f6695ca08aa47490c859/n0/data/0
- --raw-output-data-prefix
- gs://.../xq/f6695ca08aa47490c859-n0-0

In case both return the same value (as is presumably often the case), it shouldn't matter if both write. But in case I want to return a metric which might be slightly different for each worker, is it random which one I get?

freezing-airport-6809: [reply not captured in the export]
cool-lifeguard-49380 (01/09/2023, 4:04 PM):
So `if os.environ.get("RANK", "0") != "0": raise IgnoreOutputs` then? But otherwise it is random? (Just to make sure I fully understand)
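A minimal sketch of that guard inside a Flyte task, assuming the flytekitplugins-kfpytorch plugin with its num_workers-style config (current at the time of this thread); the task body and the metric are placeholders, not from the thread. One gotcha: RANK is an environment *string*, so it has to be compared as a string or cast to int.

```python
import os

from flytekit import task
from flytekit.core.base_task import IgnoreOutputs
from flytekitplugins.kfpytorch import PyTorch


@task(task_config=PyTorch(num_workers=2))
def train() -> float:
    # ... distributed training runs on every worker ...
    loss = 0.123  # placeholder for a per-worker metric

    # RANK is injected as a string; comparing it to the integer 0 would be
    # True on every rank, so compare strings (or cast with int()).
    if os.environ.get("RANK", "0") != "0":
        raise IgnoreOutputs("Non-zero ranks do not return outputs.")
    return loss
```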
[cool-lifeguard-49380 (4:11 PM, 5:04 PM, 5:05 PM) and freezing-airport-6809 exchanged several replies here that were not captured in the export]
cool-lifeguard-49380 (01/09/2023, 5:17 PM):
Proposed docs wording:

> # Control which rank returns its value
> In distributed training, the return values from the different workers might differ. If you want to control which worker's return value is passed on to subsequent tasks in the workflow, you can raise an IgnoreOutputs exception in all other ranks.

If this had been written on the pytorch plugin docs page, where I first looked, I would have immediately known what I needed to do. We could include this sentence here, here, and here (even though it gives me some sadness that it would be replicated).
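To make the quoted wording concrete, here is a hedged end-to-end sketch of how the rank-0 value flows to a subsequent task; the task/workflow names, num_workers, and the synthetic per-worker metric are illustrative assumptions, not from the thread.

```python
import os

from flytekit import task, workflow
from flytekit.core.base_task import IgnoreOutputs
from flytekitplugins.kfpytorch import PyTorch


@task(task_config=PyTorch(num_workers=2))
def train() -> float:
    rank = int(os.environ.get("RANK", "0"))
    # Synthetic metric that intentionally differs per worker.
    accuracy = 0.90 + 0.01 * rank

    # Every rank except 0 discards its outputs, so downstream tasks
    # deterministically receive rank 0's metric.
    if rank != 0:
        raise IgnoreOutputs("Only rank 0 returns its outputs.")
    return accuracy


@task
def report(accuracy: float):
    print(f"accuracy from rank 0: {accuracy}")


@workflow
def training_wf():
    report(accuracy=train())
```

Without the IgnoreOutputs guard, each worker pod writes to the same --output-prefix shown above, so which worker's outputs "win" is effectively a race rather than a guarantee.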
[closing replies (5:18 PM–5:34 PM) between cool-lifeguard-49380 and freezing-airport-6809 were not captured in the export]