Flyte enables production-grade orchestration for machine learning workflows and data processing, and is designed to accelerate the move from local workflows to production.


I haven’t used it much, but ~1.5 years ago I got an example to train with it on k8s (which the docs don’t explicitly mention as supported). Under the hood it ultimately just uses `torch.distributed.init_process_group()`, see <https://github.com/microsoft/DeepSpeed/blob/7832efbb09bcf56799de0ad501179d7099773cea/deepspeed/comm/torch.py#L57|here>. Back then I simply created a Kubeflow PyTorchJob to run it, which worked. The image needed `nvidia-cuda-toolkit`. To summarize: as of ~1.5 years ago, I think it would already have been supported.
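
For reference, here is a rough sketch of what that boils down to inside each pod. It assumes the PyTorchJob injects the usual `MASTER_ADDR`, `MASTER_PORT`, `RANK` and `WORLD_SIZE` env vars. The helper names `rendezvous_from_env` and `init_distributed` are made up for illustration and are not part of DeepSpeed or Kubeflow, and the fallback defaults are my own choice for a single-process local run:

```python
import os


def rendezvous_from_env(env=None):
    """Collect the rendezvous settings that the env:// init method relies on.

    Hypothetical helper: falls back to single-process defaults when a
    variable is missing (e.g. when running locally outside a PyTorchJob).
    """
    env = os.environ if env is None else env
    return {
        "master_addr": env.get("MASTER_ADDR", "127.0.0.1"),
        "master_port": int(env.get("MASTER_PORT", "29500")),
        "rank": int(env.get("RANK", "0")),
        "world_size": int(env.get("WORLD_SIZE", "1")),
    }


def init_distributed(backend="nccl"):
    """Initialise torch.distributed from the environment (requires PyTorch)."""
    # Imported lazily so rendezvous_from_env() stays usable without torch.
    import torch.distributed as dist

    cfg = rendezvous_from_env()
    # env:// reads the master address/port from os.environ, so make sure
    # the fallback values are visible there too.
    os.environ.setdefault("MASTER_ADDR", cfg["master_addr"])
    os.environ.setdefault("MASTER_PORT", str(cfg["master_port"]))
    dist.init_process_group(
        backend=backend,
        init_method="env://",
        rank=cfg["rank"],
        world_size=cfg["world_size"],
    )
    return cfg
```

Calling `init_distributed()` at the top of the training script should be roughly equivalent to what happens under the hood. On a machine without GPUs you would pass `backend="gloo"` instead of `"nccl"`.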