Hi, I am having some issues with the PyTorch plugin. I have been able to start the example workflow in my test cluster, where I am running Flyte 1.5 with the Kubeflow operator version 1.7 installed. But in my main cluster I am on Flyte 1.10.7, and I keep hitting an issue where the PyTorchJob resource that Flyte creates has the worker replicas set to 0, even though my task config specifies 2 workers. This results in an infinite create-delete loop once the workflow is started. Does anyone know what this might be about? Thanks
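For anyone hitting the same symptom, here is a minimal sketch of how the replica count could be checked on the generated job. It assumes the manifest was dumped with something like `kubectl get pytorchjob <name> -n <namespace> -o json` and that it follows the Kubeflow PyTorchJob layout (`spec.pytorchReplicaSpecs.Worker.replicas`). The helper name `worker_replicas` and the sample manifest are made up for illustration; they are not taken from the issue.

```python
import json
from typing import Optional


def worker_replicas(manifest: dict) -> Optional[int]:
    """Return the Worker replica count from a PyTorchJob manifest.

    Returns None when the manifest has no Worker spec or no explicit
    replicas field, so "missing" is not confused with an explicit 0.
    """
    specs = manifest.get("spec", {}).get("pytorchReplicaSpecs", {})
    worker = specs.get("Worker")
    if not worker or "replicas" not in worker:
        return None
    return int(worker["replicas"])


# Hypothetical, heavily trimmed manifest that mimics the reported symptom.
sample = json.loads("""
{
  "apiVersion": "kubeflow.org/v1",
  "kind": "PyTorchJob",
  "spec": {
    "pytorchReplicaSpecs": {
      "Master": {"replicas": 1},
      "Worker": {"replicas": 0}
    }
  }
}
""")

expected_workers = 2  # what the task config asks for in this thread
actual_workers = worker_replicas(sample)
if actual_workers != expected_workers:
    print(f"Worker replicas mismatch: expected {expected_workers}, got {actual_workers}")
```

Because the job is deleted and recreated in a loop, the manifest may need to be captured quickly after the workflow starts.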

Hi, I opened this issue with the details:
<https://github.com/flyteorg/flyte/issues/5417>

I've provided an example here:
<https://github.com/flyteorg/flyte/issues/5417#issuecomment-2130586170>

Could you try the latest Flyte version?

Thanks for replying. Yes, I will try to upgrade Flyte and see if that solves the issue.