# ask-the-community
p
Hi, I am having some issues with the PyTorch plugin. I was able to run the example workflow in my test cluster, which runs Flyte 1.5 with version 1.7 of the Kubeflow operator installed. In my main cluster, which is on Flyte 1.10.7, the PyTorchJob resource that Flyte creates has the worker replicas set to 0, even though my task config specifies 2 workers. This results in an infinite create-delete loop once the workflow is started. Does anyone know what might be causing this? Thanks
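For anyone checking for the same symptom, here is a minimal sketch of how the worker count could be read from a PyTorchJob manifest, for example the JSON returned by `kubectl get pytorchjob <name> -o json`. The `worker_replicas` helper and the sample manifest are hypothetical and only illustrate the `spec.pytorchReplicaSpecs.Worker.replicas` field; they are not taken from the cluster or the issue.

```python
import json
from typing import Optional


def worker_replicas(manifest: dict) -> Optional[int]:
    """Return spec.pytorchReplicaSpecs.Worker.replicas, or None if it is not set."""
    specs = manifest.get("spec", {}).get("pytorchReplicaSpecs", {})
    return specs.get("Worker", {}).get("replicas")


# Hypothetical, trimmed manifest that mirrors the reported symptom:
# the task asks for 2 workers, but the created resource says 0.
sample = json.loads("""
{
  "kind": "PyTorchJob",
  "spec": {
    "pytorchReplicaSpecs": {
      "Master": {"replicas": 1},
      "Worker": {"replicas": 0}
    }
  }
}
""")

EXPECTED_WORKERS = 2  # value from the task config described above
actual = worker_replicas(sample)
if actual != EXPECTED_WORKERS:
    print(f"mismatch: task config asks for {EXPECTED_WORKERS} workers, manifest has {actual}")
```

Running the same check against the manifest from the working test cluster would show whether the difference is in what Flyte generates or in how the operator handles it.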
k
Can you please provide a repro case?
p
Hi, I opened this issue with the details: https://github.com/flyteorg/flyte/issues/5417
l
Can you try the latest Flyte version?
It seems that the example works
p
Thanks for replying. Yes, I will try upgrading Flyte and see if that solves the issue.