# ask-the-community

Sampath Vaddadi

10/21/2022, 10:12 PM
Hi all, I'm following this tutorial https://docs.flyte.org/projects/cookbook/en/latest/auto/integrations/kubernetes/kfpytorch/pytorch_mnist.html to train a PyTorch model with a PyTorchJob in Flyte. I don't need distributed training, so I changed num_workers to 0, but the job gets queued and never starts. With num_workers >= 1, the PyTorchJob runs. Has anyone faced this issue? Any help is greatly appreciated.

Ketan (kumare3)

10/21/2022, 10:38 PM
Have you deployed the PyTorch operator?

Sampath Vaddadi

10/24/2022, 3:11 PM
Hi Ketan, I did not deploy it in the k8s cluster explicitly. Doesn't it come with the Flyte installation?

Samhita Alla

10/25/2022, 6:16 AM
@Sampath Vaddadi, you need to deploy it. How does your Flyte deployment look? If you deployed using the Helm charts, you should find a section like https://github.com/flyteorg/flyte/blob/ac24c75261c8d4ed780d0733f357bc0a501ba0eb/charts/flyte-core/values-eks.yaml#L255-L274 in your values YAML.
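A minimal sketch for checking the Flyte side of this in your own Helm values, assuming the flyte-core layout `configmap -> enabled_plugins -> tasks -> task-plugins` with `enabled-plugins` and `default-for-task-types` keys. That key path, and the idea that `pytorch` must appear in both places, are assumptions to verify against your chart version. The dict could come from PyYAML's `yaml.safe_load` on your values file; `pytorch_plugin_enabled` is a hypothetical helper, and it does not check whether the PyTorch operator itself is actually running in the cluster.

```python
from typing import Any, Mapping

# Assumed flyte-core key path down to the task-plugins section (verify for your chart).
TASK_PLUGINS_PATH = ("configmap", "enabled_plugins", "tasks", "task-plugins")


def pytorch_plugin_enabled(values: Mapping[str, Any]) -> bool:
    """Hypothetical helper: True if 'pytorch' is both enabled and the default
    handler for the 'pytorch' task type in the (assumed) values layout."""
    node: Any = values
    for key in TASK_PLUGINS_PATH:
        if not isinstance(node, Mapping):
            return False  # Section missing or not a mapping.
        node = node.get(key)
    if not isinstance(node, Mapping):
        return False

    enabled = node.get("enabled-plugins") or []
    defaults = node.get("default-for-task-types") or {}
    if not isinstance(defaults, Mapping):
        return False
    return "pytorch" in enabled and defaults.get("pytorch") == "pytorch"


# Example values shaped like the assumed flyte-core section.
example_values = {
    "configmap": {
        "enabled_plugins": {
            "tasks": {
                "task-plugins": {
                    "enabled-plugins": ["container", "sidecar", "k8s-array", "pytorch"],
                    "default-for-task-types": {
                        "container": "container",
                        "sidecar": "sidecar",
                        "container_array": "k8s-array",
                        "pytorch": "pytorch",
                    },
                }
            }
        }
    }
}

print(pytorch_plugin_enabled(example_values))  # True
print(pytorch_plugin_enabled({}))  # False
```

Walking the path one key at a time keeps the check from raising when an intermediate section is absent or empty in the YAML, which is common when a values file only overrides part of the chart.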