Hi all, I'm using PyTorchJob job in flyte to train...
# ask-the-community
s
Hi all, I'm using a PyTorchJob in Flyte to train a PyTorch model, following this tutorial: https://docs.flyte.org/projects/cookbook/en/latest/auto/integrations/kubernetes/kfpytorch/pytorch_mnist.html I don't need distributed training, so I set num_workers=0, but then the job gets queued and never starts. If I set num_workers >= 1, the PyTorchJob runs. Has anyone faced this issue? Any help is greatly appreciated.
k
Have you deployed the PyTorch operator?
s
Hi Ketan, I did not deploy it explicitly in the k8s cluster. Doesn't it come with the Flyte installation?
s
@Sampath Vaddadi, you need to deploy it. What does your Flyte deployment look like? If you deployed using the Helm charts, you should be able to find a section such as https://github.com/flyteorg/flyte/blob/ac24c75261c8d4ed780d0733f357bc0a501ba0eb/charts/flyte-core/values-eks.yaml#L255-L274 in your YAMLs.
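As a rough illustration of what to look for in that section, here is a minimal Python sketch that checks whether a Helm values structure lists the PyTorch plugin. The helper name `pytorch_plugin_enabled`, the sample dict, and the key layout (`enabled_plugins` → `tasks` → `task-plugins`) are assumptions made for illustration; compare them against the linked values file and your own chart before relying on them.

```python
from typing import Any, Dict


def pytorch_plugin_enabled(values: Dict[str, Any]) -> bool:
    """Hypothetical check: is 'pytorch' both enabled and mapped as a task type?

    The key names below are an assumed layout of the plugin section in the
    Helm values; they are not taken from this thread.
    """
    node: Dict[str, Any] = values
    # Walk down the nested mapping, tolerating missing or empty keys.
    for key in ("enabled_plugins", "tasks", "task-plugins"):
        node = node.get(key) or {}
    enabled = node.get("enabled-plugins") or []
    defaults = node.get("default-for-task-types") or {}
    return "pytorch" in enabled and defaults.get("pytorch") == "pytorch"


# Sample values as a plain dict, i.e. what a YAML loader would return for
# the plugin section (assumed contents, for illustration only).
SAMPLE_VALUES = {
    "enabled_plugins": {
        "tasks": {
            "task-plugins": {
                "enabled-plugins": ["container", "sidecar", "k8s-array", "pytorch"],
                "default-for-task-types": {
                    "container": "container",
                    "sidecar": "sidecar",
                    "container_array": "k8s-array",
                    "pytorch": "pytorch",
                },
            }
        }
    }
}

if __name__ == "__main__":
    print(pytorch_plugin_enabled(SAMPLE_VALUES))
```

This only inspects the Flyte-side configuration. The PyTorch operator itself still has to be deployed in the cluster separately, as noted above.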