Hi all, I'm using PyTorchJob job in flyte to train...
# ask-the-community
Sampath Vaddadi
10/21/2022, 10:12 PM
Hi all, I'm using a PyTorchJob in Flyte to train a PyTorch model, following this tutorial:
https://docs.flyte.org/projects/cookbook/en/latest/auto/integrations/kubernetes/kfpytorch/pytorch_mnist.html
I don't need distributed training, so I changed to num_workers=0, but the job gets queued and never starts. If I set num_workers >= 1, the PyTorchJob runs. Has anyone faced this issue? Any help is greatly appreciated.
Ketan (kumare3)
10/21/2022, 10:38 PM
Have you deployed the PyTorch operator?
Ketan (kumare3)
10/21/2022, 10:38 PM
https://docs.flyte.org/en/latest/deployment/plugin_setup/k8s/index.html#deployment-plugin-setup-k8s
Sampath Vaddadi
10/24/2022, 3:11 PM
Hi Ketan, I did not deploy it in the k8s cluster explicitly. Doesn't it come with the Flyte installation?
Samhita Alla
10/25/2022, 6:16 AM
@Sampath Vaddadi, you need to deploy it. What does your Flyte deployment look like? If you deployed using Helm charts, you should be able to find a section such as
https://github.com/flyteorg/flyte/blob/ac24c75261c8d4ed780d0733f357bc0a501ba0eb/charts/flyte-core/values-eks.yaml#L255-L274
in your YAMLs.
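For anyone checking their own values file, here is a rough Python sketch of what to look for. It assumes the linked section is the task-plugin configuration and that the keys follow the flyte-core chart layout (`configmap.enabled_plugins.tasks.task-plugins`); verify both against your own chart version. The helper name `pytorch_plugin_enabled` and the `example_values` dict are made up for illustration, and this only covers the Flyte-side config. The operator itself still has to be installed in the cluster separately.

```python
# Hypothetical helper: inspects an already-parsed Helm values mapping.
# The key layout is an assumption modeled on the flyte-core chart.
def pytorch_plugin_enabled(values: dict) -> bool:
    task_plugins = (
        values.get("configmap", {})
        .get("enabled_plugins", {})
        .get("tasks", {})
        .get("task-plugins", {})
    )
    enabled = task_plugins.get("enabled-plugins", [])
    defaults = task_plugins.get("default-for-task-types", {})
    # The plugin should be listed AND mapped as the handler for "pytorch" tasks.
    return "pytorch" in enabled and defaults.get("pytorch") == "pytorch"


# Illustrative values only, not copied from any real deployment.
example_values = {
    "configmap": {
        "enabled_plugins": {
            "tasks": {
                "task-plugins": {
                    "enabled-plugins": ["container", "sidecar", "k8s-array", "pytorch"],
                    "default-for-task-types": {
                        "container": "container",
                        "sidecar": "sidecar",
                        "container_array": "k8s-array",
                        "pytorch": "pytorch",
                    },
                }
            }
        }
    }
}

print(pytorch_plugin_enabled(example_values))
```

If this returns False for your values, the `pytorch` entries are probably missing from one of the two lists.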