# flyte-support
worried-winter-16424
10/21/2022, 10:12 PM
Hi all, I'm using a PyTorchJob in Flyte to train a PyTorch model, following this tutorial:
https://docs.flyte.org/projects/cookbook/en/latest/auto/integrations/kubernetes/kfpytorch/pytorch_mnist.html
I don't need distributed training, so I set num_workers=0, but the job gets queued and never starts. If I set num_workers >= 1, the PyTorchJob runs. Has anyone faced this issue? Any help is greatly appreciated.
freezing-airport-6809
10/21/2022, 10:38 PM
Have you deployed the PyTorch operator?
freezing-airport-6809
10/21/2022, 10:38 PM
https://docs.flyte.org/en/latest/deployment/plugin_setup/k8s/index.html#deployment-plugin-setup-k8s
worried-winter-16424
10/24/2022, 3:11 PM
Hi Ketan, I did not deploy it in the k8s cluster explicitly. Doesn't it come with the Flyte installation?
tall-lock-23197
10/25/2022, 6:16 AM
@worried-winter-16424, you need to deploy it. What does your Flyte deployment look like? If you deployed using Helm charts, you should be able to find a section such as
https://github.com/flyteorg/flyte/blob/ac24c75261c8d4ed780d0733f357bc0a501ba0eb/charts/flyte-core/values-eks.yaml#L255-L274
in your values YAML files.
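For reference, a rough sketch of what such a section usually looks like when the PyTorch plugin is enabled in the flyte-core Helm values. This is an assumption based on the plugin setup docs linked earlier in the thread, not a copy of the linked lines, so the exact keys and plugin list in your chart version may differ:

```yaml
# Sketch only: enable the pytorch backend plugin in the flyte-core values.
# The plugin list below is illustrative; keep whatever your deployment already enables.
configmap:
  enabled_plugins:
    tasks:
      task-plugins:
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - pytorch            # lets Flyte hand pytorch tasks to the operator
        default-for-task-types:
          container: container
          sidecar: sidecar
          container_array: k8s-array
          pytorch: pytorch     # route the pytorch task type to the plugin
```

Enabling the plugin here only tells Flyte to create PyTorchJob resources. The PyTorch operator itself still has to be running in the cluster to act on them.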