fierce-oil-47448
06/12/2024, 8:33 AM
fierce-oil-47448
06/12/2024, 9:12 AM
freezing-airport-6809
thankful-minister-83577
kubectl get -o yaml pytorchjobs
is there enough information there to understand what's happening?
thankful-minister-83577
fierce-oil-47448
06/12/2024, 5:24 PM
fierce-oil-47448
06/12/2024, 5:30 PM
thankful-minister-83577
fierce-oil-47448
06/12/2024, 5:49 PM
fierce-oil-47448
06/12/2024, 5:49 PM
fierce-oil-47448
06/12/2024, 5:49 PM
fierce-oil-47448
06/12/2024, 5:50 PM
fierce-oil-47448
06/12/2024, 5:50 PM
fierce-oil-47448
06/12/2024, 5:51 PM
fierce-oil-47448
06/12/2024, 5:51 PM
fierce-oil-47448
06/12/2024, 5:57 PM
fierce-oil-47448
06/12/2024, 5:57 PM
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
fierce-oil-47448
06/12/2024, 6:16 PM
high-accountant-32689
06/12/2024, 6:19 PM
fierce-oil-47448
06/12/2024, 6:19 PM
fierce-oil-47448
06/12/2024, 6:19 PM
fierce-oil-47448
06/12/2024, 6:19 PM
fierce-oil-47448
06/12/2024, 6:20 PM
high-accountant-32689
06/12/2024, 6:23 PM
RestartPolicy field should be preserved. Let me take another look at the code.
fierce-oil-47448
06/12/2024, 6:29 PM
high-accountant-32689
06/12/2024, 6:35 PM
thankful-minister-83577
fierce-oil-47448
06/12/2024, 6:42 PM
fierce-oil-47448
06/12/2024, 6:43 PM
fierce-oil-47448
06/12/2024, 6:44 PM
fierce-oil-47448
06/12/2024, 6:48 PM
```yaml
initContainers:
- args:
  - |
    <REDACTED>
  command:
  - /bin/bash
  - -c
  env:
  - name: LD_LIBRARY_PATH
    value: /usr/local/nvidia/lib64/
  image: <REDACTED>
  imagePullPolicy: Always
  name: <REDACTED>
  resources: {}
  securityContext:
    privileged: true
  volumeMounts:
  - mountPath: /usr/local/nvidia
    name: nvidia-install-dir-host
```
^ restartPolicy is missing.
fierce-oil-47448
06/12/2024, 6:56 PM
high-accountant-32689
06/13/2024, 3:49 AM
PyTorchJob job to the kubeflow plugin that contains the correct value of RestartPolicy here.
high-accountant-32689
06/13/2024, 4:00 AM
PyTorchJob we reach this line and if we expand the list of warnings (which is a field of result), we see this one:
unknown field "spec.pytorchReplicaSpecs.Master.template.spec.initContainers[0].restartPolicy"
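For context, sketched under assumptions: a warning like this is what the Kubernetes API server reports when a submitted object contains a field that the resource's schema does not declare. For a custom resource such as PyTorchJob, that schema is the CRD installed with the training operator, and undeclared fields are pruned before the object is stored, which would explain why restartPolicy is missing from the stored init container. The fragment below is an illustrative sketch, not the job from this thread, of how the init container would look once the field is kept. It assumes the intent is a Kubernetes native sidecar: on an init container, restartPolicy accepts only Always. The replica name Master comes from the warning path; the other fields mirror the YAML posted earlier, and <REDACTED> values stay as placeholders.

```yaml
# Illustrative sketch only, not the actual job from this thread.
# Path taken from the warning: spec.pytorchReplicaSpecs.Master.template.spec.initContainers[0]
spec:
  pytorchReplicaSpecs:
    Master:
      template:
        spec:
          initContainers:
          - name: <REDACTED>
            image: <REDACTED>
            # Always is the only value Kubernetes accepts for an init container's
            # restartPolicy; it runs the container as a native sidecar
            # (SidecarContainers feature, beta and on by default since Kubernetes 1.29).
            restartPolicy: Always
            securityContext:
              privileged: true
            volumeMounts:
            - mountPath: /usr/local/nvidia
              name: nvidia-install-dir-host
```

Even with a CRD that declares the field, a cluster without the SidecarContainers feature (for example, one older than 1.28) would likely still drop restartPolicy when the operator creates the pod.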
fierce-oil-47448
06/13/2024, 4:11 AM
high-accountant-32689
06/13/2024, 3:32 PM
1.8.0-rc.0 version of the operator and I can see the field set there.
fierce-oil-47448
06/14/2024, 7:40 AM
fierce-oil-47448
06/16/2024, 3:55 AM
1.8.0-rc.0 version and it has been working well so far.
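A sketch of why the operator upgrade likely helped, assuming the older release's CRD schema simply did not declare the field: the PyTorchJob CRD embeds a schema for each replica's pod template, and the API server keeps restartPolicy on an init container only if that schema lists it. The excerpt below is abbreviated and illustrative, not copied from any training-operator release, and is not meant to be applied. Because pytorchReplicaSpecs is a map keyed by replica name (Master in the warning above), the per-replica schema sits under additionalProperties.

```yaml
# Abbreviated, illustrative CRD excerpt (not from a real release; do not apply).
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: pytorchjobs.kubeflow.org
spec:
  group: kubeflow.org
  names:
    kind: PyTorchJob
    plural: pytorchjobs
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              pytorchReplicaSpecs:
                type: object
                # Map keyed by replica name (e.g. Master, Worker).
                additionalProperties:
                  type: object
                  properties:
                    template:
                      type: object
                      properties:
                        spec:
                          type: object
                          properties:
                            initContainers:
                              type: array
                              items:
                                type: object
                                properties:
                                  # ... all other container properties omitted ...
                                  # Without this entry, the API server prunes the field
                                  # and reports it as an unknown field.
                                  restartPolicy:
                                    type: string
```

One way to check an installed CRD is to search the output of kubectl get crd pytorchjobs.kubeflow.org -o yaml for restartPolicy under the initContainers items.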