cool-lifeguard-49380
04/17/2023, 8:00 AM
nnodes=1 in a single pod, and with nnodes>1 with the pytorch operator.
I think we could try with alpaca now 🦙
The problems with rendezvous flakiness I mentioned in the call on Thursday were actually related to network config on my notebook (no ipv6 enabled).
[W socket.cpp:601] [c10d] The IPv6 network addresses of (1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa, 49651) cannot be retrieved (gai error: 8 - nodename nor servname provided, or not known).
I have one question about the execute method I copied from PythonFunctionTask: We don’t need the else case here for dynamic, even though the original docstring hints one should implement it as well, right?
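On the IPv6 point earlier in this message: a minimal sketch, assuming the flakiness really does come from a machine without a usable IPv6 loopback, of how one could check for that from Python before launching a run. The helper name ipv6_loopback_available is made up for illustration; it only uses the standard socket module and is not part of torch or flytekit.

```python
import socket


def ipv6_loopback_available() -> bool:
    """Return True if an IPv6 socket can be bound to the loopback address ::1."""
    # has_ipv6 only says the Python build supports IPv6, not that the host has it enabled.
    if not socket.has_ipv6:
        return False
    try:
        # Binding to port 0 lets the OS pick any free port; the socket is closed right after.
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind(("::1", 0))
        return True
    except OSError:
        # Raised when IPv6 is disabled or ::1 is not configured on this host.
        return False


if __name__ == "__main__":
    print("IPv6 loopback available:", ipv6_loopback_available())
```

If this prints False on a notebook, the c10d warning above is at least consistent with the local network config rather than with the task code.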