#ask-the-community

Armaan Goel

07/06/2023, 6:23 PM
Hi Folks, was just wondering if Flyte supports gang scheduling in order to run multi-node training as part of a workflow?

Ketan (kumare3)

07/06/2023, 6:29 PM
You can use any scheduler in concert with Flyte, for example YuniKorn, Volcano, or Kueue. Some folks use some of these.

Armaan Goel

07/06/2023, 6:32 PM
Awesome! Do you happen to have any documentation or examples available for this?

Ketan (kumare3)

07/06/2023, 7:20 PM
No docs at the moment, but use pod templates.
I think there was a recent thread on this, and @Fabio Grätz and team use one.
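
Editor's note: a minimal sketch of the pod-template idea mentioned above, not something posted in the thread. It builds a Kubernetes `PodTemplate` manifest whose pod spec sets `schedulerName`, which is the standard Kubernetes field for handing pods to a non-default scheduler. The template name `gang-scheduled`, the namespace `flyte`, the scheduler name `volcano`, and the placeholder image are all assumptions for illustration. How Flyte is pointed at the template depends on your deployment, and gang semantics usually need extra scheduler-specific settings (such as a pod group) that this sketch does not cover.

```python
import json


def pod_template_with_scheduler(name, scheduler_name, namespace="flyte"):
    """Build a Kubernetes PodTemplate manifest that targets a custom scheduler.

    All argument values used below are hypothetical; adjust them to
    match the scheduler actually installed in your cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "PodTemplate",
        "metadata": {"name": name, "namespace": namespace},
        "template": {
            "spec": {
                # Standard Kubernetes field: which scheduler places the pod.
                "schedulerName": scheduler_name,
                # Placeholder container; the name and image are assumptions.
                "containers": [
                    {"name": "default", "image": "example.com/placeholder:latest"}
                ],
            }
        },
    }


manifest = pod_template_with_scheduler("gang-scheduled", "volcano")
# JSON is valid YAML, so this output can be saved to a file and applied
# with kubectl.
print(json.dumps(manifest, indent=2))
```
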

Fabio Grätz

07/07/2023, 8:00 AM
We use “scheduling plugins” and are happy with it. I noted the steps to get this working here. Let me know if you need help with this.

Ketan (kumare3)

07/08/2023, 6:24 AM
@Fabio Grätz want to make a doc PR for this? 😍

Fabio Grätz

07/08/2023, 10:10 AM
The version constraint is not there anymore, training operator updated their reqs.
Btw, could I please have permission to create PRs directly in the flyte repo? I used a fork for now, but in the other repos I can.
thx

Ketan (kumare3)

07/08/2023, 7:04 PM
Absolutely, you should have permission.
@Eduardo Apolinario (eapolinario) / @Yee can either of you add him?

Eduardo Apolinario (eapolinario)

07/08/2023, 9:25 PM
Done. You should have access now, @Fabio Grätz.

Fabio Grätz

07/10/2023, 7:57 AM
thx!