#ray-integration

Dylan Wilder

11/18/2022, 9:36 PM
hey a while back there was an RFC on ray integration that included something about support for persisting cluster resources across tasks, is that something still in progress? can someone point me to the docs?

Yee

11/18/2022, 9:37 PM
@Kevin Su
so we spoke with @Keshi Dai about the persistent cluster feature.
to clarify, you mean that if one workflow runs two ray tasks, they both end up using the same cluster right?

Dylan Wilder

11/18/2022, 9:39 PM
yea to save the boot time right?
and possibly skip serialization?

Yee

11/18/2022, 9:40 PM
to save boot time yes. serialization still happens iiuc

Kevin Su

11/18/2022, 9:56 PM
yup

Dylan Wilder

11/18/2022, 10:09 PM
do you have any more info on the persistent clusters? we could potentially use it to speed up our end to end workflows by quite a bit 🙂

Kevin Su

11/18/2022, 10:14 PM
we don’t support it right now. we can support that, need to update backend plugin. I’ll work on it next week, and get back to you once it’s done.

Dylan Wilder

11/18/2022, 10:15 PM
haha that's great! but i'm mostly looking to understand the mechanics and i remember there being an RFC discussing them
we don't have an urgent timeline and are looking to plan some work

Ketan (kumare3)

11/18/2022, 11:25 PM
@Kevin Su let’s wait on this. @Dylan Wilder what we understood from @Keshi Dai was that there is potential corruption that happens when a cluster is reused in ray

Dylan Wilder

11/18/2022, 11:30 PM
does that mean it's off the roadmap?
or just needs to be thought through more?

Ketan (kumare3)

11/18/2022, 11:35 PM
it's not off the roadmap
it can be done on the flyte side, we don't know if ray is ready yet
but de-prioritized

Dylan Wilder

11/18/2022, 11:36 PM
got it, thanks for the context 🙏
actually wait, "it can be done on flyte side" does this mean the infra for reusing resources exists?

Kevin Su

11/19/2022, 12:01 AM
There is a ClusterSelector in the RayJob CRD, so basically we should be able to use it to run the Ray job on an existing cluster. Propeller needs to save the RayCluster ID generated by the first Ray task, and the second Ray task should reuse the same Ray cluster by passing the cluster selector. Lastly, propeller shuts down the Ray cluster at the end node.
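
For anyone trying to picture the flow Kevin describes, here is a rough sketch in Python. It is not Flyte propeller code and not something the Ray plugin does today. It only builds RayJob-shaped dicts to show where a cluster selector would go. The function names, the `ray.io/v1` API version, the `ray.io/cluster` label key, and the empty `rayClusterSpec` placeholder are all assumptions made for illustration. The cluster name is passed in by hand, whereas propeller would have to read it back after the first task creates the cluster.

```python
from typing import Any, Dict, List, Optional


def ray_job_manifest(name: str, entrypoint: str,
                     cluster_name: Optional[str] = None) -> Dict[str, Any]:
    """Build a RayJob-shaped dict (illustrative, not validated against the CRD)."""
    spec: Dict[str, Any] = {"entrypoint": entrypoint}
    if cluster_name is None:
        # First Ray task: the job brings up its own cluster.
        # The real cluster spec is omitted; this empty dict is a placeholder.
        spec["rayClusterSpec"] = {}
        # Keep the cluster alive so later tasks can reuse it.
        spec["shutdownAfterJobFinishes"] = False
    else:
        # Later Ray tasks: point at the cluster the first task created.
        # The label key is an assumption, check it against your KubeRay version.
        spec["clusterSelector"] = {"ray.io/cluster": cluster_name}
    return {
        "apiVersion": "ray.io/v1",  # assumed version
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": spec,
    }


def plan_shared_cluster_jobs(task_names: List[str],
                             cluster_name: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: first task creates the cluster, the rest reuse it."""
    jobs = []
    for index, task in enumerate(task_names):
        reuse = cluster_name if index > 0 else None
        jobs.append(ray_job_manifest(task, f"python {task}.py", reuse))
    return jobs


jobs = plan_shared_cluster_jobs(["preprocess", "train"], "shared-ray-cluster")
for job in jobs:
    print(job["metadata"]["name"], sorted(job["spec"].keys()))
```

The teardown at the end node is left out. In this sketch, whatever orchestrates the workflow would delete the RayCluster once the last Ray task finishes.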

Keshi Dai

11/19/2022, 2:20 AM
@Ketan (kumare3) We will need this feature as well. With more complex Flyte workflows, users should be able to share Ray cluster among different Flyte tasks.

Ketan (kumare3)

11/19/2022, 3:58 AM
I know you will, remember we were adding this but you had issues