# ray-integration
f
hey a while back there was an RFC on ray integration that included something about support for persisting cluster resources across tasks, is that something still in progress? can someone point me to the docs?
t
@glamorous-carpet-83516
so we spoke with @rapid-autumn-97122 about the persistent cluster feature.
to clarify, you mean that if one workflow runs two ray tasks, they both end up using the same cluster right?
f
yea to save the boot time right?
and possibly skip serialization?
t
to save boot time yes. serialization still happens iiuc
g
yup
f
do you have any more info on the persistent clusters? we could potentially use it to speed up our end to end workflows by quite a bit 🙂
g
we don’t support it right now. we can support it, but we need to update the backend plugin. I’ll work on it next week and get back to you once it’s done.
f
haha that's great! but i'm mostly looking to understand the mechanics and i remember there being an RFC discussing them
we don't have an urgent timeline and are looking to plan some work
f
@glamorous-carpet-83516 let’s wait on this. @famous-businessperson-24711 what we understood from @rapid-autumn-97122 was that there is potential corruption that happens when a cluster is reused in ray
f
does that mean it's off the roadmap?
or just needs to be thought through more?
f
it's not off the roadmap
it can be done on the flyte side, we don't know if ray is ready yet
but it's de-prioritized
f
got it, thanks for the context 🙏
actually wait, "it can be done on flyte side" does this mean the infra for reusing resources exists?
g
There is a ClusterSelector in the Ray job CRD, so basically we should be able to use it to run the Ray job on the existing cluster. Propeller needs to save the RayCluster ID generated by the first Ray task, and the second Ray task should reuse the same Ray cluster by passing the cluster selector. Lastly, Propeller shuts down the Ray cluster at the end node.
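A rough sketch of the flow described above, for anyone trying to picture the mechanics. It only builds RayJob manifests as plain Python dicts and submits nothing; it is not how Flyte works today (the thread says this is unsupported). The helper name `build_rayjob_manifest`, the cluster name, and the entrypoints are hypothetical. The `ray.io/v1` apiVersion and the `ray.io/cluster` selector key are assumed from KubeRay conventions and should be checked against the KubeRay version in use.

```python
# Sketch only: builds RayJob manifests as dicts; nothing is sent to Kubernetes.
# Assumptions (not from the thread): apiVersion "ray.io/v1" and the
# "ray.io/cluster" selector key; all names below are hypothetical.

def build_rayjob_manifest(job_name, entrypoint, existing_cluster=None):
    """Return a RayJob manifest that either creates a cluster or reuses one."""
    spec = {"entrypoint": entrypoint}
    if existing_cluster is None:
        # First Ray task: let the operator create a cluster for this job.
        spec["rayClusterSpec"] = {}  # placeholder; a real spec needs head/worker groups
        # Keep the cluster alive so a later task can reuse it.
        spec["shutdownAfterJobFinishes"] = False
    else:
        # Later Ray task: point the job at the cluster saved from the first task.
        spec["clusterSelector"] = {"ray.io/cluster": existing_cluster}
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": job_name},
        "spec": spec,
    }


# First task creates the cluster; the orchestrator would record its name.
first = build_rayjob_manifest("task-1", "python train.py")
saved_cluster = "wf-example-raycluster"  # hypothetical name saved by the orchestrator

# Second task reuses the saved cluster instead of booting a new one.
second = build_rayjob_manifest("task-2", "python evaluate.py", existing_cluster=saved_cluster)
```

The remaining step from the message above, tearing the cluster down at the end node, would be a separate delete of the RayCluster once the last task finishes; it is left out here.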
r
@freezing-airport-6809 We will need this feature as well. With more complex Flyte workflows, users should be able to share a Ray cluster among different Flyte tasks.
f
I know you will, remember we were adding this but you had issues