Flyte enables production-grade orchestration for machine learning workflows and data processing, and is designed to take local workflows to production faster.

Flyte

Hi, I had a quick question about how Flyte integrates with Ray. 

Under the <https://flyte.org/blog/ray-and-flyte|“Future Work” section here>, it says that Flyte manages Ray clusters on a per task basis. Does this mean that Flyte is spinning up an ephemeral Ray cluster per-task instead of reusing an existing Ray cluster? Also, this post is from 2022; is this still true?

The context is that I’m evaluating Flyte with Ray versus using Ray alone, where tasks with different dependencies are each given a @ray.remote(runtime_env=foo), with foo specifying a particular container image.

Thanks!

> Does this mean that Flyte is spinning up an ephemeral Ray cluster per-task instead of reusing an existing Ray cluster?
Yes for the current plugin.

> just using Ray where tasks with different dependencies in Ray are just provided a @ray.remote(runtime_env=foo), where foo provides a particular container image.
This feature is not well supported in Ray. See <https://ray-distributed.slack.com/archives/C02GFQ82JPM/p1693933328280079?thread_ts=1693857743.544139&cid=C02GFQ82JPM>

Using the flyte-ray plugin essentially maps a Flyte task to a Ray job. See <https://docs.ray.io/en/master/cluster/kubernetes/getting-started/rayjob-quick-start.html>

So, if a Ray job suits your use case, then the flyte-ray plugin will also fit your needs.

Got it, okay. I also saw docs here which say that Ray tasks can spin up new clusters or reuse existing ones: <https://docs.flyte.org/projects/cookbook/en/stable/auto_examples/ray_plugin/ray_example.html>

I am now confused about what reusing an existing cluster would mean here 

Oh, I see. There are two ways to use the plugin. The reuse option means you create a Ray cluster yourself and then provide the head address to Flyte; Flyte will connect to that Ray cluster automatically and run the task there.

But I seldom see people use the reuse option.

Interesting, is one recommended over the other? I’m sort of curious as to why one approach is seldom used.

I'm not sure about the reasons. <@USU6W5ATA> is the expert on this.

Both methods are used by many people; it depends on your use case. If you already have a Ray cluster running internally, you may want to submit the Ray job to that cluster. If not, you can create a Ray cluster on k8s per task.
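
For anyone reading this thread later, here is a rough sketch of how the two modes might be configured. It assumes the `flytekitplugins-ray` package and its `RayJobConfig`, `HeadNodeConfig` and `WorkerNodeConfig` dataclasses. The helper names, the group name and the example address are made up for illustration, and whether worker settings are ignored in reuse mode may depend on your plugin version, so please check the plugin docs before relying on this.

```python
from typing import Any, Dict, Optional


def ray_task_settings(existing_address: Optional[str] = None, replicas: int = 2) -> Dict[str, Any]:
    """Describe the Ray setup for one Flyte task as plain data (hypothetical helper)."""
    if existing_address is not None:
        # Reuse mode: you run the Ray cluster yourself and hand Flyte its address.
        return {"address": existing_address, "worker_groups": []}
    # Per-task mode: no address, so the plugin requests a fresh cluster for the task.
    return {
        "address": None,
        "worker_groups": [{"group_name": "ray-group", "replicas": replicas}],
    }


def to_ray_job_config(settings: Dict[str, Any]):
    """Turn the settings into a RayJobConfig (needs flytekitplugins-ray installed)."""
    # Imported lazily so the helper above can be used without the plugin.
    from flytekitplugins.ray import HeadNodeConfig, RayJobConfig, WorkerNodeConfig

    return RayJobConfig(
        address=settings["address"],
        head_node_config=HeadNodeConfig(),
        worker_node_config=[WorkerNodeConfig(**group) for group in settings["worker_groups"]],
    )
```

You would then pass the result to a task, e.g. `@task(task_config=to_ray_job_config(ray_task_settings()))` for a per-task cluster, or `ray_task_settings("ray://ray-head.example.internal:10001")` (a placeholder address) to point at a cluster you already run.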