Question about the Ray plugin (Flyte #ray-integration)

dry-egg-91175

06/09/2023, 5:59 PM

Question about the Ray plugin. The docs mention that the Ray plugin uses Ray's Job Submission to run the Ray task. However, looking at the code, I don't see JobSubmissionClient being used anywhere. Am I missing something?

freezing-airport-6809

06/09/2023, 7:28 PM

Kuberay will do that

glamorous-carpet-83516

06/09/2023, 7:56 PM

Flyte creates RayJob custom resource

rapid-autumn-97122

06/12/2023, 9:48 PM

@glamorous-carpet-83516 @freezing-airport-6809 Does KubeRay’s RayJob rely on Ray’s job submission API? IIUC, it seems not, because Flyte’s Ray plugin still uses the Ray client (ray.init) to interact with the Ray cluster created via the plugin.

rapid-autumn-97122

06/12/2023, 9:49 PM

The job submission API will send the job directly to Ray cluster’s head for execution.

freezing-airport-6809

06/12/2023, 10:18 PM

Who brings up the cluster? In KubeRay's case, it brings up the cluster first and then submits the job.

rapid-autumn-97122

06/13/2023, 12:38 AM

Yeah, but I meant that your Ray Flyte plugin backend only brings up a Ray cluster; the actual job submission is done via the Ray client (ray.init) in RayFunctionTask.

freezing-airport-6809

06/13/2023, 12:44 AM

Does that not need access to the Python code?

freezing-airport-6809

06/13/2023, 12:44 AM

What do you expect?

rapid-autumn-97122

06/13/2023, 12:48 AM

It’s working for some use cases. The downsides are 1) it forces Flyte/Ray env consistency, and 2) the Ray client is recommended for interactive dev, and it will eventually go away.

rapid-autumn-97122

06/13/2023, 12:49 AM

The actual Ray job spec looks like this: https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray_v1alpha1_rayjob.yaml#L94

glamorous-carpet-83516

06/13/2023, 12:54 AM

If you use a Ray task in flytekit, we will create a RayJob CRD. KubeRay will create a cluster first, and then use the Ray job submission API to submit the Ray job to the head node. Therefore, actually, we run RayFunctionTask in the head node. Do you want us to launch a separate pod that uses the Ray client to submit the job to the Ray cluster?

rapid-autumn-97122

06/13/2023, 12:58 AM

To be clear, we are not asking for anything. We just would like to understand how this job gets executed on the flyte backend, so we can design things accordingly.

glamorous-carpet-83516

06/13/2023, 1:01 AM

I see. On the Flyte side, we basically just create a RayJob CRD.

rapid-autumn-97122

06/13/2023, 1:01 AM

therefore, actually, we run RayFunctionTask in the head node.

Does that mean whatever is defined under the user’s @task decorator will be executed on the head node? In our case, the Ray cluster and the Flyte cluster are running on different GKE instances.

rapid-autumn-97122

06/13/2023, 1:02 AM

This piece of code, ray.init(address=self._task_config.address), is run in the Flyte container, right? IIUC

rapid-autumn-97122

06/13/2023, 1:04 AM

Under Spotify’s setup, the address parameter will be an IP from a pod in a different GKE instance. If that’s the case, it basically uses the Ray client to open a remote connection to the Ray cluster.

rapid-autumn-97122

06/13/2023, 1:05 AM

On the Flyte side, we basically just create a RayJob CRD.

In KubeRay, a RayJob comes with a Ray Python script, which is usually passed in via a ConfigMap (see example here). I don’t think the Flyte backend does that, right?

glamorous-carpet-83516

06/13/2023, 1:18 AM

No, we don’t add the code to the ConfigMap. To run the task code, you have to build the image for the Ray cluster, and this image contains the code. Another way is to use fast-register in flytekit. The entrypoint of the Ray job will become something like pyflyte-fast-execute --module wf --task ray_task, and flytekit will download the code before running the Ray task.
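
As a rough illustration of the explanation above, the following sketch builds a RayJob-shaped manifest as a plain Python dict. The field names follow KubeRay's v1alpha1 RayJob sample linked earlier in the thread. The function name build_rayjob_manifest, the resource name, the image name, the container names, and the replica count are all hypothetical, and this is not the exact resource the Flyte backend emits. The entrypoint string is the approximate one quoted in the message above.

```python
import json


def build_rayjob_manifest(name, image, entrypoint, worker_replicas=2):
    """Return a dict shaped like a KubeRay RayJob custom resource (illustrative only)."""

    def pod_template(container_name):
        # Head and worker pods share one image, so the task code
        # (baked in or fast-registered) is available on every node.
        return {"spec": {"containers": [{"name": container_name, "image": image}]}}

    return {
        "apiVersion": "ray.io/v1alpha1",
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": {
            # KubeRay submits this command to the head node once the cluster is up.
            "entrypoint": entrypoint,
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": pod_template("ray-head"),
                },
                "workerGroupSpecs": [
                    {
                        "groupName": "worker-group",
                        "replicas": worker_replicas,
                        "rayStartParams": {},
                        "template": pod_template("ray-worker"),
                    }
                ],
            },
        },
    }


# Hypothetical values; only the entrypoint text comes from the thread.
manifest = build_rayjob_manifest(
    name="ray-task-example",
    image="example.registry/my-flyte-ray-image:latest",
    entrypoint="pyflyte-fast-execute --module wf --task ray_task",
)
print(json.dumps(manifest, indent=2))
```

The point of the sketch is that the user's task code is not shipped in a ConfigMap: the entrypoint is a flytekit command, and the code it runs comes from the image or from fast registration.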

rapid-autumn-97122

06/13/2023, 1:21 AM

To run the task code, you have to build the image for the Ray cluster, and this image contains the code.

In that case, the code will not be the one defined under the @task decorator? Should we package the Ray Python script into the image? Sorry, maybe I missed something here.

freezing-airport-6809

06/13/2023, 1:26 AM

I think you folks are talking over each other

freezing-airport-6809

06/13/2023, 1:26 AM

Haha, some misunderstood / mismatched vocabulary

glamorous-carpet-83516

06/13/2023, 1:32 AM

You still write the Ray code in the task decorator. It is just like running a regular Python task. You have to build the image for your task / workflow. In the flytekit Ray task, that image will be used in the RayJob CRD.

rapid-autumn-97122

06/13/2023, 1:36 AM

Gotcha thanks!

freezing-airport-6809

06/13/2023, 1:40 AM

So to quickly recap: Flyte will not run a pod. We simply create a KubeRay CRD, and it creates the driver and workers with the flytekit entrypoint. So it is not client mode; it is cluster mode. Code is either built into the container or sent using fast registration.

freezing-airport-6809

06/13/2023, 1:40 AM

So pyflyte run should do the right thing

rapid-autumn-97122

06/13/2023, 1:55 AM

👍 sounds great!

rapid-autumn-97122

06/13/2023, 1:55 AM

thanks!
