# ray-integration
a
Question about the Ray plugin. In the docs, it mentions that the Ray plugin uses Ray's Job Submission to run the Ray task. However, looking at the code, I don't see `JobSubmissionClient` being used anywhere. Am I missing something?
k
Kuberay will do that
k
Flyte creates RayJob custom resource
k
@Kevin Su @Ketan (kumare3) Does KubeRay’s RayJob rely on Ray’s job submission API? IIUC, it seems not, because Flyte’s Ray plugin still uses the Ray client (`ray.init`) to interact with the Ray cluster created via the plugin.
The job submission API will send the job directly to Ray cluster’s head for execution.
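For readers unfamiliar with the API being discussed, here is a minimal sketch of what a submission through Ray's Python SDK could look like. `JobSubmissionClient` and `submit_job` are part of `ray.job_submission`; the helper names `build_submission` and `submit`, the script name `my_script.py`, and the working directory are assumptions made for illustration, and this is not how the Flyte plugin itself is implemented.

```python
from typing import Any, Dict, Optional


def build_submission(entrypoint: str, working_dir: Optional[str] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for JobSubmissionClient.submit_job (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"entrypoint": entrypoint}
    if working_dir is not None:
        # runtime_env["working_dir"] asks Ray to ship this directory along with the job.
        kwargs["runtime_env"] = {"working_dir": working_dir}
    return kwargs


def submit(dashboard_url: str, **kwargs: Any) -> str:
    """Send the job to the cluster head and return the submission ID (hypothetical helper)."""
    # Imported lazily so build_submission can be used without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url)
    return client.submit_job(**kwargs)


# "python my_script.py" is a placeholder entrypoint, not something from the thread.
job = build_submission("python my_script.py", working_dir="./")
print(job)
```

Calling `submit("http://<head-address>:8265", **job)` would hand the entrypoint to the head node, which then runs it inside the cluster. The Ray client, by contrast, keeps the driver in the calling process.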
k
Who brings up the cluster? In KubeRay's case, it brings up the cluster first and then submits the job.
k
Yeah, but I meant that your Ray Flyte plugin backend only brings up a Ray cluster; the actual job submission is done via the Ray client (`ray.init`) in RayFunctionTask.
k
Does that not need access to the Python code?
What do you expect?
k
It’s working for some use cases. The downsides are 1) it forces Flyte/Ray env consistency, and 2) the Ray client is recommended for interactive dev, and it will eventually go away.
k
if you use ray task in flytekit, we will create a rayJob CRD, kuberay will create a cluster first, and then use ray job submission api to submit the ray job to the head node. therefore, actually, we run RayFunctionTask in the head node. do you want us to launch a separate pod that uses ray client to submit the job to the ray cluster?
k
To be clear, we are not asking for anything. We just would like to understand how this job gets executed on the flyte backend, so we can design things accordingly.
k
I see. On the Flyte side, we basically just create a RayJob CRD.
k
therefore, actually, we run RayFunctionTask in the head node.
Does that mean whatever is defined under the user’s @task decorator will be executed on the head node? In our case, the Ray cluster and the Flyte cluster are running on different GKE instances.
This piece of code, `ray.init(address=self._task_config.address)`, is run in the Flyte container, right? IIUC
Under Spotify’s setup, the `address` parameter will be an IP from a pod in a different GKE instance. If that’s the case, it basically uses the Ray client to open a remote connection to the Ray cluster.
On the Flyte side, we basically just create a RayJob CRD.
In KubeRay, a RayJob comes with a Ray Python script, which is usually passed in via a ConfigMap (see example here). I don’t think the Flyte backend does that, right?
k
No, we don’t add the code to the config map. To run the task code, you have to build the image for the Ray cluster, and this image contains the code. Another way is to use fast-register in flytekit. The entrypoint of the Ray job will become something like `pyflyte-fast-execute --module wf --task ray_task`, and flytekit will download the code before running the Ray task.
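To make the shape of the resource concrete, here is a rough sketch of a RayJob manifest built as a plain Python dict. The field names (`entrypoint`, `rayClusterSpec`, `headGroupSpec`, `workerGroupSpecs`) follow KubeRay's RayJob CRD, but the API version, resource name, image, group name, and container names are all assumptions for illustration, and the entrypoint is just the approximate command quoted above. The manifest that Flyte actually generates will contain more fields.

```python
import json
from typing import Any, Dict


def ray_job_manifest(name: str, image: str, entrypoint: str, workers: int) -> Dict[str, Any]:
    """Build a minimal RayJob manifest (hypothetical helper, not Flyte's code)."""

    def pod_template(container_name: str) -> Dict[str, Any]:
        # Head and workers share one image, so the task code is present on every node.
        return {"spec": {"containers": [{"name": container_name, "image": image}]}}

    return {
        # Assumed API version; newer KubeRay releases also serve ray.io/v1.
        "apiVersion": "ray.io/v1alpha1",
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": {
            # Command that KubeRay submits to the head node once the cluster is up.
            "entrypoint": entrypoint,
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": pod_template("ray-head"),
                },
                "workerGroupSpecs": [
                    {
                        "groupName": "ray-group",
                        "replicas": workers,
                        "rayStartParams": {},
                        "template": pod_template("ray-worker"),
                    }
                ],
            },
        },
    }


manifest = ray_job_manifest(
    name="flyte-ray-task",                # hypothetical resource name
    image="example.com/my-flyte-ray:v1",  # hypothetical image that contains the task code
    entrypoint="pyflyte-fast-execute --module wf --task ray_task",
    workers=2,
)
print(json.dumps(manifest, indent=2))
```

Nothing in this sketch mounts a script through a ConfigMap: the code reaches the cluster either inside the image or via the download step performed by the entrypoint.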
k
To run the task code, you have to build the image for the Ray cluster, and this image contains the code.
in this way, the code will not be the one defined under @task decorator? we should package the ray python script into the image? sorry maybe I missed something here
k
I think you folks are talking over each other
Haha, some misunderstood / mismatched vocabulary
k
You still write the Ray code in the task decorator. It's just like running a regular Python task. You have to build the image for your task / workflow. In the flytekit Ray task, that image will be used in the RayJob CRD.
k
Gotcha thanks!
k
So to quickly recap: Flyte will not run a pod. We simply create a KubeRay CRD, and KubeRay creates the driver and workers with the flytekit entrypoint. So it is not client mode; it is cluster mode. Code is either built into the container or sent using fast registration.
So pyflyte run should do the right thing
k
👍 sounds great!
thanks!