Hello We noticed that the ray plugin when used to create a R Flyte #ray-integration


bright-fireman-49979

06/22/2023, 5:42 PM

Hello! We noticed that when the Ray plugin is used to create a RayCluster and RayJob on another, separate GKE cluster, it defaults to creating those resources in a namespace (in the separate GKE cluster) with the same name as the one running the Flyte workflow. We have some questions here:
• Is there any way to override this behaviour?
  ◦ If there isn’t, is this something that could be added?


glamorous-carpet-83516

06/22/2023, 7:52 PM

You can’t override it for now, but you can add a new config (namespace?) in the Ray plugin, and override rayjob’s namespace.

bright-fireman-49979

06/22/2023, 8:09 PM

Thanks for the quick reply!

and override rayjob’s namespace

How would I do that? I was wondering if we could add an extra parameter to https://github.com/flyteorg/flyteidl/blob/master/protos/flyteidl/plugins/ray.proto#L17 that defines the namespace. The plugin could then read that new field and override the namespace in the RayJob.
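A minimal sketch of the precedence this proposal implies, written in Python for illustration only (the actual backend plugin is Go code in flyteplugins). `RayJobSpec.namespace`, `RayPluginConfig.default_namespace` and `resolve_ray_namespace` are hypothetical names, not existing Flyte fields. An empty value falls through to the next level, so today's behaviour (reuse the workflow's namespace) remains the default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RayPluginConfig:
    # Hypothetical platform-wide setting (a "new config" in the Ray plugin).
    default_namespace: Optional[str] = None


@dataclass
class RayJobSpec:
    # Hypothetical per-task field, mirroring a new field in ray.proto.
    namespace: Optional[str] = None


def resolve_ray_namespace(
    execution_namespace: str,
    task_spec: RayJobSpec,
    plugin_config: RayPluginConfig,
) -> str:
    """Pick the namespace for the RayJob/RayCluster objects.

    Precedence: per-task field, then plugin config, then the namespace
    the Flyte execution runs in (the current behaviour).
    """
    if task_spec.namespace:
        return task_spec.namespace
    if plugin_config.default_namespace:
        return plugin_config.default_namespace
    return execution_namespace


if __name__ == "__main__":
    cfg = RayPluginConfig(default_namespace="ray-jobs")
    print(resolve_ray_namespace("flytesnacks-development", RayJobSpec(), cfg))
    print(resolve_ray_namespace("flytesnacks-development", RayJobSpec("team-a"), cfg))
```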

freezing-airport-6809

06/22/2023, 9:16 PM

Name is critical to Flyte.

freezing-airport-6809

06/22/2023, 9:16 PM

This is how it identifies the job; overriding it should not be permitted.

freezing-airport-6809

06/22/2023, 9:17 PM

Otherwise, you may end up with runaway jobs.
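To illustrate why the identity matters, here is a toy model (plain Python, not Flyte code) in which resources are keyed by (namespace, name), as Kubernetes objects are. If the job is created under one namespace but looked up and deleted under another, both calls miss, so the controller can neither report its status nor clean it up. The `Cluster` class, the namespaces and the job name are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (namespace, name)


class Cluster:
    """Toy stand-in for a Kubernetes API server: objects keyed by namespace/name."""

    def __init__(self) -> None:
        self.objects: Dict[Key, str] = {}

    def create(self, namespace: str, name: str, phase: str = "Running") -> None:
        self.objects[(namespace, name)] = phase

    def get(self, namespace: str, name: str) -> Optional[str]:
        return self.objects.get((namespace, name))

    def delete(self, namespace: str, name: str) -> bool:
        return self.objects.pop((namespace, name), None) is not None


cluster = Cluster()
job_name = "a1b2c3-n0-0"  # hypothetical execution-derived name

# The job is created in an overridden namespace...
cluster.create("ray-jobs", job_name)

# ...but a controller that still assumes the execution namespace finds nothing,
# so it can neither track nor abort the job it launched.
status = cluster.get("flytesnacks-development", job_name)
deleted = cluster.delete("flytesnacks-development", job_name)
print(status, deleted, len(cluster.objects))  # None False 1 -> orphaned job
```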

bright-fireman-49979

06/22/2023, 9:18 PM

To be clear, we don’t want to change the namespace where the Flyte workflow is running, just where the RayJob and RayCluster will be created (we already use a different GKE cluster for Ray than for Flyte). Are you saying that more than just the Ray Flyte plugin depends on the namespace having the same name?

freezing-airport-6809

06/22/2023, 10:40 PM

The question is how to add the namespace. Pod templates seem to be the right way to do this, but it can lead to a lot of problems with how we track the status, etc.

freezing-airport-6809

06/22/2023, 10:40 PM

We create event listeners for specific namespaces, and changing that could cause problems.
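A toy model of a namespace-scoped event listener, assuming (as the message above describes) that the listener only receives events from the namespaces it was registered for. A status update for a RayJob created outside those namespaces never reaches the handler. `NamespacedWatcher` is an illustrative name, not a Flyte or client-go type.

```python
from typing import Callable, List, Set, Tuple

Event = Tuple[str, str, str]  # (namespace, name, phase)


class NamespacedWatcher:
    """Delivers events only for the namespaces it was configured to watch."""

    def __init__(self, namespaces: Set[str], handler: Callable[[Event], None]) -> None:
        self.namespaces = namespaces
        self.handler = handler

    def dispatch(self, event: Event) -> bool:
        namespace, _, _ = event
        if namespace not in self.namespaces:
            return False  # dropped: the listener never hears about it
        self.handler(event)
        return True


seen: List[Event] = []
watcher = NamespacedWatcher({"flytesnacks-development"}, seen.append)

watcher.dispatch(("flytesnacks-development", "job-1", "Succeeded"))
watcher.dispatch(("ray-jobs", "job-2", "Succeeded"))  # created elsewhere
print([name for _, name, _ in seen])  # ['job-1']
```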

rapid-autumn-97122

07/06/2023, 4:27 PM

@freezing-airport-6809 @glamorous-carpet-83516 I would like to follow up on this issue. When we were trying out the Ray plugin, we ran into the following issues related to our internal Ray/Flyte setup:
1. Namespace to launch a Ray job in (Flyte uses the namespace where the Flyte job runs for the Ray job)
2. Fixed service account (Flyte doesn’t specify a service account in the Ray job, which therefore picks up the "default" k8s SA)
3. No separate resource configurations for head and workers (currently head and workers use the same resource configuration)
4. Node pool selector for head/worker nodes (there’s no way to specify a node pool in the head/worker spec)
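The four requests can be pictured as one task-level configuration. The sketch below is a hypothetical shape (none of these classes or fields are the actual flytekitplugins-ray API) with a target namespace, a service account, and separate resources and node pools for head and workers. It renders a RayJob-like dict whose keys follow the KubeRay pod-template layout (`headGroupSpec`, `workerGroupSpecs`, `serviceAccountName`, `nodeSelector`), assuming the `ray.io/v1alpha1` API current at the time and GKE's `cloud.google.com/gke-nodepool` label. It is trimmed to the relevant fields; a real spec also needs images and `rayStartParams`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeGroup:
    # Hypothetical per-group settings (items 3 and 4).
    cpu: str
    memory: str
    node_pool: Optional[str] = None  # GKE node pool name


@dataclass
class RayTaskOptions:
    # Hypothetical options covering items 1-4; not an existing Flyte API.
    head: NodeGroup
    worker: NodeGroup
    worker_replicas: int = 1
    namespace: Optional[str] = None
    service_account: Optional[str] = None


def pod_spec(group: NodeGroup, service_account: Optional[str]) -> Dict:
    """Build a trimmed pod spec with per-group resources and node pool."""
    spec: Dict = {
        "containers": [{
            "name": "ray",
            "resources": {"requests": {"cpu": group.cpu, "memory": group.memory}},
        }],
    }
    if service_account:
        spec["serviceAccountName"] = service_account
    if group.node_pool:
        spec["nodeSelector"] = {"cloud.google.com/gke-nodepool": group.node_pool}
    return spec


def render_rayjob(name: str, default_namespace: str, opts: RayTaskOptions) -> Dict:
    """Render a RayJob-like manifest; falls back to the default namespace."""
    return {
        "apiVersion": "ray.io/v1alpha1",
        "kind": "RayJob",
        "metadata": {"name": name, "namespace": opts.namespace or default_namespace},
        "spec": {"rayClusterSpec": {
            "headGroupSpec": {"template": {"spec": pod_spec(opts.head, opts.service_account)}},
            "workerGroupSpecs": [{
                "groupName": "workers",
                "replicas": opts.worker_replicas,
                "template": {"spec": pod_spec(opts.worker, opts.service_account)},
            }],
        }},
    }


opts = RayTaskOptions(
    head=NodeGroup(cpu="2", memory="8Gi", node_pool="ray-head-pool"),
    worker=NodeGroup(cpu="8", memory="32Gi", node_pool="ray-worker-pool"),
    worker_replicas=4,
    namespace="ray-jobs",
    service_account="ray-runner",
)
job = render_rayjob("a1b2c3-n0-0", "flytesnacks-development", opts)
print(job["metadata"]["namespace"])  # ray-jobs
```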

rapid-autumn-97122

07/06/2023, 4:33 PM

I wonder if it would be possible to make these configurable via the Flyte Ray plugin, or whether you have other suggestions to enable our use cases. Thank you!

freezing-airport-6809

07/06/2023, 6:35 PM

Nos. 3 and 4 seem like very fixable problems.

freezing-airport-6809

07/06/2023, 6:36 PM

For 2 for every execution you can specify a separate service account
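A sketch of the per-execution override described above: a service account supplied at launch time takes precedence over a default attached to the launch plan (assumed here as a middle level), and otherwise the Kubernetes "default" SA is used, which is the behaviour item 2 describes. `resolve_service_account` and its arguments are hypothetical; this code does not call Flyte, and whether the value reaches a RayJob in a separate cluster is not settled in this thread.

```python
from typing import Optional


def resolve_service_account(
    execution_sa: Optional[str],
    launch_plan_default: Optional[str] = None,
) -> str:
    """Per-execution SA wins, then an (assumed) launch plan default, then 'default'."""
    return execution_sa or launch_plan_default or "default"


print(resolve_service_account(None))                       # default
print(resolve_service_account(None, "ray-runner"))         # ray-runner
print(resolve_service_account("team-a-sa", "ray-runner"))  # team-a-sa
```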

freezing-airport-6809

07/06/2023, 6:37 PM

Namespaces should be configured at Flyte level and I am sorry this is non negotiable as it will lead to corruption. But you can change Flyte to run in only one namespace; then it’s up to you to manage fair scheduling.
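Flyte derives an execution's namespace from its project and domain; the commonly documented default template is `{{ project }}-{{ domain }}`. The sketch below renders such a template in plain Python to show what "run Flyte in only one namespace" amounts to: a template without placeholders maps every project and domain to the same namespace. `render_namespace` and the fixed namespace "flyte" are hypothetical, not Flyte's implementation.

```python
import re


def render_namespace(template: str, project: str, domain: str) -> str:
    """Substitute {{ project }} / {{ domain }} placeholders (whitespace-tolerant)."""
    values = {"project": project, "domain": domain}
    return re.sub(
        r"\{\{\s*(project|domain)\s*\}\}",
        lambda m: values[m.group(1)],
        template,
    )


per_project = "{{ project }}-{{ domain }}"
single = "flyte"  # hypothetical fixed namespace: every execution lands here

print(render_namespace(per_project, "flytesnacks", "development"))  # flytesnacks-development
print(render_namespace(single, "flytesnacks", "development"))       # flyte
print(render_namespace(single, "other", "production"))              # flyte
```

With a single namespace, all executions share its quota, which is why fair scheduling then becomes the operator's responsibility.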

bright-fireman-49979

07/06/2023, 6:48 PM

@freezing-airport-6809, regarding:

For 2 for every execution you can specify a separate service account

Can you share an example of how to specify the SA for the RayJob (which runs in a different GKE cluster than Flyte)?

freezing-airport-6809

07/06/2023, 6:54 PM

Also please feel free to propose contributions

rapid-autumn-97122

07/06/2023, 8:11 PM

Thanks Ketan! (I heard you are on vacation; sorry that my question interrupted your vacation, this is not urgent at all.)

Namespaces should be configured at Flyte level and I am sorry this is non negotiable

Do you mind expanding on that a bit? I think there might be a misunderstanding. In our internal setup, Flyte and Ray run on different GKE clusters (see the graph below). Here we are asking to make the namespace parameter for the RayJob configurable, so we can control the namespace of the RayJob k8s resource object that the Flyte plugin submits to our Ray infrastructure. We do not want to change the namespace where the Flyte job runs. Let us know if there are any constraints that we are not aware of.

rapid-autumn-97122

07/06/2023, 8:13 PM

(thanks @dry-egg-91175 for this nice diagram!)

rapid-autumn-97122

07/06/2023, 8:15 PM

Also please feel free to propose contributions

Yes, we are interested in doing so
