# ray-integration
m
Hello! We noticed that when the ray plugin is used to create a RayCluster and RayJob on another, separate GKE cluster, it defaults to creating those resources in a namespace (in the separate GKE cluster) with the same name as the one running the Flyte workflow. We have some questions here:
• Is there any way to override this behaviour?
  ◦ If there isn’t, is this something that could be added?
k
you can’t override it for now, but you can add a new config (namespace?) to the ray plugin and use it to override the rayjob’s namespace
m
thanks for the quick reply!
> and override rayjob’s namespace
how would I do that? I was wondering if we could potentially add an extra parameter to https://github.com/flyteorg/flyteidl/blob/master/protos/flyteidl/plugins/ray.proto#L17 that defines the namespace. Then we could take that new field in the plugin and override the namespace in the rayjob
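A minimal sketch of the proposed behaviour, assuming two hypothetical settings that do not exist in flyteidl or the Ray plugin today: a per-task namespace field (the extra proto parameter suggested above) and a plugin-level default namespace (the "new config" suggested in the reply). The names `RayJobOverrides`, `resolve_rayjob_namespace` and `build_rayjob_metadata` are illustrative only. The Flyte execution namespace is untouched; only the namespace written into the RayJob object changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RayJobOverrides:
    """Hypothetical settings; not part of flyteidl or the Flyte Ray plugin."""
    task_namespace: Optional[str] = None            # hypothetical new field on the Ray task proto
    plugin_default_namespace: Optional[str] = None  # hypothetical plugin-level config value


def resolve_rayjob_namespace(execution_namespace: str, overrides: RayJobOverrides) -> str:
    """Pick the namespace for the RayJob object in the (separate) Ray cluster.

    Precedence in this sketch: per-task field, then plugin default,
    then the current behaviour (reuse the Flyte execution namespace).
    """
    if overrides.task_namespace:
        return overrides.task_namespace
    if overrides.plugin_default_namespace:
        return overrides.plugin_default_namespace
    return execution_namespace


def build_rayjob_metadata(job_name: str, execution_namespace: str, overrides: RayJobOverrides) -> dict:
    # Only the RayJob's metadata.namespace changes; the Flyte execution
    # itself keeps running in execution_namespace.
    return {
        "name": job_name,
        "namespace": resolve_rayjob_namespace(execution_namespace, overrides),
    }


if __name__ == "__main__":
    overrides = RayJobOverrides(plugin_default_namespace="ray-jobs")
    print(build_rayjob_metadata("f1a2b3c4-n0-0", "flytesnacks-development", overrides))
    # {'name': 'f1a2b3c4-n0-0', 'namespace': 'ray-jobs'}
```

Whatever precedence is chosen, the status-tracking side would have to look for the RayJob in the same namespace it was created in.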
k
name is critical to flyte
this is how it identifies the job; overriding this should not be permitted
otherwise, you may get into a situation of runaway jobs
m
to be clear, we don’t want to change the namespace where the Flyte workflow is running, just where the RayJob and RayCluster will be created (we are already using a different GKE cluster for Ray than for Flyte). Are you saying that more than just the Ray Flyte plugin depends on the namespace being named the same?
k
the question is how we add the namespace. Pod templates seem to be the right way to do this, but this can lead to a lot of problems around how we track the status, etc.
we create event listeners scoped to specific namespaces, and changing that could cause problems
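To illustrate the concern, here is a toy model, not Flyte's actual propeller code: a status tracker that identifies jobs by their (namespace, name) pair and only receives events from the namespaces it listens to. `NamespaceScopedTracker` and its methods are hypothetical. If the RayJob is created in a namespace the tracker does not watch, its status updates never arrive and the job looks stuck from Flyte's side, even though it may keep running (or finish) in the Ray cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class NamespaceScopedTracker:
    """Toy status tracker; hypothetical, not Flyte's implementation."""
    watched_namespaces: Set[str]
    # Job identity is the (namespace, name) pair, as for any namespaced k8s object.
    statuses: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def expect(self, namespace: str, name: str) -> None:
        # Record that a job was submitted and we are waiting on it.
        self.statuses[(namespace, name)] = "PENDING"

    def on_event(self, namespace: str, name: str, phase: str) -> bool:
        """Handle an event; returns False if it was never delivered to us."""
        if namespace not in self.watched_namespaces:
            return False  # no listener is registered for this namespace
        self.statuses[(namespace, name)] = phase
        return True


if __name__ == "__main__":
    tracker = NamespaceScopedTracker(watched_namespaces={"flytesnacks-development"})
    tracker.expect("flytesnacks-development", "rayjob-abc")

    # The RayJob actually lives in an override namespace, so its event is dropped
    # and the job Flyte is waiting on never leaves PENDING.
    delivered = tracker.on_event("ray-jobs", "rayjob-abc", "SUCCEEDED")
    print(delivered, tracker.statuses[("flytesnacks-development", "rayjob-abc")])
    # False PENDING
```

Even if a listener were added for the override namespace, the event would be stored under a different key than the one Flyte is waiting on, which is the identity problem described above.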
k
@Ketan (kumare3) @Kevin Su I would like to follow up on this issue. When we were trying out the Ray plugin, we ran into the following issues related to our internal Ray/Flyte setup:
1. Namespace to launch a Ray job (Flyte launches the Ray job in the same namespace where the Flyte job runs)
2. Fixed service account (Flyte doesn’t specify a service account in the Ray job, so it picks up the `default` k8s SA)
3. Doesn’t allow different resource configurations for head and workers (currently the head and workers use the same resource configuration)
4. Node pool selector for head/worker nodes (there’s no way to specify a node pool in the head/worker spec)
I wonder if it would be possible to make them configurable via the Flyte Ray plugin, or if you guys have other suggestions to enable our use cases. Thank you!
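For items 2 to 4, a sketch of what a more configurable plugin could emit, assuming a hypothetical per-group `RayGroupConfig` (separate head and worker resources, an optional GKE node pool) plus a service account and namespace. The manifest layout (`rayClusterSpec`, `headGroupSpec`, `workerGroupSpecs`, pod `template`, `nodeSelector`, `serviceAccountName`) follows the KubeRay RayJob CRD as we understand it, with `apiVersion` assumed to be `ray.io/v1`; older KubeRay releases used `ray.io/v1alpha1`, so check against the version you run. `cloud.google.com/gke-nodepool` is the standard GKE node pool label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RayGroupConfig:
    """Hypothetical per-group settings a configurable Ray plugin might accept."""
    cpu: str
    memory: str
    node_pool: Optional[str] = None  # GKE node pool name
    replicas: int = 1                # ignored for the head group


def pod_template(image: str, group: RayGroupConfig, service_account: str) -> dict:
    spec = {
        "serviceAccountName": service_account,  # instead of the `default` SA
        "containers": [{
            "name": "ray",
            "image": image,
            "resources": {
                "requests": {"cpu": group.cpu, "memory": group.memory},
                "limits": {"cpu": group.cpu, "memory": group.memory},
            },
        }],
    }
    if group.node_pool:
        # Standard GKE label for pinning pods to a specific node pool.
        spec["nodeSelector"] = {"cloud.google.com/gke-nodepool": group.node_pool}
    return {"spec": spec}


def build_rayjob(name: str, namespace: str, entrypoint: str, image: str,
                 service_account: str, head: RayGroupConfig, worker: RayGroupConfig) -> dict:
    """Assemble a RayJob manifest as a plain dict (field layout assumed from KubeRay)."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entrypoint": entrypoint,
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": pod_template(image, head, service_account),
                },
                "workerGroupSpecs": [{
                    "groupName": "workers",
                    "replicas": worker.replicas,
                    "rayStartParams": {},
                    "template": pod_template(image, worker, service_account),
                }],
            },
        },
    }


if __name__ == "__main__":
    job = build_rayjob(
        name="rayjob-abc", namespace="ray-jobs", entrypoint="python train.py",
        image="rayproject/ray:latest", service_account="ray-runner",
        head=RayGroupConfig(cpu="4", memory="16Gi", node_pool="ray-head-pool"),
        worker=RayGroupConfig(cpu="8", memory="32Gi", node_pool="ray-gpu-pool", replicas=3),
    )
    workers = job["spec"]["rayClusterSpec"]["workerGroupSpecs"][0]
    print(workers["replicas"], workers["template"]["spec"]["nodeSelector"])
    # 3 {'cloud.google.com/gke-nodepool': 'ray-gpu-pool'}
```

Keeping head and worker settings as separate objects is what lets items 3 and 4 vary independently while item 2 is applied to both groups.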
k
Nos. 3 and 4 seem like very fixable problems
For 2, you can specify a separate service account for every execution
Namespaces should be configured at the Flyte level, and I am sorry, but this is non-negotiable as it will lead to corruption. You can, however, change Flyte to run in only one namespace; then it’s up to you to manage fair scheduling
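By default, Flyte runs each execution in a namespace derived from its project and domain (for example `flytesnacks-development`), which is why the RayJob ends up in a namespace with the same name. The sketch below mimics that templating with a hypothetical `render_namespace` helper; the exact FlyteAdmin configuration key that holds the template (a namespace mapping setting, as we recall) should be checked in the Flyte deployment docs. Running Flyte "in only one namespace" then amounts to a template with no placeholders.

```python
import re

DEFAULT_TEMPLATE = "{{ project }}-{{ domain }}"  # Flyte's default project/domain mapping


def render_namespace(template: str, project: str, domain: str) -> str:
    """Substitute {{ project }} / {{ domain }} placeholders (whitespace-tolerant)."""
    values = {"project": project, "domain": domain}

    def substitute(match: re.Match) -> str:
        return values[match.group(1)]

    return re.sub(r"\{\{\s*(project|domain)\s*\}\}", substitute, template)


if __name__ == "__main__":
    print(render_namespace(DEFAULT_TEMPLATE, "flytesnacks", "development"))
    # flytesnacks-development

    # A fixed template puts every execution in one namespace; fair scheduling
    # across projects then has to be handled separately.
    print(render_namespace("flyte-workloads", "flytesnacks", "development"))
    # flyte-workloads
```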
m
@Ketan (kumare3), regarding:
> For 2 for every execution you can specify a separate service account
can you share an example of how to specify the SA for the RayJob (which runs in a different GKE cluster than Flyte)?
k
Also please feel free to propose contributions
k
Thanks Ketan! (I heard you are on vacation; sorry that my question interrupted it. This is not urgent at all.)
> Namespaces should be configured at Flyte level and I am sorry this is non negotiable
Do you mind expanding on it a bit? I think there might be a misunderstanding. In our internal setup, Flyte and Ray run on different GKE clusters (see the graph below). Here we are asking to make the namespace parameter for the RayJob configurable, so we can control the namespace of the RayJob k8s resource object that the Flyte plugin submits to our Ray infrastructure. We do not want to change the namespace where the Flyte job runs. Let us know if there are any constraints that we are not aware of.
(thanks @Abdullah Mobeen for this nice diagram!)
> Also please feel free to propose contributions
Yes, we are interested in doing so