Hi, while initiating ray cluster, the task is running in only one instance and pod | Flyte #ray-integration

future-notebook-79388

12/08/2022, 3:11 PM

Hi, while initiating a Ray cluster, the task runs on only one instance and one pod. Generally, if a Ray cluster is initiated, it is expected to run across different instances in a distributed manner, right? Can we do horizontal scaling here to increase the pool of resources?

tall-lock-23197

12/09/2022, 4:33 AM

cc: @glamorous-carpet-83516

glamorous-carpet-83516

12/09/2022, 5:35 AM

Hmm, if a Ray task is started, propeller should create a head node and worker nodes. Did you enable the ray plugin in propeller?

tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - k8s-array
      - ray
    default-for-task-types:
      container: container
      sidecar: sidecar
      container_array: k8s-array
      ray: ray

future-notebook-79388

12/09/2022, 12:50 PM

Yeah, the ray plugin is enabled.

glamorous-carpet-83516

12/09/2022, 6:13 PM

Is there any error in the KubeRay operator?

future-notebook-79388

12/12/2022, 5:05 AM

Not sure. How do I check whether it works fine?

glamorous-carpet-83516

12/12/2022, 8:23 AM

kubectl logs <kuberay-operator> -n ray-system

future-notebook-79388

12/12/2022, 9:24 AM

message has been deleted

glamorous-carpet-83516

12/12/2022, 7:31 PM

Have you installed an ingress controller? If not, it will cause an error in KubeRay. KubeRay uses the ingress controller to create a new ingress route for the RayJob.

future-notebook-79388

12/13/2022, 9:32 AM

Yes, an ingress controller is installed in the setup.

glamorous-carpet-83516

12/13/2022, 8:43 PM

@future-notebook-79388 do you have a couple of minutes to hop on a call?

future-notebook-79388

12/14/2022, 4:22 AM

Sure, please let me know your feasible timings.

glamorous-carpet-83516

12/14/2022, 7:31 PM

maybe 9~12 AM in your time

future-notebook-79388

12/18/2022, 6:44 AM

Sorry for the inconvenience @glamorous-carpet-83516. We were having a live demo, so we couldn't work on the setup. Will the same time tomorrow work for you?

glamorous-carpet-83516

12/18/2022, 6:47 AM

No worries, yes, ping me tomorrow when you are available

future-notebook-79388

12/19/2022, 1:49 PM

Hi, actually, once the helm upgrade was done I am able to see the worker pods getting created. But the issue now is that the task stays queued for a long time and is not getting initiated. It gets

The node was low on resource: ephemeral-storage

and it keeps trying to initiate a new pod, even though we have enough ephemeral storage in the instance.

future-notebook-79388

12/19/2022, 1:53 PM

The Docker image that we are trying to pull is nearly 10 GB. Will that be an issue? Shall we connect tomorrow morning at 9 AM my time? Can you confirm where to connect, through Slack or Google Meet?
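
For context on the eviction message above: Kubernetes lets a container declare how much local ephemeral storage it needs, so the scheduler only places the pod on a node with that much allocatable disk. The fragment below is a generic sketch of that mechanism, not the configuration used in this thread. The container name and sizes are hypothetical placeholders. These requests cover a container's writable layer, logs, and emptyDir volumes; a large image still takes up node disk on its own, and the thread does not confirm whether this setting resolves the issue.

```yaml
# Hypothetical pod-spec fragment; the container name and sizes are
# placeholders, not values taken from this thread.
spec:
  containers:
    - name: ray-worker
      resources:
        requests:
          ephemeral-storage: "20Gi"   # scheduler needs this much allocatable on the node
        limits:
          ephemeral-storage: "30Gi"   # pod is evicted if its usage exceeds this
```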

glamorous-carpet-83516

12/19/2022, 6:31 PM

I’ll call you at 9 AM your time through Google Meet.
