Hi, I'm trying to get some of my data processing j...
# ray-integration
Hi, I'm trying to get some of my data processing jobs working on Ray with Flyte. I have KubeRay 0.4.0 installed. When my Flyte task starts, a Ray cluster gets created. The cluster is reachable: I can open the dashboard, and I can also submit jobs to it from my local machine via port forwarding. Unfortunately, the original job is never run, so the cluster sits idle with no tasks. I looked at the kuberay-operator logs, and all I see is these lines repeated forever:
2023-02-23T18:08:48.386Z    INFO    controllers.RayJob    RayJob associated rayCluster found    {"rayjob": "a8xqtvnds2sp7bgkn96k-fzvfpg5y-0", "raycluster": "ntropy-development/a8xqtvnds2sp7bgkn96k-fzvfpg5y-0-raycluster-xlg4f"}
2023-02-23T18:08:48.387Z    INFO    controllers.RayJob    waiting for the cluster to be ready    {"rayCluster": "a8xqtvnds2sp7bgkn96k-fzvfpg5y-0-raycluster-xlg4f"}
2023-02-23T18:08:51.387Z    INFO    controllers.RayJob    reconciling RayJob    {"NamespacedName": "ntropy-development/a8xqtvnds2sp7bgkn96k-fzvfpg5y-0"}
2023-02-23T18:08:51.388Z    INFO    controllers.RayJob    RayJob associated rayCluster found    {"rayjob": "a8xqtvnds2sp7bgkn96k-fzvfpg5y-0", "raycluster": "ntropy-development/a8xqtvnds2sp7bgkn96k-fzvfpg5y-0-raycluster-xlg4f"}
2023-02-23T18:08:51.388Z    INFO    controllers.RayJob    waiting for the cluster to be ready    {"rayCluster": "a8xqtvnds2sp7bgkn96k-fzvfpg5y-0-raycluster-xlg4f"}
2023-02-23T18:08:54.388Z    INFO    controllers.RayJob    reconciling RayJob    {"NamespacedName": "ntropy-development/a8xqtvnds2sp7bgkn96k-fzvfpg5y-0"}
2023-02-23T18:08:54.388Z    INFO    controllers.RayJob    RayJob associated rayCluster found    {"rayjob": "a8xqtvnds2sp7bgkn96k-fzvfpg5y-0", "raycluster": "ntropy-development/a8xqtvnds2sp7bgkn96k-fzvfpg5y-0-raycluster-xlg4f"}
2023-02-23T18:08:54.389Z    INFO    controllers.RayJob    waiting for the cluster to be ready    {"rayCluster": "a8xqtvnds2sp7bgkn96k-fzvfpg5y-0-raycluster-xlg4f"}
2023-02-23T18:08:57.389Z    INFO    controllers.RayJob    reconciling RayJob    {"NamespacedName": "ntropy-development/a8xqtvnds2sp7bgkn96k-fzvfpg5y-0"}
These logs seem to be generated by this piece of code: https://github.com/ray-project/kuberay/blob/89f5fba8d6f868f9fedde1fbe22a6eccad88ecc1/ray-operator/controllers/ray/rayjob_controller.go#L174. They are unexpected, as the cluster is healthy and I can use it on the side. I would appreciate any help and advice. Do you think the operator version could be the cause? My versions are: Flyte deployment 1.2.1, Ray in the cluster 2.2.0, flytekitplugins-ray 1.2.7.
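In case it helps anyone reproduce the diagnosis, here is a minimal sketch of how the repeating pattern in the operator log can be summarized. It assumes the line layout shown above (timestamp, level, logger, message, JSON payload, separated by whitespace); the names `parse_line` and `summarize` are made up for illustration and are not part of KubeRay or Flyte.

```python
import json
import re
from collections import Counter
from datetime import datetime

# Assumed layout: timestamp, level, logger, message, trailing JSON payload.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+"
    r"(?P<msg>.*?)\s+(?P<fields>\{.*\})\s*$"
)


def parse_line(line):
    """Return (timestamp, logger, message, fields), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return ts, m["logger"], m["msg"], json.loads(m["fields"])


def summarize(lines):
    """Count each message and measure the gaps between 'reconciling RayJob' events."""
    counts = Counter()
    reconcile_times = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not an operator log line
        ts, _logger, msg, _fields = parsed
        counts[msg] += 1
        if msg == "reconciling RayJob":
            reconcile_times.append(ts)
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(reconcile_times, reconcile_times[1:])
    ]
    return counts, gaps


# Three lines copied from the log excerpt above.
SAMPLE = [
    '2023-02-23T18:08:51.387Z    INFO    controllers.RayJob    reconciling RayJob    {"NamespacedName": "ntropy-development/a8xqtvnds2sp7bgkn96k-fzvfpg5y-0"}',
    '2023-02-23T18:08:51.388Z    INFO    controllers.RayJob    waiting for the cluster to be ready    {"rayCluster": "a8xqtvnds2sp7bgkn96k-fzvfpg5y-0-raycluster-xlg4f"}',
    '2023-02-23T18:08:54.388Z    INFO    controllers.RayJob    reconciling RayJob    {"NamespacedName": "ntropy-development/a8xqtvnds2sp7bgkn96k-fzvfpg5y-0"}',
]

if __name__ == "__main__":
    counts, gaps = summarize(SAMPLE)
    print(counts)
    print(gaps)
```

Run against the full operator log, this should show the same few messages repeating with a gap of roughly three seconds between reconciles, which matches the RayJob staying in the "waiting for the cluster to be ready" state.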