purple-father-70173
02/10/2025, 10:50 PM
purple-father-70173
02/10/2025, 10:54 PM
NAME                                 JOB STATUS   DEPLOYMENT STATUS   RAY CLUSTER NAME                                      START TIME             END TIME   AGE
awl7lx78t947ldcrc565-testraytask-0                Initializing        awl7lx78t947ldcrc565-testraytask-0-raycluster-dkwps   2025-02-10T22:46:20Z              4m54s
Further investigation:
NAME DESIRED WORKERS AVAILABLE WORKERS CPUS MEMORY GPUS STATUS AGE
awl7lx78t947ldcrc565-testraytask-0-raycluster-dkwps 0 0 0 failed 5m12s
Describing the RayCluster shows no obvious events:
Status:
  Desired CPU: 0
  Desired GPU: 0
  Desired Memory: 0
  Desired TPU: 0
  Head:
  State: failed
Events:
  Type    Reason   Age    From                   Message
  ----    ------   ----   ----                   -------
  Normal  Created  6m31s  raycluster-controller  Created service account wl7lx78t947ldcrc565-testraytask-0-raycluster-dkwps
  Normal  Created  6m31s  raycluster-controller  Created role wl7lx78t947ldcrc565-testraytask-0-raycluster-dkwps
  Normal  Created  6m31s  raycluster-controller  Created role binding wl7lx78t947ldcrc565-testraytask-0-raycluster-dkwps
  Normal  Created  6m31s  raycluster-controller  Created ingress crc565-testraytask-0-raycluster-dkwps-head-ingress
Checking the logs of the kuberay operator:
{"level":"error","ts":"2025-02-10T22:51:48.356Z","logger":"controllers.RayCluster","msg":"Pod Service create error!","RayCluster":{"name":"awl7lx78t947ldcrc565-testraytask-0-raycluster-dkwps","namespace":"fl97"},"reconcileID":"dacf02b6-21ae-4118-a80d-d73e15c70c7c","Pod.Service.Error":"Service \"r7ldcrc565-testraytask-0-raycluster-dkwps-head-svc\" is invalid: [spec.ports[3].nodePort: Duplicate value: 31517, spec.ports[3]: Duplicate value: core.ServicePort{Name:\"\", Protocol:\"TCP\", AppProtocol:(*string)(nil), Port:8080, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:\"\"}, NodePort:0}]","error":"Service \"r7ldcrc565-testraytask-0-raycluster-dkwps-head-svc\" is invalid: [spec.ports[3].nodePort: Duplicate value: 31517, spec.ports[3]: Duplicate value: core.ServicePort{Name:\"\", Protocol:\"TCP\", AppProtocol:(*string)(nil), Port:8080, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:\"\"}, NodePort:0}]","stacktrace":"github.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayClusterReconciler).createService\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/raycluster_controller.go:1002\ngithub.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayClusterReconciler).reconcileHeadService\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/raycluster_controller.go:549\ngithub.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayClusterReconciler).rayClusterReconcile\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/raycluster_controller.go:330\ngithub.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayClusterReconciler).Reconcile\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/raycluster_controller.go:169\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2025-02-10T22:51:48.356Z","logger":"controllers.RayCluster","msg":"Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes reqeueuing with exponential backoff. For more details, see: <https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/reconcile#Reconciler>","RayCluster":{"name":"awl7lx78t947ldcrc565-testraytask-0-raycluster-dkwps","namespace":"fl97"},"reconcileID":"dacf02b6-21ae-4118-a80d-d73e15c70c7c"}
{"level":"error","ts":"2025-02-10T22:51:48.356Z","logger":"controllers.RayCluster","msg":"Reconciler error","RayCluster":{"name":"awl7lx78t947ldcrc565-testraytask-0-raycluster-dkwps","namespace":"fl97"},"reconcileID":"dacf02b6-21ae-4118-a80d-d73e15c70c7c","error":"Service \"r7ldcrc565-testraytask-0-raycluster-dkwps-head-svc\" is invalid: [spec.ports[3].nodePort: Duplicate value: 31517, spec.ports[3]: Duplicate value: core.ServicePort{Name:\"\", Protocol:\"TCP\", AppProtocol:(*string)(nil), Port:8080, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:\"\"}, NodePort:0}]","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
Extracting the relevant text:
"Pod.Service.Error":"Service \"r7ldcrc565-testraytask-0-raycluster-dkwps-head-svc\" is invalid: [spec.ports[3].nodePort: Duplicate value: 31517, spec.ports[3]: Duplicate value: core.ServicePort{Name:\"\", Protocol:\"TCP\", AppProtocol:(*string)(nil), Port:8080, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:\"\"}, NodePort:0}]","error":"Service \"r7ldcrc565-testraytask-0-raycluster-dkwps-head-svc\" is invalid: [spec.ports[3].nodePort: Duplicate value: 31517, spec.ports[3]: Duplicate value: core.ServicePort{Name:\"\", Protocol:\"TCP\", AppProtocol:(*string)(nil), Port:8080, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:\"\"}, NodePort:0}]"
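Pulling the error text out by hand gets tedious with log lines this long. A minimal sketch of doing it with a script, assuming the operator logs are one JSON object per line as shown above; the sample line is an abbreviated stand-in with shortened, hypothetical values, and `summarize_errors` is a helper name made up for this example:

```python
import json

# Abbreviated stand-in for one operator log line. The field names match the
# log above; the values are shortened and hypothetical.
SAMPLE_LINE = json.dumps({
    "level": "error",
    "msg": "Pod Service create error!",
    "RayCluster": {"name": "example-raycluster", "namespace": "fl97"},
    "error": 'Service "example-head-svc" is invalid: spec.ports[3]: Duplicate value',
    "stacktrace": "...",
})


def summarize_errors(lines):
    """Return (cluster name, msg, error) for each error-level JSON log line."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(record, dict) or record.get("level") != "error":
            continue
        cluster = record.get("RayCluster", {}).get("name", "")
        out.append((cluster, record.get("msg", ""), record.get("error", "")))
    return out


for cluster, msg, error in summarize_errors([SAMPLE_LINE]):
    print(f"{cluster}: {msg}\n  {error}")
```

In practice the lines would come from the operator's log output rather than a hard-coded sample.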
It looks like there's an issue with service creation? Is there another version of kuberay that I should be deploying that doesn't have this issue? I'm in an on-prem k8s cluster (RKE2)
purple-father-70173
02/10/2025, 10:55 PM
glamorous-carpet-83516
02/11/2025, 12:45 AM
is invalid: [spec.ports[3].nodePort: Duplicate value: 31517, spec.ports[3]: Duplicate value: core.ServicePort{Name:\"\", Protocol:\"TCP\", AppProtocol:(*string)(nil), Port:8080, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:\"\"}, NodePort:0}]","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.
are you using nodePort for the RayCluster?
purple-father-70173
02/11/2025, 12:55 AM
purple-father-70173
02/12/2025, 12:11 AM
I changed serviceType from NodePort to ClusterIP. When I do this I still get the same FailedToCreateService
error: Failed creating service fl97/brzkjkfl9g-testraytask-0-raycluster-snpj2-head-svc, Service "brzkjkfl9g-testraytask-0-raycluster-snpj2-head-svc" is invalid: spec.ports[3]: Duplicate value: core.ServicePort{Name:"", Protocol:"TCP", AppProtocol:(*string)(nil), Port:8080, TargetPort:intstr.IntOrString{Type:0, IntVal:0, StrVal:""}, NodePort:0}
Instead of several of these services failing, it's now just one. Does anyone know how to modify the kuberay-operator to stop creating NodePorts?
purple-father-70173
02/12/2025, 5:08 PM
- containerPort: 8080
  name: http
  protocol: TCP
Modifying my webhook to remove that port fixed the issue. Not sure why Flyte does this.
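The error reported a duplicate entry for TCP port 8080, which is consistent with the injected containerPort colliding with a port the head service already exposed. A rough sketch of a pre-check for that kind of collision; the port list below is hypothetical (the thread does not show which ports ended up in the service), and `duplicate_ports` is a helper name invented for this example:

```python
from collections import Counter


def duplicate_ports(ports):
    """Return sorted (port, protocol) pairs that appear more than once.

    A missing protocol is treated as TCP, which is the Kubernetes default.
    """
    counts = Counter((p["port"], p.get("protocol", "TCP")) for p in ports)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Hypothetical head-service port list, for illustration only.
head_ports = [
    {"name": "gcs", "port": 6379},
    {"name": "dashboard", "port": 8265},
    {"name": "client", "port": 10001},
    {"name": "metrics", "port": 8080},
    {"name": "http", "port": 8080, "protocol": "TCP"},  # port added by the webhook
]

print(duplicate_ports(head_ports))  # [(8080, 'TCP')]
```

Running a check like this against the pod spec after the webhook has mutated it would flag the clash before the operator tries to create the service.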
Additionally, the example given breaks immediately because the default flyte pod doesn't have ray! So now I have to fix the example and figure out the containers for everything
purple-father-70173
02/12/2025, 8:22 PM