mammoth-mouse-1111
05/13/2025, 8:45 PM
@dynamic
def batched_workflow(...):
    for n in repeats:
        sub_workflow(...)

@workflow
def sub_workflow(...):
    flyte_ray_task(...)

@task(task_config=RAY_JOB_CONFIG)
def flyte_ray_task(...):
    ray.get([ray_fn.remote(...)])

@ray.remote
def ray_fn(...):
    ...
When launching this nested batched_workflow, once the subworkflow gets to flyte_ray_task, it hangs. No pods for the ray cluster are created, as would normally happen when sub_workflow gets launched in isolation.
Inspecting the logs of the raycluster object that is created, I see the following:
Warning FailedToCreateIngress 2m14s (x25 over 124m) raycluster-controller Failed creating ingress raycluster/qkdzqy-0-dn0-0-dn2-0-raycluster-f5xjz-head-ingress, Ingress.networking.k8s.io "qkdzqy-0-dn0-0-dn2-0-raycluster-f5xjz-head-ingress" is invalid: metadata.labels: Invalid value: "ajct5z8clztdl9vgb7gh-fxqkdzqy-0-dn0-0-dn2-0-raycluster-f5xjz-head": must be no more than 63 characters
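For reference, a quick check of the rejected label value against the 63-character limit that Kubernetes places on label values. This is only a sketch: the value is copied from the error above, assuming the wrapped line rejoins with no extra characters, and the helper name exceeds_label_limit is made up for illustration, not part of Flyte or KubeRay.

```python
# Kubernetes rejects label values longer than 63 characters.
K8S_LABEL_VALUE_MAX = 63

# Label value copied from the FailedToCreateIngress event
# (assumes the line wrapped in the terminal rejoins cleanly).
label = "ajct5z8clztdl9vgb7gh-fxqkdzqy-0-dn0-0-dn2-0-raycluster-f5xjz-head"


def exceeds_label_limit(value: str, limit: int = K8S_LABEL_VALUE_MAX) -> bool:
    """Hypothetical helper: True if the value is too long to be a label value."""
    return len(value) > limit


print(len(label), exceeds_label_limit(label))
```

The value comes out 2 characters over the limit, and in this name the -dn0-0-dn2-0 segments sit in front of the raycluster suffix.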
So it looks like, because of the nesting, flyte creates a very long label that blows up due to a k8s limitation. Does anyone know a workaround for this?
glamorous-carpet-83516
05/13/2025, 9:31 PM