Hi team, we recently started experimenting with a ...
# flyte-support
h
Hi team, we recently started experimenting with a multi-cluster setup. In particular, we found this change to allow setting executionclusterlabel at runtime helpful. However, when we started using it, we realized that relaunching a workflow doesn't work as expected: the workflow gets launched in the default cluster rather than the one specified for the original execution. Upon debugging, we found that the relaunched execution does not map the old execution spec 1:1 onto the new spec. A lot of other properties, such as qos, aren't reapplied to the new execution either. Just wanted to confirm whether my analysis is correct and this is indeed a bug?
a
@hundreds-baker-75079 thanks for the thorough investigation. @brief-window-55364 have you seen this behavior in your environment?
b
Not really, but we don't relaunch executions; we always create new ones
Let me investigate a bit
h
I investigated further and realized flyteconsole is consuming an older flyteidl, which doesn't have the executionClusterLabel changes in executionSpec. So most likely, when creating a relaunch request, it isn't able to send the request with all the required values, and that is what breaks relaunch
b
Yep, found those too. You could try manually bumping flyteidl and building locally? @average-finland-92144 do you know when the next release will be done? We can fix it then with the updated flyteidl
I'm OOO right now but tomorrow can do some more testing to see if only flyteidl is needed or something else
a
there should be a new release in 2 weeks, or less
h
yes, i'm trying to do that. i'm pretty new to frontend dev and my progress in that direction has been slow so far. will let you know once i have some answers
Ok so I was able to validate that just having idl changes available doesn't help. We'll need to look for relevant code changes as well
h
@late-eye-50215 ^^
h
I was able to get things to work locally. Some code changes are necessary
b
Nice! I'm testing things out myself.
@hundreds-baker-75079 do you want to open a PR with what you've got, or do you want me to keep working on my end?
h
I wouldn't want to duplicate the effort
I've made changes locally and tested them
b
Ok you go ahead then
h
It'll take at least a day more to clean things up and add the required tests
b
Seems like you're ahead already 🙂
h
Atm, in the relaunch form, i haven't added an input field. I'm just passing back the value I already have. Would you want to add the field to the form?
b
I don't see why not, might be useful for folks that might want to trigger in another cluster.
h
here is the patch for my current changes. I've currently only tested relaunching workflows.
I have a bigger question here. Adding new fields one by one doesn't feel like the best approach. fields like qos, etc. are missing, and they would face the same problem as ecl does today. I want to explore alternative solutions that ensure relaunching a wf means the execution spec is copied as is to the new execution, with any overrides as necessary. That way, even if a new field is added, the data is correctly used for relaunch. thoughts?
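A rough sketch of the "copy the spec as is, then apply overrides" idea in TypeScript. The `ExecutionSpec` interface and the `buildRelaunchSpec` helper are made-up, simplified names for illustration; they are not the real flyteidl types or flyteconsole code.

```typescript
// Hypothetical, simplified spec shape (NOT the real flyteidl ExecutionSpec).
interface ExecutionSpec {
  launchPlan: string;
  executionClusterLabel?: string;
  qualityOfService?: string;
}

// Copies every field of the original spec (shallow copy), then applies only
// the explicit overrides. Because nothing is listed field by field, a field
// added to the spec later is carried over without touching this function.
function buildRelaunchSpec<T extends object>(
  original: T,
  overrides: Partial<T> = {},
): T {
  return { ...original, ...overrides };
}

const originalSpec: ExecutionSpec = {
  launchPlan: "my-launch-plan",
  executionClusterLabel: "gpu-cluster",
  qualityOfService: "HIGH",
};

// Plain relaunch: everything is forwarded unchanged.
const sameCluster = buildRelaunchSpec(originalSpec);

// Relaunch with one override: only the cluster label changes.
const otherCluster = buildRelaunchSpec(originalSpec, {
  executionClusterLabel: "cpu-cluster",
});
```

With this shape, the relaunch form would only need to know about the fields a user can override, not about every field that has to be forwarded.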
b
Again, we never use flyteconsole where I work, so it's hard to say, but looking at the code, aren't the fields prepopulated?
h
Ah I see. So the way relaunch works in console is: there is a get call that fetches the data of the original execution, and then when you click launch, the create workflow api is called. however, the current code selectively retains the data it gets, so the create api is called only with the selected data. everything else gets discarded
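To illustrate the failure mode described here, a minimal TypeScript sketch. The `FetchedSpec` type and the `selectiveRelaunchRequest` function are hypothetical stand-ins, not the actual flyteconsole code; they only mimic a form that forwards the fields it explicitly knows about.

```typescript
// Hypothetical, simplified result of the "get" call (not the real API type).
interface FetchedSpec {
  launchPlan: string;
  inputs: Record<string, string>;
  executionClusterLabel?: string;
  qualityOfService?: string;
}

// Mimics selective retention: only the listed fields reach the create call.
function selectiveRelaunchRequest(fetched: FetchedSpec): Partial<FetchedSpec> {
  const { launchPlan, inputs } = fetched;
  return { launchPlan, inputs };
}

const fetched: FetchedSpec = {
  launchPlan: "my-launch-plan",
  inputs: { x: "1" },
  executionClusterLabel: "gpu-cluster",
  qualityOfService: "HIGH",
};

const createRequest = selectiveRelaunchRequest(fetched);

// The cluster label and qos were fetched but never forwarded, so the backend
// would fall back to its defaults for them.
const labelForwarded = "executionClusterLabel" in createRequest; // false
const qosForwarded = "qualityOfService" in createRequest; // false
```

Any field not named in the destructuring is silently dropped, which matches the symptom of the relaunch landing in the default cluster.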
anyway, let me try to figure things out. worst case, i'll send out a pr for this
b
Sounds good! Let me know if I can help any further.
Hey @hundreds-baker-75079. Did you manage to get something going?
h
Hi Rafael, sorry I got pulled into something else last week. I'll send something out in a day or 2 🙂
b
No worries! I was trying something out myself
h
@brief-window-55364 did you get a chance to try reference workflows with execution labels? They don't seem to get auto-populated/forwarded when I launch a wf with an execution label
https://github.com/flyteorg/flyte/pull/5431 here is a pr to fix this issue
@brief-window-55364 could you help merge these changes upstream? this will be my first time and idk what else is required
b
Hey @hundreds-baker-75079. Let me ping the pros @average-finland-92144 can you take a look at this one?
h
@average-finland-92144 these are the 2 other prs for multicluster: https://github.com/flyteorg/flyte/pull/5431, https://github.com/flyteorg/flyteconsole/pull/873. Please take a look when you can?