# flyte-support
h
Hi community! What is the best way to handle reusable tasks? Background: we recently hit an issue when registering multiple workflows (they share several tasks) from one git hash: only the first workflow registers cleanly, and the others fail.
1. We use flytectl register --continueOnError.
2. We use {branch_name} + {git_hash} as the version.
3. For example, we register workflow 1 and it succeeds. When we register workflow 2, it fails on the task-level pb file because the same task name and version have already been registered.
log: /tmp/_pb_output/132_src.python.flyte.radar_ml.model_workflows.lpm.main.run_experiment_pipeline_1.pb | Failed | Error registering file due to rpc error: code = InvalidArgument desc = task with different structure already exists with id resource_type:TASK project:"radar-ml" domain:"adhoc" name:"src.python.flyte.radar_ml.model_workflows.lpm.main.run_experiment_pipeline" version:"multi-workflows-e3660b201e6d"
4. Because we use flytectl register --continueOnError, workflow 2 still registers successfully, but it reuses the task with the old image from workflow 1.
The immediate fix I can think of is to add a timestamp to the version, so it becomes {branch_name} + {git_hash} + {timestamp}. This makes sure the task versions won't conflict. In the long term, would Flyte consider using workflow + function reference + version as the task key? It is very common for our users to implement shared tasks, so relying on the function reference alone is a huge pain for us.
cc @freezing-airport-6809 @thankful-minister-83577, I'd appreciate any suggestions from your side.
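A minimal sketch of the timestamp workaround described above, assuming the three parts are joined with hyphens as in the version shown in the log. The helper name `build_version` and the timestamp format are hypothetical and not part of flytectl.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def build_version(branch_name: str, git_hash: str, now: Optional[datetime] = None) -> str:
    """Return '{branch_name}-{git_hash}-{timestamp}' (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    # Replace characters such as '/' so the branch name is safe inside a version string.
    safe_branch = re.sub(r"[^A-Za-z0-9._-]+", "-", branch_name).strip("-")
    # Second-level UTC timestamp; two runs within the same second would still collide.
    timestamp = now.strftime("%Y%m%d%H%M%S")
    return f"{safe_branch}-{git_hash[:12]}-{timestamp}"


if __name__ == "__main__":
    print(build_version("multi-workflows", "e3660b201e6d"))
```

The resulting string would be passed as the version value at registration time. Note that the timestamp only avoids the conflict if each registration run computes its own version.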
a
@helpful-van-10149 have you tried reference tasks? In Flyte, an entity such as a task or workflow is uniquely identified by the combination of project + domain + name + version, which is why your version workaround helps, but reference tasks may give you a better abstraction.
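To illustrate why the second registration is rejected, here is a toy model of a registry keyed on project + domain + name + version. `ToyRegistry` and the digest strings are hypothetical; this is a sketch of the uniqueness rule only, not Flyte Admin's actual implementation.

```python
from typing import Dict, Tuple

EntityKey = Tuple[str, str, str, str]  # (project, domain, name, version)


class ToyRegistry:
    """Illustrative stand-in for a registration backend; not Flyte code."""

    def __init__(self) -> None:
        self._tasks: Dict[EntityKey, str] = {}

    def register_task(self, key: EntityKey, structure_digest: str) -> str:
        existing = self._tasks.get(key)
        if existing is None:
            self._tasks[key] = structure_digest
            return "created"
        if existing == structure_digest:
            # Assumption of this toy model: an identical re-registration is harmless.
            return "already exists"
        raise ValueError(f"task with different structure already exists with id {key}")


registry = ToyRegistry()
key = (
    "radar-ml",
    "adhoc",
    "src.python.flyte.radar_ml.model_workflows.lpm.main.run_experiment_pipeline",
    "multi-workflows-e3660b201e6d",
)
registry.register_task(key, "image-for-workflow-1")  # first workflow's copy of the task
try:
    # Same key, different content (e.g. a different image): rejected.
    registry.register_task(key, "image-for-workflow-2")
except ValueError as err:
    conflict = str(err)
```

Changing any part of the key, such as the version, produces a new entry instead of a conflict, which is why the timestamp workaround helps.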
h
Should I just choose one task as the "real" one and have the other tasks reference it?
a
maybe for the sake of keeping it organized, yes
b
@average-finland-92144 Thanks for the reply. Just to clarify, using reference tasks sounds like registration order would then be important. E.g., if we have 2 workflows that reference TaskA, any updates to TaskA would first need to be registered before the two workflows; otherwise the workflows would point to a stale version of TaskA. Is that correct?
{{ registration.version }} sounds like what we want, but I'm guessing that would just take the latest available version? We definitely wouldn't want users to have to manually specify a version of the task.
t
any updates to TaskA would first need to be registered before the two workflows otherwise the workflows would point to a stale version of TaskA. Is that correct?
You should update the version in your reference entity whenever you update your upstream entity. That should help keep your reference entity up to date.
but that would just take the latest available version I'm guessing?
The macro is populated during workflow registration. If you want to update the version, you'd need to re-register your workflow.
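The point about the macro can be sketched as a toy model: the placeholder is resolved once, at registration time, and the stored reference does not change afterwards. `resolve_version`, `workflow_ref`, and the versions "v1" and "v2" are hypothetical names used only for illustration; this is not flytectl's code.

```python
PLACEHOLDER = "{{ registration.version }}"


def resolve_version(declared: str, registration_version: str) -> str:
    """Substitute the placeholder with the version used for this registration run."""
    if declared.strip() == PLACEHOLDER:
        return registration_version
    # An explicitly pinned version is kept as-is.
    return declared


# The workflow is registered while the registration version is "v1".
workflow_ref = {"task": "TaskA", "version": resolve_version(PLACEHOLDER, "v1")}

# TaskA is later registered as "v2". The stored reference still says "v1"
# until the workflow itself is registered again.
stale_version = workflow_ref["version"]
workflow_ref = {"task": "TaskA", "version": resolve_version(PLACEHOLDER, "v2")}
```

In this model the reference is never "latest"; it is whatever version was supplied when the workflow was last registered.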