# ask-the-community
t
👋 I have an opinion question for those who are making use of shared tasks with `@reference_task` (or otherwise). We've ended up with many standard tasks, particularly around ETL-style work (e.g. batch capturing data into `jsonl` or `parquet`). These currently live in a dedicated Flyte project for reusable components. So far, so good. However, many of our tasks use `NamedTuple` or `@dataclass` types for inputs or outputs, which leaves us with the question of how to get those data types defined in the projects that reference these common tasks. For folks who have already handled this situation, do you:
A - Redefine the data types in each project and update them manually if they ever change.
B - Publish a pip package of just the data types and import it in every project that uses them.
C - Avoid custom data types and opt for primitives for shared components.
D - Something else?
c
I've not handled this specific situation, but having handled similar ones, I think the best practice looks a lot more like the second option (B) than anything else, unless there's some Flyte-specific machinery that could let you pass these type definitions around.
t
Cool - yeah, that's where we were leaning, but we wanted to avoid the complication/overhead that introduces if there was some better solution.
c
I'd imagine it would be quite difficult for Flyte to expose some machinery for this. One good case, I guess, would be for Flyte to let you register data types in projects the same way you do with tasks, so that reference_tasks and reference_types could both be enabled. Given that I don't think this feature exists, publishing your own types across projects is probably the easiest approach until a feature like that comes out.
j
We have a similar setup for ETL operations and common libs / data classes. We publish a set of internal Python libraries using AWS CodeArtifact and have been happy with it.
t
ah, perfect - thanks for that.
Actually, @Justin Boutwell - to be clear, do you re-use tasks / workflows from some shared flyte project (using something like `@reference_task`)? Or, do you do all the code-sharing via those internal libraries without publishing tasks/workflows to a flyte project?
j
we do both
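For anyone landing on this thread later, here is a minimal sketch of what option B (a small pip package containing only the shared data types) might look like. Everything named here is hypothetical and not from the thread: the module path `etl_types/models.py`, the `CaptureRequest` / `CaptureResult` classes, their fields, and the `to_json` / `from_json` helpers. It only uses the standard library, so it says nothing about how Flyte itself serializes these types.

```python
# Hypothetical module in a shared, pip-installable package,
# e.g. etl_types/models.py (name and layout are assumptions).
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class CaptureRequest:
    """Input type for a hypothetical shared batch-capture task."""
    source_uri: str
    output_format: str = "parquet"  # e.g. "jsonl" or "parquet"
    batch_size: int = 1000


@dataclass(frozen=True)
class CaptureResult:
    """Output type for the same hypothetical task."""
    output_uri: str
    row_count: int


def to_json(obj) -> str:
    """Serialize a dataclass instance to a JSON string with sorted keys."""
    return json.dumps(asdict(obj), sort_keys=True)


def from_json(cls, payload: str):
    """Rebuild a dataclass from JSON, rejecting fields the class does not define.

    Failing loudly on unknown fields makes version drift between the
    publishing project and a consuming project visible early.
    """
    data = json.loads(payload)
    allowed = {f.name for f in fields(cls)}
    unknown = set(data) - allowed
    if unknown:
        raise ValueError(f"unexpected fields for {cls.__name__}: {sorted(unknown)}")
    return cls(**data)
```

Both the project that registers the shared tasks and each project that references them would pin the same version of this package and import the classes from it, for example to annotate the local stub that a `@reference_task` declaration needs. The trade-off raised above still applies: a change to a type means publishing a new package version and bumping it in every consumer.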