# ask-the-community
d
another question: Are there any examples of repos with many workflows that have overlapping tasks, and/or combined workflows? I'm somewhat confused as to whether a python library with many tasks would be best, or if I should be registering everything and using fetch_task... I'm leaning towards a library because then we get type hints, which I foresee as solving some input/output headaches... Edit: I take it back... I see the utility of registering repos (disconnected python env requirements), though the lack of python typing does seem sad...
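For context, a minimal sketch of the fetch_task approach being weighed here, assuming tasks were already registered to a Flyte backend; the project, domain, task name, and version below are hypothetical:

```python
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

# Connect to a Flyte backend (assumes a reachable cluster and local config).
remote = FlyteRemote(
    Config.auto(),
    default_project="my-project",     # hypothetical
    default_domain="development",
)

# Fetch an already-registered task by name/version. The fetched object carries
# no Python type hints, which is the trade-off being weighed above.
trainer = remote.fetch_task(name="workflows.train.train_model", version="v1")
```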
y
are there conflicts in dependencies? or just too many dependencies?
d
working with a very complex (to put it nicely) codebase... maybe wf-A requires torch=XX while wf-B requires torch=YY (except so much worse than that...)
a combination of both I suppose... It seems like I'm going to want to re-use tasks, but maybe I shouldn't think like that, and I should only strive to re-use workflows
y
you can use reference tasks which are kinda like fetched tasks, but those are annoying because you have to manually specify the python type signature.
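A minimal sketch of what "manually specify the python type signature" looks like for a reference task; the project, name, and version values are hypothetical:

```python
from flytekit import reference_task

# A reference task points at an already-registered task; the body is never run
# locally, and the signature below has to be kept in sync with it by hand.
@reference_task(
    project="ml-platform",    # hypothetical values
    domain="development",
    name="workflows.train.train_model",
    version="v1",
)
def train_model(samples: int) -> float:
    ...
```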
d
yes I noticed that... not ideal : /
y
in the past we definitely talked about including some indication of the python type signature in the fetched task but that kinda breaks the type boundary… also no guarantee that things work (type X fetched from a task doesn’t necessarily correspond to type X in your codebase)
not sure if there’s a good solution
but some of this headache is just endemic to large projects in general with conflicting dependencies
k
@Dan Farrell thank you for connecting and the questions. IMO, large software projects become complex over time. There is no one right answer. Here are the reasons we support a few things:
1. Multiple registrations in a single repo. You want to register many workflows and tasks and manage everything in one place, but separate these across teams. This is why we also have the concept of projects and domains.
2. Multi-image support in one workflow. You can build different images and use them within a workflow. ImageSpec actually makes this even simpler (see the sketch below).
3. Reference tasks and workflows. For cases where you want to share tasks and workflows across teams or as a common repository, without having to add all dependencies to every project. More like RPC services.
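On point 2, a rough sketch of two tasks in one workflow using different ImageSpec images; the package lists and registry are placeholders:

```python
from flytekit import ImageSpec, task, workflow

# Two separately built images, each carrying its own heavy dependency set.
torch_image = ImageSpec(name="torch-image", packages=["torch"], registry="ghcr.io/example")
sklearn_image = ImageSpec(name="sklearn-image", packages=["scikit-learn"], registry="ghcr.io/example")

@task(container_image=torch_image)
def train(epochs: int) -> float:
    return 0.1  # placeholder

@task(container_image=sklearn_image)
def evaluate(loss: float) -> float:
    return loss * 2  # placeholder

@workflow
def pipeline(epochs: int = 3) -> float:
    return evaluate(loss=train(epochs=epochs))
```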
Also
another question, Are there any examples of repos with many workflows that have overlapping tasks, and/or combined workflows?
Flytesnacks is an example. Massive repo, lots of workflows. Not overlapping though. But we do have these examples at all the large companies.
d
So right now
if sklearn_image_spec.is_container():
works if the types are contained within the task that the container is used in. But what if I want a type that is specific to that container as a return type? What would you suggest the type hinting be?
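For reference, a sketch of the guard pattern being described here: container-only imports sit behind is_container(), which works as long as the container-specific objects stay inside the task body. The question above is exactly where this breaks down, because a return-type annotation has to be resolvable wherever the workflow is parsed. Image name, packages, and registry below are placeholders:

```python
from flytekit import ImageSpec, task

sklearn_image_spec = ImageSpec(
    name="sklearn-image", packages=["scikit-learn"], registry="ghcr.io/example"
)

# Only import the heavy dependency when this module is running inside the
# built image, not when the workflow is merely being parsed elsewhere.
if sklearn_image_spec.is_container():
    from sklearn.linear_model import LinearRegression

@task(container_image=sklearn_image_spec)
def fit(n_features: int) -> float:
    # The sklearn object never leaves the task, so no sklearn-specific type
    # shows up in the task's signature.
    model = LinearRegression()
    return float(model.fit_intercept)
```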
k
@Dan Farrell then the best method is to separate out the file. I had an example for this, let me ask @Samhita Alla if she turned it into a flytesnacks example?
@Dan Farrell the best way is to keep a structure, where every file is bound to a container and you have one file which houses the workflow
s
I haven't worked on an example yet. @Ketan (kumare3) should the task be defined in the is_container() block?
k
no let me share with you again
this example
multi/
├── py_task.py
├── spark_task.py
└── wf.py
Cc @Peeter Piegaze
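A hedged sketch of how that layout might look, with each task file owning its container image and wf.py only importing the task functions. All names, packages, and the registry are placeholders, and a real spark_task.py would likely use the flytekit Spark plugin rather than a plain task:

```python
# py_task.py
from flytekit import ImageSpec, task

py_image = ImageSpec(name="py-image", packages=["pandas"], registry="ghcr.io/example")

@task(container_image=py_image)
def preprocess(n: int) -> int:
    return n + 1


# spark_task.py (shown as a plain task with its own image, just to illustrate the layout)
from flytekit import ImageSpec, task

spark_image = ImageSpec(name="spark-image", packages=["pyspark"], registry="ghcr.io/example")

@task(container_image=spark_image)
def aggregate(n: int) -> int:
    return n * 10


# wf.py -- the only file that imports from both task modules
from flytekit import workflow
from py_task import preprocess
from spark_task import aggregate

@workflow
def multi_wf(n: int = 1) -> int:
    return aggregate(n=preprocess(n=n))
```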
d
@Ketan (kumare3) might you be able to share the example directly with me? When the 'combo' workflow is being parsed, flytekit/core/interface.py seems to be grabbing all tasks' type hints via type_hints = get_type_hints(fn, include_extras=True), but this fails because my Foo class isn't available in the combo-wf environment.
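To illustrate the failure mode (this part is plain Python, not Flyte-specific): if the annotated type isn't importable in the environment doing the parsing, the annotation is only a string forward reference, and get_type_hints raises exactly this NameError.

```python
from typing import get_type_hints

# "Foo" only exists inside the task's own container image, so in the combo
# environment the annotation is nothing more than a string forward reference.
def predict(n: int) -> "Foo":
    ...

try:
    get_type_hints(predict, include_extras=True)
except NameError as err:
    print(err)  # -> name 'Foo' is not defined
```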
I created a repo to explain my thinking here... https://github.com/danpf/flyte-eval/tree/master/attempt_03/src/python I think it would be incredibly valuable for companies that are focused heavily on research (like biotech) to come up with some way for workflows/dynamic workflows to either be registered on the fly or be parsed in separate environments (maybe as an arg to the decorator?), because it would really speed up a scientist's work if they could quickly poke and prod at all aspects of a pipeline, not just one at a time.
I'm also happy to contribute this, if you think it can be done / aligns with the project's goals
k
@Dan Farrell sorry, I am looking at your example; is this the multi-container example?
s
@Dan Farrell, could you please explain why you used a dynamic workflow? Since you want to call a bunch of workflows, you can just create a parent workflow and call all your sub-workflows from within it.
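A minimal sketch of the static parent-workflow approach being suggested, with placeholder tasks; no @dynamic is needed just to chain sub-workflows:

```python
from flytekit import task, workflow

@task
def step_a(n: int) -> int:
    return n + 1

@task
def step_b(n: int) -> int:
    return n * 2

@workflow
def sub_wf_a(n: int) -> int:
    return step_a(n=n)

@workflow
def sub_wf_b(n: int) -> int:
    return step_b(n=n)

@workflow
def parent_wf(n: int = 1) -> int:
    # Calling one workflow from another makes them static sub-workflows.
    return sub_wf_b(n=sub_wf_a(n=n))
```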
d
@Ketan (kumare3) yes, or having multiple subworkflows w/ separate environments + avoiding having to use registered workflows + having type hinting for development. @Samhita Alla because workflows (or subworkflows) just return an error and dynamic workflows return a stacktrace, either way the errors were the same (name 'Foo' is not defined)
s
Can you do a relative import instead of an absolute import?
d
for the one_off_combo... script? I'd prefer not to; I would like to allow scientists to work outside of the git repo if they choose to (most do)
what would a relative/absolute import change? The error I'm getting is w/r/t the parsing of the task annotations, which I doubt would change
s
Isn't name 'Foo' is not defined the error?
I believe Foo is the output type of your task, correct? And that isn't being found by Flyte.
d
yes
s
Right. In that case, the error has got to do with the imports.
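One possible direction, heavily hedged since the repo layout here is hypothetical and the thread does not confirm a fix: make Foo importable in whatever environment parses the combo workflow, for example by moving the shared type into a lightweight common module with no heavy dependencies.

```python
# common_types.py -- hypothetical lightweight module with no heavy dependencies,
# importable from every environment that parses or runs the workflows.
from dataclasses import dataclass

@dataclass
class Foo:
    score: float

# In the combo workflow's module (also hypothetical), importing Foo makes the
# annotation resolvable when flytekit calls get_type_hints() during parsing:
#
#   from common_types import Foo
```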