# ask-the-community
d
another question: Are there any examples of repos with many workflows that have overlapping tasks, and/or combined workflows? I'm somewhat confused as to whether a python library with many tasks would be best, or if I should be registering everything and using fetch_task... I'm leaning towards a library because then we get type hints, which I foresee as solving some input/output headaches... Edit: I take it back... I see the utility of registering repos (disconnected python env requirements), though the lack of python typing does seem sad...
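For context, a minimal sketch of the fetch_task approach being weighed here, assuming tasks were already registered to a Flyte backend; the project, domain, task name, and version below are hypothetical:

```python
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

# Connect to a Flyte backend (assumes a reachable cluster and local config).
remote = FlyteRemote(
    Config.auto(),
    default_project="my-project",     # hypothetical
    default_domain="development",
)

# Fetch an already-registered task by name/version. The fetched object carries
# no Python type hints, which is the trade-off being weighed above.
trainer = remote.fetch_task(name="workflows.train.train_model", version="v1")
```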
y
are there conflicts in dependencies? or just too many dependencies?
d
working with a very complex (to put it nicely) codebase... maybe wf-A requires torch=XX while wf-B requires torch=YY (except so much worse than that...)
a combination of both I suppose... It seems like I'm going to want to re-use tasks, but maybe I shouldn't think like that, and I should only strive to re-use workflows
y
you can use reference tasks which are kinda like fetched tasks, but those are annoying because you have to manually specify the python type signature.
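A minimal sketch of what "manually specify the python type signature" looks like for a reference task; the project, name, and version values are hypothetical:

```python
from flytekit import reference_task

# A reference task points at an already-registered task; the body is never run
# locally, and the signature below has to be kept in sync with it by hand.
@reference_task(
    project="ml-platform",    # hypothetical values
    domain="development",
    name="workflows.train.train_model",
    version="v1",
)
def train_model(samples: int) -> float:
    ...
```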
d
yes I noticed that... not ideal : /
y
in the past we definitely talked about including some indication of the python type signature in the fetched task but that kinda breaks the type boundary… also no guarantee that things work (type X fetched from a task doesn’t necessarily correspond to type X in your codebase)
not sure if there’s a good solution
but some of this headache is just endemic to large projects in general with conflicting dependencies
k
@Dan Farrell thank you for connecting and the questions. IMO, large software projects become complex over time. There is no one right answer. Here are the reasons we support a few things:
1. Multiple registrations in a single repo. You want to register many workflows and tasks and manage everything in one place, but separate these across teams. This is why we also have the concept of projects and domains.
2. Multi-image support in one workflow. You can build different images and use them within a workflow. ImageSpec actually makes this even simpler (see the sketch below).
3. Reference tasks and workflows. For cases where you want to share tasks and workflows across teams or as a common repository, without having to add all dependencies to every project. More like RPC services.
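On point 2, a rough sketch of two tasks in one workflow using different ImageSpec images; the package lists and registry are placeholders:

```python
from flytekit import ImageSpec, task, workflow

# Two separately built images, each carrying its own heavy dependency set.
torch_image = ImageSpec(name="torch-image", packages=["torch"], registry="ghcr.io/example")
sklearn_image = ImageSpec(name="sklearn-image", packages=["scikit-learn"], registry="ghcr.io/example")

@task(container_image=torch_image)
def train(epochs: int) -> float:
    return 0.1  # placeholder

@task(container_image=sklearn_image)
def evaluate(loss: float) -> float:
    return loss * 2  # placeholder

@workflow
def pipeline(epochs: int = 3) -> float:
    return evaluate(loss=train(epochs=epochs))
```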
Also
another question, Are there any examples of repos with many workflows that have overlapping tasks, and/or combined workflows?
Flytesnacks is an example. Massive repo, lots of workflows. Not overlapping though. But we do have these examples at all the large companies.
d
So right now
if sklearn_image_spec.is_container():
works if the types are contained within the task that the container is used in. But what if I want a type that is specific to that container as a return type? What would you suggest the type hinting be?
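For reference, a sketch of the guard pattern being described here: container-only imports sit behind is_container(), which works as long as the container-specific objects stay inside the task body. The question above is exactly where this breaks down, because a return-type annotation has to be resolvable wherever the workflow is parsed. Image name, packages, and registry below are placeholders:

```python
from flytekit import ImageSpec, task

sklearn_image_spec = ImageSpec(
    name="sklearn-image", packages=["scikit-learn"], registry="ghcr.io/example"
)

# Only import the heavy dependency when this module is running inside the
# built image, not when the workflow is merely being parsed elsewhere.
if sklearn_image_spec.is_container():
    from sklearn.linear_model import LinearRegression

@task(container_image=sklearn_image_spec)
def fit(n_features: int) -> float:
    # The sklearn object never leaves the task, so no sklearn-specific type
    # shows up in the task's signature.
    model = LinearRegression()
    return float(model.fit_intercept)
```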
k
@Dan Farrell then the best method is to separate out the file. I had an example for this, let me ask @Samhita Alla if she turned it into a flytesnacks example?
@Dan Farrell the best way is to keep a structure, where every file is bound to a container and you have one file which houses the workflow
s
I haven't worked on an example yet. @Ketan (kumare3) should the task be defined in the is_container() block?
k
no let me share with you again
this example
multi/
├── py_task.py
├── spark_task.py
└── wf.py
Cc @Peeter Piegaze
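A hedged sketch of how that layout might look, with each task file owning its container image and wf.py only importing the task functions. All names, packages, and the registry are placeholders, and a real spark_task.py would likely use the flytekit Spark plugin rather than a plain task:

```python
# py_task.py
from flytekit import ImageSpec, task

py_image = ImageSpec(name="py-image", packages=["pandas"], registry="ghcr.io/example")

@task(container_image=py_image)
def preprocess(n: int) -> int:
    return n + 1


# spark_task.py (shown as a plain task with its own image, just to illustrate the layout)
from flytekit import ImageSpec, task

spark_image = ImageSpec(name="spark-image", packages=["pyspark"], registry="ghcr.io/example")

@task(container_image=spark_image)
def aggregate(n: int) -> int:
    return n * 10


# wf.py -- the only file that imports from both task modules
from flytekit import workflow
from py_task import preprocess
from spark_task import aggregate

@workflow
def multi_wf(n: int = 1) -> int:
    return aggregate(n=preprocess(n=n))
```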
d
@Ketan (kumare3) might you be able to share the example directly with me? When the 'combo' workflow is being parsed, flytekit/core/interface.py seems to be grabbing all tasks' type hints via type_hints = get_type_hints(fn, include_extras=True), but this fails because my Foo class isn't available in the combo-wf environment.
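To illustrate the failure mode (this part is plain Python, not Flyte-specific): if the annotated type isn't importable in the environment doing the parsing, the annotation is only a string forward reference, and get_type_hints raises exactly this NameError.

```python
from typing import get_type_hints

# "Foo" only exists inside the task's own container image, so in the combo
# environment the annotation is nothing more than a string forward reference.
def predict(n: int) -> "Foo":
    ...

try:
    get_type_hints(predict, include_extras=True)
except NameError as err:
    print(err)  # -> name 'Foo' is not defined
```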
I created a repo to explain my thinking here... https://github.com/danpf/flyte-eval/tree/master/attempt_03/src/python I think it would be incredibly valuable for companies that are focused heavily on research (like biotech) to come up with some way for workflows/dynamic workflows to either be registered on the fly or be parsed in separate environments (maybe as an arg to the decorator?), because it would really speed up a scientist's work if they could quickly poke and prod at all aspects of a pipeline, not just one at a time.
I'm also happy to contribute this, if you think it can be done / aligns with the project's goals
k
@Dan Farrell sorry, I am looking at your example; is this the multi-container example?
s
@Dan Farrell, could you please explain why you used a dynamic workflow? Since you want to call a bunch of workflows, you can just create a parent workflow and call all your sub-workflows from within it.
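A minimal sketch of the static parent-workflow approach being suggested, with placeholder tasks; no @dynamic is needed just to chain sub-workflows:

```python
from flytekit import task, workflow

@task
def step_a(n: int) -> int:
    return n + 1

@task
def step_b(n: int) -> int:
    return n * 2

@workflow
def sub_wf_a(n: int) -> int:
    return step_a(n=n)

@workflow
def sub_wf_b(n: int) -> int:
    return step_b(n=n)

@workflow
def parent_wf(n: int = 1) -> int:
    # Calling one workflow from another makes them static sub-workflows.
    return sub_wf_b(n=sub_wf_a(n=n))
```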
d
@Ketan (kumare3) yes, or having multiple subworkflows w/ separate environments + avoiding having to use registered workflows + having type hinting for development. @Samhita Alla because workflows (or subworkflows) just return an error and dynamic workflows return a stacktrace, either way the errors were the same (name 'Foo' is not defined)
s
Can you do a relative import instead of an absolute import?
d
for the one_off_combo... script? I'd prefer not to; I would like to allow scientists to work outside of the git repo if they choose to (most do)
what would a relative/absolute import change? The error I'm getting is w/r/t the parsing of the task annotations, which I doubt would change
s
Isn't name 'Foo' is not defined the error?
I believe Foo is the output type of your task, correct? And that isn't being found by Flyte.
d
yes
s
Right. In that case, the error has got to do with the imports.
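One possible direction, heavily hedged since the repo layout here is hypothetical and the thread does not confirm a fix: make Foo importable in whatever environment parses the combo workflow, for example by moving the shared type into a lightweight common module with no heavy dependencies.

```python
# common_types.py -- hypothetical lightweight module with no heavy dependencies,
# importable from every environment that parses or runs the workflows.
from dataclasses import dataclass

@dataclass
class Foo:
    score: float

# In the combo workflow's module (also hypothetical), importing Foo makes the
# annotation resolvable when flytekit calls get_type_hints() during parsing:
#
#   from common_types import Foo
```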