# flyte-support
c
how do folks handle multiple sets of conflicting dependencies in a monorepo of flyte workflows and tasks? I want to have some shared set of reusable building blocks that relies on a shared set of dependencies (let's say this just relies on boto3 and nothing else), and then give some way to allow a given workflow to have its own set of additional dependencies (let's say a specific version of numpy); other workflows may conflict and want a different version of numpy. When fast registration occurs, I want both the code of the shared reusable building blocks and the target workflow to end up in s3.
reading the uv docs, it sounds like I want essentially separate python packages with separate pyproject.toml files, and then a shared dependency wired up via `tool.uv.sources`. The only part I don't understand is how I can get the contents of my shared python package in the monorepo into flyte fast registration as well, instead of installing it into the docker image.
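For concreteness, something like this is what I have in mind; the names and pins here are just illustrative:
```toml
# project_a/pyproject.toml (illustrative; names and versions are made up)
[project]
name = "project-a"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "common-utils",   # the shared building blocks package
    "numpy==1.26.4",  # project_a's own pin
]

[tool.uv.sources]
# resolve the shared package from the monorepo checkout instead of an index
common-utils = { path = "../common_utils", editable = true }
```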
f
@curved-whale-1505 you can separate them into multiple registrations and then use reference tasks. This is a way to completely decouple, if you wanted.
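Roughly like this; the project/domain/name/version values are placeholders for wherever the shared task was registered:
```python
from flytekit import reference_task, workflow

# stub that points at a task registered separately (e.g. from the shared package);
# all identifiers below are placeholders
@reference_task(
    project="flytesnacks",
    domain="development",
    name="common_utils.tasks.fetch_results",
    version="v1",
)
def fetch_results(url: str) -> str:
    ...

@workflow
def my_wf(url: str) -> str:
    return fetch_results(url=url)
```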
h
@curved-whale-1505, to expand on that, there are a few different ways:
1. You can use separate ImageSpecs with different dependencies and annotate the task with them, e.g. `@task(container_image=my_image_1)`, and the system will automatically use the right image with the right dependencies. If you do this, you don't have to create separate workflows, as Flyte allows you to mix images in the same workflow (see the sketch after this message).
2. You can use different workflows/tasks and reference them using Reference Tasks/Workflows, as Ketan pointed out.
3. We have recently added a feature called `with_runtime_packages` (docs will be part of the upcoming official flyte release) that allows you to use `uv run` to install packages on the fly before a task run:
```python
@task(container_image=img.with_runtime_packages(["numpy==..."]))
def my_task(): ...
```
Note option #3 does incur a runtime penalty, as it'll install the packages on the fly every time this task runs.
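To make option 1 concrete, here is a rough sketch; the registry and package pins are placeholders:
```python
from flytekit import ImageSpec, task, workflow

image_a = ImageSpec(
    name="project-a",
    registry="ghcr.io/my-org",            # placeholder registry
    packages=["boto3", "numpy==1.26.4"],  # project_a's pins
)
image_b = ImageSpec(
    name="project-b",
    registry="ghcr.io/my-org",
    packages=["boto3", "numpy==2.1.0"],   # a conflicting numpy pin
)

@task(container_image=image_a)
def task_a() -> str:
    import numpy
    return numpy.__version__

@task(container_image=image_b)
def task_b() -> str:
    import numpy
    return numpy.__version__

@workflow
def mixed_wf():
    # both tasks live in the same workflow but run in different images
    task_a()
    task_b()
```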
c
So I was hoping to have each ImageSpec point at a local `uv.lock` file and then ensure I can always run `pyflyte run ...` without `--remote` to replicate the entire workflow locally. Let's say I have the following structure (pseudocode):
```
monorepo/
├── common_utils
│   ├── pyproject.toml
│   └── uv.lock
├── project_a
│   ├── pyproject.toml
│   └── uv.lock
└── project_b
    ├── pyproject.toml
    └── uv.lock
```
In this example, the dependencies of `common_utils` are a strict subset of `project_a` and `project_b`, and `project_a` and `project_b` have conflicting dependencies (let's stick with the conflicting numpy from the example before). Now ideally I want: 1) when I run `project_a`, the code for both `common_utils` and `project_a` to be fast registered into s3, and 2) the ability to run `pyflyte run ...` without `--remote` to replicate the entire workflow(s) locally within `project_a` and `project_b`.

Back to your suggestions:
1. I understand I don't need to create separate workflows and I can use a custom ImageSpec per task, but I am struggling to understand how I can keep local and remote parity, since one of the big benefits of using flyte over other tools is the ability to run the entire workflow locally. This is why I'm hoping to keep each ImageSpec tied to a local `uv.lock` file.
2. Reading the docs for reference tasks, it seems like I won't be able to run the entire workflow locally anymore? Please correct me if I'm wrong / these docs are out of date.
3. Interesting, does that work locally too?
To add a little more color, the context here is that I want people to have free rein to create any workflows/projects they want, with any dependency closure of their choosing (whatever version of numpy, transformers, etc.). But at the same time I want to create a few rock-solid common utilities that scientists can share to help with tasks like "hit this API, get these results, transform them into this dataframe", etc. In the context of a monorepo, I'm hoping to figure out how I can allow flexibility of the dependency closure per project/workflow (i.e. a separate venv) while maintaining the ability to run the entire workflow locally using the scientists' custom tasks along with the shared common tasks.
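For what it's worth, the per-project setup I'm imagining looks roughly like this, assuming I export each `uv.lock` to a requirements file (e.g. with `uv export --format requirements-txt -o requirements.txt` inside the project) so the image and my local venv resolve the same pins:
```python
from flytekit import ImageSpec, task

project_a_image = ImageSpec(
    name="project-a",
    registry="ghcr.io/my-org",        # placeholder registry
    requirements="requirements.txt",  # exported from project_a/uv.lock
)

@task(container_image=project_a_image)
def transform_results(raw: str) -> str:
    # placeholder body; in reality this would call the shared common_utils helpers
    return raw.upper()
```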
j
@curved-whale-1505 While I haven't done this yet (maybe in the next couple of months), I'm planning to write some plugins for a monorepo build system (for me, Pants) to help with this kind of stuff.
h
aha.. you are right, locally the assumption is that you can have a single venv with the right packages installed. Is there an assumption in your example that there is a dependency between project_a and project_b (hence the conflicts)? Is it possible to structure things so that each workflow can run independently (and be unit tested), while a separate project_integration ties things together using any of the above methods? That way only that integration workflow won't run locally. It's an interesting problem though; we are rethinking some of the execution engine fundamentals and may have a better solution cooking 🍳 (stay tuned 🙂)
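For example, the integration piece could look something like this (all names and versions are placeholders); only this workflow gives up local execution, while project_a and project_b keep their own venvs and stay runnable locally:
```python
from flytekit import reference_task, workflow

# stubs for tasks registered separately from project_a and project_b;
# every identifier below is a placeholder
@reference_task(project="project-a", domain="development",
                name="project_a.workflows.score", version="v1")
def score_a(data: str) -> str: ...

@reference_task(project="project-b", domain="development",
                name="project_b.workflows.score", version="v1")
def score_b(data: str) -> str: ...

@workflow
def integration_wf(data: str):
    # each referenced task runs remotely in its own image with its own deps
    score_a(data=data)
    score_b(data=data)
```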