# flyte-support
c
how do folks handle multiple sets of conflicting dependencies in a monorepo of flyte workflows and tasks? I want to have some shared set of reusable building blocks that relies on a shared set of dependencies (let's say this just relies on boto3 and nothing else), and then give some way to allow a given workflow to have its own set of additional dependencies (let's say a specific version of numpy); other workflows may conflict and want a different version of numpy. When fast registration occurs, I want both the code of the shared reusable building blocks and the target workflow to end up in s3.
reading the uv docs, it sounds like I want essentially separate python packages with separate pyproject.toml files, and then a shared dependency wired up via `tool.uv.sources`. The only part I don't understand is how I can get the contents of my shared python package in the monorepo into flyte fast registration as well, instead of installing it into the docker image.
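For concreteness, something like this is what I have in mind; the names and pins here are just illustrative:
```toml
# project_a/pyproject.toml (illustrative; names and versions are made up)
[project]
name = "project-a"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "common-utils",   # the shared building blocks package
    "numpy==1.26.4",  # project_a's own pin
]

[tool.uv.sources]
# resolve the shared package from the monorepo checkout instead of an index
common-utils = { path = "../common_utils", editable = true }
```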
f
@curved-whale-1505 you can separate them into multiple registrations and then use reference tasks. This is a way to completely decouple, if you wanted.
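Roughly like this; the project/domain/name/version values are placeholders for wherever the shared task was registered:
```python
from flytekit import reference_task, workflow

# stub that points at a task registered separately (e.g. from the shared package);
# all identifiers below are placeholders
@reference_task(
    project="flytesnacks",
    domain="development",
    name="common_utils.tasks.fetch_results",
    version="v1",
)
def fetch_results(url: str) -> str:
    ...

@workflow
def my_wf(url: str) -> str:
    return fetch_results(url=url)
```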
h
@curved-whale-1505, to expand on that, there are a few different ways:
1. You can use separate ImageSpecs with different dependencies and annotate the task with them, e.g. `@task(container_image=my_image_1)`, and the system will automatically use the right image with the right dependencies. If you do this, you don't have to create separate workflows, as Flyte allows you to mix images in the same workflow (see the sketch after this message).
2. You can use different workflows/tasks and reference them using Reference Tasks/Workflows, as Ketan pointed out.
3. We have recently added a feature called `with_runtime_packages` (docs will be part of the upcoming official flyte release) that allows you to use `uv run` to install packages on the fly before a task run:
```python
@task(container_image=img.with_runtime_packages(["numpy==..."]))
def my_task(): ...
```
Note option #3 does incur a runtime penalty, as it'll install the packages on the fly every time this task runs.
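To make option 1 concrete, here is a rough sketch; the registry and package pins are placeholders:
```python
from flytekit import ImageSpec, task, workflow

image_a = ImageSpec(
    name="project-a",
    registry="ghcr.io/my-org",            # placeholder registry
    packages=["boto3", "numpy==1.26.4"],  # project_a's pins
)
image_b = ImageSpec(
    name="project-b",
    registry="ghcr.io/my-org",
    packages=["boto3", "numpy==2.1.0"],   # a conflicting numpy pin
)

@task(container_image=image_a)
def task_a() -> str:
    import numpy
    return numpy.__version__

@task(container_image=image_b)
def task_b() -> str:
    import numpy
    return numpy.__version__

@workflow
def mixed_wf():
    # both tasks live in the same workflow but run in different images
    task_a()
    task_b()
```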
c
So I was hoping to have each ImageSpec point at a local `uv.lock` file and then ensure I can always run `pyflyte run ...` without `--remote` to replicate the entire workflow locally. Let's say I have the following structure (pseudocode):
```
monorepo/
├── common_utils
│   ├── pyproject.toml
│   └── uv.lock
├── project_a
│   ├── pyproject.toml
│   └── uv.lock
└── project_b
    ├── pyproject.toml
    └── uv.lock
```
In this example, the dependencies of `common_utils` are a strict subset of `project_a` and `project_b`, and `project_a` and `project_b` have conflicting dependencies (let's stick with the conflicting numpy from the example before). Now ideally I want: 1) when I run `project_a`, the code for both `common_utils` and `project_a` to be fast registered into s3, and 2) the ability to run `pyflyte run ...` without `--remote` to replicate the entire workflow(s) locally within `project_a` and `project_b`.

Back to your suggestions:
1. I understand I don't need to create separate workflows and I can use a custom ImageSpec per task, but I am struggling to understand how I can keep local and remote parity, since one of the big benefits of using flyte over other tools is the ability to run the entire workflow locally. This is why I'm hoping to keep each ImageSpec tied to a local `uv.lock` file.
2. Reading the docs for reference tasks, it seems like I won't be able to run the entire workflow locally anymore? Please correct me if I'm wrong / these docs are out of date.
3. Interesting, does that work locally too?
To add a little more color, the context here is that I want people to have free rein to create any workflows/projects they want, with any dependency closure of their choosing (whatever version of numpy, transformers, etc.). But at the same time I want to create a few rock-solid common utilities that scientists can share to help with tasks like "hit this API, get these results, transform them into this dataframe", etc. In the context of a monorepo, I'm hoping to figure out how I can allow flexibility of the dependency closure per project/workflow (i.e. a separate venv) while maintaining the ability to run the entire workflow locally using the scientists' custom tasks along with the shared common tasks.
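For what it's worth, the per-project setup I'm imagining looks roughly like this, assuming I export each `uv.lock` to a requirements file (e.g. with `uv export --format requirements-txt -o requirements.txt` inside the project) so the image and my local venv resolve the same pins:
```python
from flytekit import ImageSpec, task

project_a_image = ImageSpec(
    name="project-a",
    registry="ghcr.io/my-org",        # placeholder registry
    requirements="requirements.txt",  # exported from project_a/uv.lock
)

@task(container_image=project_a_image)
def transform_results(raw: str) -> str:
    # placeholder body; in reality this would call the shared common_utils helpers
    return raw.upper()
```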
j
@curved-whale-1505 While I haven't done this yet (maybe in the next couple of months), I'm planning to write some plugins for a monorepo build system (for me, Pants) to help with this kind of stuff.
h
aha.. you are right, locally the assumption is that you can have a single venv with the right packages installed. Is there an assumption in your example that there is a dependency between project_a and project_b (hence the conflicts)? Is it possible to structure things so that each workflow can run independently (and be unit tested), while a separate project_integration ties things together using any of the above methods? That way only that integration workflow won't run locally. It's an interesting problem though; we are rethinking some of the execution engine fundamentals and may have a better solution cooking 🍳 (stay tuned 🙂)
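For example, the integration piece could look something like this (all names and versions are placeholders); only this workflow gives up local execution, while project_a and project_b keep their own venvs and stay runnable locally:
```python
from flytekit import reference_task, workflow

# stubs for tasks registered separately from project_a and project_b;
# every identifier below is a placeholder
@reference_task(project="project-a", domain="development",
                name="project_a.workflows.score", version="v1")
def score_a(data: str) -> str: ...

@reference_task(project="project-b", domain="development",
                name="project_b.workflows.score", version="v1")
def score_b(data: str) -> str: ...

@workflow
def integration_wf(data: str):
    # each referenced task runs remotely in its own image with its own deps
    score_a(data=data)
    score_b(data=data)
```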