Hi I saw on the website that experiment tracking and data li Flyte #ecosystem-unionml

faint-monitor-96441

08/22/2022, 1:21 PM

Hi! I saw on the website that experiment tracking and data lineage is on the way. Is there anywhere this progress can be tracked? I’m working at a company that’s just making a switch from determined.ai to Flyte for orchestration. However, this now means we are missing three important pieces which are experiment tracking/model registry/data versioning. Wondering if union.ml could be a good fit if it’s not too far away on the horizon?

freezing-airport-6809

08/22/2022, 1:48 PM

Hi Martin, welcome to the community!

freezing-airport-6809

08/22/2022, 1:50 PM

This is probably not what you are looking for right now, but Flyte has Flyte Decks, a visualization that can be attached to every task

freezing-airport-6809

08/22/2022, 1:50 PM

For unionml, the issue board is on the repo, and the community meets every 2 weeks

freezing-airport-6809

08/22/2022, 1:54 PM

Cc @broad-monitor-993, we would love to catch up and understand the experience you expect. This could be great

freezing-airport-6809

08/22/2022, 1:56 PM

Also, Determined and Flyte seem complementary in many aspects

faint-monitor-96441

08/22/2022, 2:19 PM

Hi @freezing-airport-6809! Thanks for the quick reply. I was referring to this from your website.

faint-monitor-96441

08/22/2022, 2:27 PM

As for determined.ai, would love to hear your thoughts on how they can complement each other. Right now, we are triggering training jobs in Determined from Flyte workflows. The model registry lives in determined and things work well. But it seems a bit fragmented to me. We could just as well scale up GPU nodes with Flyte and have model registry in something more lightweight like wandb or mlflow.

freezing-airport-6809

08/22/2022, 2:31 PM

So we are working with wandb and hopefully mlflow too

freezing-airport-6809

08/22/2022, 2:31 PM

Some people have used Flyte with mlflow

freezing-airport-6809

08/22/2022, 2:32 PM

We will try to make a tutorial or put together some resources

❤️ 2

broad-monitor-993

08/22/2022, 2:36 PM

hi Martin! So for unionml we’re planning to build integrations with MLflow for experiment tracking and potentially model registry (although Flyte itself is basically a model registry). You can check out the roadmap here: https://github.com/orgs/unionai-oss/projects/1/views/4

❤️ 1

broad-monitor-993

08/22/2022, 2:36 PM

three important pieces which are experiment tracking/model registry/data versioning.

I think for experiment tracking we’re planning on relying on integrations with other libraries in the ecosystem (e.g. mlflow). Re: model registry, data versioning, and data lineage, what are your requirements? Flyte basically tracks all artifacts at the interface of tasks and workflows (in addition to all dependencies in the execution graph), including models and data, but I'd be interested in chatting to understand what your needs are.

thankful-minister-83577

08/22/2022, 4:24 PM

yeah we have an ongoing project to send events to datahub, but not sure if that would be sufficient for your use case. would like to learn more about it.

thankful-minister-83577

08/22/2022, 4:27 PM

cc @colossal-painter-70298 who’s helping out with that project

colossal-painter-70298

08/22/2022, 4:35 PM

Fwiw, we use mlflow to track experiment metrics as well. Datahub operates at a different abstraction level. It allows us to link tasks to datasets (tables, for example) and models (on an “app level”, not individual model versions).

faint-monitor-96441

08/22/2022, 6:03 PM

I think for experiment tracking we’re planning on relying on integrations with other libraries in the ecosystem (e.g. mlflow)

That makes sense @broad-monitor-993. For the tracking, we mostly need rudimentary features (e.g. graphs). For the model registry, we want a centralized thin abstraction layer in front of blob storage (e.g. s3) that can reference arbitrary models and metadata. MLflow suits our needs pretty well here. When it comes to data lineage, we’re currently using DVC, but I’m looking for alternatives, as using git to track revisions isn’t great when domain experts need to be able to manipulate the datasets (fix incorrect labels etc).
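
[Editor's sketch] To make the "thin abstraction layer in front of blob storage" idea concrete, here is a minimal illustration, assuming a local directory as a stand-in for an S3 bucket. The class name `ModelRegistry` and its methods are hypothetical and invented for this example; this is not the API of MLflow, Flyte, or unionml, only a rough picture of the kind of layer being described.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path


class ModelRegistry:
    """Hypothetical thin registry: an index of model references plus metadata.

    A local directory stands in for the blob store (e.g. an S3 bucket).
    """

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.blobs = self.root / "blobs"
        self.index_path = self.root / "index.json"
        self.blobs.mkdir(parents=True, exist_ok=True)

    def _load_index(self) -> dict:
        # The index maps a model name to an ordered list of version entries.
        if self.index_path.exists():
            return json.loads(self.index_path.read_text())
        return {}

    def register(self, name: str, payload: bytes, metadata: dict) -> int:
        """Store the model bytes, append a version entry, return its version number."""
        digest = hashlib.sha256(payload).hexdigest()
        # Content-addressed blob: identical payloads share one stored object.
        (self.blobs / digest).write_bytes(payload)
        index = self._load_index()
        versions = index.setdefault(name, [])
        versions.append(
            {"version": len(versions) + 1, "blob": digest, "metadata": metadata}
        )
        self.index_path.write_text(json.dumps(index, indent=2))
        return len(versions)

    def get(self, name: str, version: int) -> tuple[bytes, dict]:
        """Return the stored bytes and metadata for a 1-based version number."""
        entry = self._load_index()[name][version - 1]
        return (self.blobs / entry["blob"]).read_bytes(), entry["metadata"]


def demo() -> tuple[int, bytes, dict]:
    # Register two versions of a made-up model, then read the latest one back.
    with tempfile.TemporaryDirectory() as tmp:
        registry = ModelRegistry(Path(tmp))
        registry.register("classifier", b"weights-v1", {"accuracy": 0.91})
        v2 = registry.register("classifier", b"weights-v2", {"accuracy": 0.93})
        payload, meta = registry.get("classifier", v2)
        return v2, payload, meta
```

The registry itself only holds references (a content hash) and free-form metadata, so the model format stays arbitrary. A real setup would swap the directory for a bucket client and would need to handle concurrent writers, which this sketch ignores.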

thankful-minister-83577

08/22/2022, 6:11 PM

@faint-monitor-96441 do you think we could have a separate quick chat just about data versioning?

thankful-minister-83577

08/22/2022, 6:12 PM

i’ve been looking into dvc recently, but passed on it since i didn’t think it supported enough.

faint-monitor-96441

08/22/2022, 6:12 PM

Sure, maybe tomorrow? I’m in CET, when’s a good time for you?

thankful-minister-83577

08/22/2022, 6:13 PM

would love to hear how you’re currently using it and any other use cases you might have.

thankful-minister-83577

08/22/2022, 6:13 PM

tomorrow is great, 1pm/3pm? (i’m in seattle)

faint-monitor-96441

08/22/2022, 6:16 PM

1pm Seattle time is a bit late for me in Stockholm

thankful-minister-83577

08/22/2022, 6:17 PM

ooh cet

thankful-minister-83577

08/22/2022, 6:17 PM

not ct

thankful-minister-83577

08/22/2022, 6:18 PM

8am pt/5pm cet? i think you guys are 9 hours ahead.

faint-monitor-96441

08/22/2022, 6:19 PM

That works!

thankful-minister-83577

08/22/2022, 6:19 PM

sweeet, let me send an invite

colossal-painter-70298

08/22/2022, 6:19 PM

Yee is getting used to those early morning meets? 😅 (I’m in helsinki myself)

thankful-minister-83577

08/22/2022, 6:20 PM

not sure if you want to join niels/fredrik, but yeah really want to understand the data versioning story here

thankful-minister-83577

08/22/2022, 6:20 PM

maybe we can set up something to discuss model performance also?

colossal-painter-70298

08/22/2022, 6:21 PM

Can’t tomorrow, unfortunately, but I trust you’ll take good notes 👍

thankful-minister-83577

08/22/2022, 6:21 PM

sweet will do.
