# ecosystem-unionml
m
Hi! I saw on the website that experiment tracking and data lineage are on the way. Is there anywhere this progress can be tracked? I’m working at a company that’s just making a switch from determined.ai to Flyte for orchestration. However, this now means we are missing three important pieces: experiment tracking, a model registry, and data versioning. Wondering if union.ml could be a good fit if it’s not too far away on the horizon?
k
Hi Martin welcome to the community
This is probably not what you are looking for right now - but Flyte has flytedecks, a viz that can be attached to every task
For unionml the issue board is on the repo and the community meets every 2 weeks
Cc @Niels Bantilan , we would love to catch up and understand the experience you expect. This could be great
Also determined and Flyte seem complementary in many aspects
m
Hi @Ketan (kumare3)! Thanks for the quick reply. I was referring to this from your website.
As for determined.ai, would love to hear your thoughts on how they can complement each other. Right now, we are triggering training jobs in Determined from Flyte workflows. The model registry lives in determined and things work well. But it seems a bit fragmented to me. We could just as well scale up GPU nodes with Flyte and have model registry in something more lightweight like wandb or mlflow.
k
So we are working with wandb and hopefully mlflow too
Some people have used Flyte with mlflow
We will try to make a tutorial or put together some resources
n
hi Martin! So for unionml we’re planning to build integrations for MLFlow for experiment tracking and potentially model registry (although Flyte itself is basically a model registry), you can check out the roadmap here: https://github.com/orgs/unionai-oss/projects/1/views/4
three important pieces which are experiment tracking/model registry/data versioning.
I think for experiment tracking we’re planning on relying on integrations with other libraries in the ecosystem (e.g. mlflow). Re: model registry, data versioning, and data lineage, what are your requirements? Flyte basically tracks all artifacts at the interface of tasks and workflows (in addition to all dependencies in the execution graph), including models and data, but would be interested in chatting to understand what your needs are.
y
yeah we have an ongoing project to send events to datahub, but not sure if that would be sufficient for your usecase. would like to learn more about it.
cc @Fredrik who’s helping out with that project
f
Fwiw, we use mlflow to track experiment metrics as well. Datahub operates at a different abstraction level. It allows us to link tasks to datasets (tables for example) and models (on an “app level”, not individual model versions).
m
I think for experiment tracking we’re planning on relying on integrations with other libraries in the ecosystem (e.g. mlflow)
That makes sense @Niels Bantilan. For the tracking, we mostly need rudimentary features (eg graphs). For the model registry, we want a centralized thin abstraction layer in front of a blob storage (eg s3) that can reference arbitrary models and metadata. MLFlow suits our needs pretty well here. When it comes to data lineage, we’re currently using DVC but I’m looking for alternatives, as using git to track revisions isn’t great when domain experts need to be able to manipulate the datasets (fix incorrect labels etc).
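To make the "thin abstraction layer in front of a blob storage" idea concrete, here is a minimal sketch. It assumes a local directory stands in for the blob store (eg an s3 bucket); the `ModelRegistry` class, its methods, and the index layout are all hypothetical, and this is not MLFlow's API.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class ModelRegistry:
    """Hypothetical thin registry: model blobs plus a JSON index.

    Assumption: a local directory stands in for the blob store.
    """

    def __init__(self, root):
        self.root = Path(root)
        (self.root / "blobs").mkdir(parents=True, exist_ok=True)
        self.index_path = self.root / "index.json"

    def _load_index(self):
        # The index maps a model name to its list of version entries.
        if self.index_path.exists():
            return json.loads(self.index_path.read_text())
        return {}

    def register(self, name, model_bytes, metadata):
        """Store the bytes, append a version entry, return the version number."""
        digest = hashlib.sha256(model_bytes).hexdigest()
        (self.root / "blobs" / digest).write_bytes(model_bytes)
        index = self._load_index()
        versions = index.setdefault(name, [])
        versions.append(
            {"version": len(versions) + 1, "blob": digest, "metadata": metadata}
        )
        self.index_path.write_text(json.dumps(index, indent=2))
        return len(versions)

    def get(self, name, version=None):
        """Return (model_bytes, metadata); the latest version if none is given."""
        versions = self._load_index()[name]
        entry = versions[-1] if version is None else versions[version - 1]
        blob = (self.root / "blobs" / entry["blob"]).read_bytes()
        return blob, entry["metadata"]


# Usage: register two versions, then fetch the latest one.
registry = ModelRegistry(tempfile.mkdtemp())
v1 = registry.register("classifier", b"weights-v1", {"accuracy": 0.91})
v2 = registry.register("classifier", b"weights-v2", {"accuracy": 0.93})
latest_blob, latest_meta = registry.get("classifier")
```

The registry only stores references (a content hash) and arbitrary metadata, so the model format stays opaque to it; swapping the directory for a real object store would change only the read and write calls.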
y
@Martin Hwasser do you think we could have a separate quick chat just about data versioning?
i’ve been looking into dvc recently, but passed on it since i didn’t think it supported enough.
m
Sure, maybe tomorrow? I’m at CET, when’s a good time for you?
y
would love to hear how you’re currently using it and any other usecases you might have.
tomorrow is great, 1pm/3pm? (i’m in seattle)
m
1pm Seattle time is a bit late for me in Stockholm
y
ooh cet
not ct
8am pt/5pm cet? i think you guys are 9 hours ahead.
m
That works!
y
sweeet, let me send an invite
f
Yee is getting used to those early morning meets? 😅 (I’m in helsinki myself)
y
not sure if you want to join niels/fredrik, but yeah really want to understand the data versioning story here
maybe we can set up something to discuss model performance also?
f
Can’t tomorrow, unfortunately, but I trust you’ll take good notes 👍
y
sweet will do.