Flyte #flyte-support

creamy-honey-22699

01/18/2023, 7:02 PM

Hello all, we are in the midst of evaluating open source ML pipeline and workflow orchestration tools within our organization. I received lots of great recommendations pointing me towards Flyte and am interested in doing a POC at some point. I'm still learning more about Flyte by watching some webinars. I did have some questions about multi-tenancy, because that seems to be the biggest pain point when adopting new tools.

creamy-honey-22699

01/18/2023, 7:03 PM

We have multiple Data Science teams, each with their own respective AWS accounts. Does Flyte encourage deploying multiple instances of the infrastructure in our stakeholders' accounts, or does it work best with a centralized instance in our own designated Machine Learning Platform AWS account?

thankful-minister-83577

01/18/2023, 8:03 PM

hey @creamy-honey-22699 welcome!

thankful-minister-83577

01/18/2023, 8:05 PM

you can certainly deploy one per aws account if that’s what they already have. most companies only have one aws account where their ml platform and their ml workflows live.

thankful-minister-83577

01/18/2023, 8:06 PM

are these separate top level accounts or subaccounts?

thankful-minister-83577

01/18/2023, 8:07 PM

i don’t think i’ve actually come across this style of use-case before. the main thing to keep in mind is data and data transfer.

thankful-minister-83577

01/18/2023, 8:09 PM

i think the answer to this question will mostly depend on that. one of flyte’s advantages is that it has multi-tenancy features built in, so you don’t have to manage things yourself. but that said, if you’re going to incur the monetary and time cost of a lot of data transfer as ml workloads pull and push large amounts of data, then that’s certainly something to consider as well.

creamy-honey-22699

01/18/2023, 8:34 PM

Thank you @thankful-minister-83577 for your helpful information! Yes, it's a very unique paradigm. To answer your question, yes, these are separate top level accounts, each with their own account signature (we call this an account moniker) and prod, staging and dev environments. However, data would typically live in a single account that we call the Data Lab. You do bring up a good point: each of these accounts has its own billing, so model training and compute should run within their respective accounts because we don't want to incur the cost for that. I'm wondering if it's possible to have the UI and server in our centralized account but somehow do the compute in the customer accounts using Flyte. Is there any documentation you can point me to that covers multi-tenancy and deployment patterns in more depth?

thankful-minister-83577

01/18/2023, 8:42 PM

it is possible to have a multi-cluster setup yes. https://docs.flyte.org/en/latest/deployment/multicluster.html (btw, there’s an active pr out that moves some of these articles around, so the location will change in the next couple days)

thankful-minister-83577

01/18/2023, 8:43 PM

that article there guides the user through just that, how to run a control plane with multiple data plane clusters.

creamy-honey-22699

01/18/2023, 8:43 PM

I'll take a look, thank you. Appreciate the guidance. And multi-cluster allows you to spin up cross-account clusters?

thankful-minister-83577

01/18/2023, 8:43 PM

this paradigm is meant for very large scale deploys and is what we used at lyft. but this was only ever done in one account, not two

thankful-minister-83577

01/18/2023, 8:43 PM

so yeah not sure about that last question

thankful-minister-83577

01/18/2023, 8:44 PM

and keep in mind there is still data that flows from the data plane to the control plane and vice versa.

creamy-honey-22699

01/18/2023, 8:44 PM

It's tricky because these are essentially separate accounts with different IAM roles and such.

thankful-minister-83577

01/18/2023, 8:44 PM

rpc calls to the operator, event information, and what we call metadata input/output (primitives like strings and floats, so flyte can display them on the ui)

👍 1

thankful-minister-83577

01/18/2023, 8:45 PM

offloaded data (like dataframes and files) will remain in the target location, but the address of that location will be sent back, for instance.
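
(Editor's sketch, not part of the original thread.) The split described above, with primitives travelling to the control plane and larger values staying put with only their address sent back, can be modelled roughly as follows. This is a toy model in plain Python, not Flyte's API: `OffloadedRef`, `to_control_plane_payload`, and the bucket name are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class OffloadedRef:
    """Hypothetical stand-in for a pointer to data left in the data plane."""
    uri: str


Primitive = Union[str, int, float, bool]


def to_control_plane_payload(
    outputs: Dict[str, Any], bucket_prefix: str
) -> Dict[str, Union[Primitive, OffloadedRef]]:
    """Decide what would cross from the data plane to the control plane.

    Primitives are passed through as-is (so a UI could display them);
    anything else is replaced by a reference to where it would be stored.
    """
    payload: Dict[str, Union[Primitive, OffloadedRef]] = {}
    for name, value in outputs.items():
        if isinstance(value, (str, int, float, bool)):
            payload[name] = value
        else:
            # Assumption: the value itself stays under bucket_prefix.
            payload[name] = OffloadedRef(uri=f"{bucket_prefix}/{name}")
    return payload


payload = to_control_plane_payload(
    {"accuracy": 0.93, "model_name": "xgb", "predictions": [[1, 2], [3, 4]]},
    bucket_prefix="s3://team-a-bucket/run-123",  # hypothetical bucket
)
```

In this model only `accuracy`, `model_name`, and a URI for `predictions` leave the team's account, which is why the cross-account traffic is mostly small metadata rather than bulk data.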

thankful-minister-83577

01/18/2023, 8:46 PM

different deployments of propeller can have a different default location for this offloaded data. and this setting is also configurable at the project/domain/workflow level
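
(Editor's sketch, not part of the original thread.) One way to picture the layered setting described above is a most-specific-match lookup that falls back to the cluster's default. The function, the override table, and the bucket names below are hypothetical, and the precedence order (workflow, then domain, then project, then cluster default) is an assumption made for the sketch; this is not how Flyte stores or resolves its configuration.

```python
from typing import Dict, Optional, Tuple

# (project, domain, workflow); None means "applies to every value at that level".
Key = Tuple[str, Optional[str], Optional[str]]


def resolve_output_prefix(
    overrides: Dict[Key, str],
    cluster_default: str,
    project: str,
    domain: str,
    workflow: str,
) -> str:
    """Return the most specific matching prefix, else the cluster default."""
    candidates = (
        (project, domain, workflow),  # workflow-level override
        (project, domain, None),      # project/domain-level override
        (project, None, None),        # project-level override
    )
    for key in candidates:
        if key in overrides:
            return overrides[key]
    return cluster_default


# Hypothetical overrides for a team that keeps outputs in its own account.
overrides: Dict[Key, str] = {
    ("team-a", None, None): "s3://team-a-bucket",
    ("team-a", "production", None): "s3://team-a-prod-bucket",
}

prod_prefix = resolve_output_prefix(
    overrides, "s3://platform-default", "team-a", "production", "train"
)
dev_prefix = resolve_output_prefix(
    overrides, "s3://platform-default", "team-a", "development", "train"
)
other_prefix = resolve_output_prefix(
    overrides, "s3://platform-default", "team-b", "development", "train"
)
```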

thankful-minister-83577

01/18/2023, 8:46 PM

it is indeed tricky… can’t promise it’ll work, you’ll almost certainly have to adapt the helm chart

creamy-honey-22699

01/18/2023, 8:47 PM

Got it thanks - yeah that might be a lot of toil. It might be better to just go with the deploy 1 to N account(s) approach then. I'll keep this in mind though.

thankful-minister-83577

01/18/2023, 9:21 PM

let us know how it goes!

thankful-minister-83577

01/18/2023, 9:21 PM

happy to help debug anything that might arise from this approach

🙏 1
