Hi team, we are trying to set up a multicluster setup of Flyte ... | Flyte #flyte-support

hundreds-baker-75079

04/01/2024, 7:29 PM

Hi team, we are trying to set up a multicluster deployment of Flyte, and we are wondering whether it is okay to run an instance of datacatalog per cluster. Would there be any concerns, apart from some cache not being shared between clusters?

flat-area-42876

04/01/2024, 7:42 PM

> Would there be any concerns, apart from some cache not being shared between clusters?

Retrieving stale data on cache hits would be the biggest concern, and cache misses to a lesser extent. Would a given workflow be running in multiple clusters? If you were able to have each workflow run in a dedicated cluster, that could avoid those issues. ~~I don't believe setting up a shared datacatalog instance across multiple clusters is supported out of the box in open source.~~

hundreds-baker-75079

04/01/2024, 7:47 PM

Ah, I see. So currently it is expected to have one datacatalog per cluster? The multicluster setup documentation didn't specify anything about setting up a datacatalog per cluster, so I assumed it was shared.

flat-area-42876

04/01/2024, 7:50 PM

Wait, apologies for the confusion, my mistake. By a multicluster setup, you're referring to a single control plane with multiple data planes (flytepropeller), right?

hundreds-baker-75079

04/01/2024, 7:50 PM

yes

flat-area-42876

04/01/2024, 7:57 PM

datacatalog is a stateless wrapper over the same Postgres instance that stores execution state and that the control plane uses as its source of truth. All the datacatalog instances would point to the same database, so there isn't a concern about weird cache behavior. Let me look into the multi-cluster deployment really quickly to confirm some things.
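
To illustrate why several stateless instances over one database should behave consistently, here is a toy Python sketch. It is not Flyte code: `SharedStore`, `CatalogInstance`, the cache keys, and the `s3://` URI are all made up for illustration, and a plain dict stands in for Postgres.

```python
# Toy model only: these classes are hypothetical and are not Flyte's API.


class SharedStore:
    """Stands in for the single Postgres database behind every instance."""

    def __init__(self):
        self.rows = {}


class CatalogInstance:
    """Stands in for one stateless datacatalog replica.

    It keeps no cache entries of its own, only a handle to the shared store.
    """

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def put(self, key, artifact_uri):
        # Writes go straight to the shared store.
        self.store.rows[key] = artifact_uri

    def get(self, key):
        # Reads come straight from the shared store; None means a cache miss.
        return self.store.rows.get(key)


store = SharedStore()
replica_a = CatalogInstance("replica-a", store)
replica_b = CatalogInstance("replica-b", store)

# An entry written through one replica is visible through the other,
# because neither replica holds state of its own.
replica_a.put("task-v1:inputs-hash", "s3://example-bucket/output")
hit = replica_b.get("task-v1:inputs-hash")
miss = replica_b.get("task-v2:inputs-hash")
```

In this model, which replica serves a request makes no difference to the result, which is the property that makes running replicas safe.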

hundreds-baker-75079

04/01/2024, 7:58 PM

thanks a lot for the input! we were wondering the same thing and just wanted to make sure we don't miss anything

flat-area-42876

04/01/2024, 8:15 PM

> if it was okay to setup an instance of datacatalog per cluster?

You shouldn't need to spin up a datacatalog instance per cluster, as it's separate from execution/propeller. Datacatalog is part of the control plane, similar to Flyteadmin. Running datacatalog with multiple replicas would probably be a better way to approach this.

hundreds-baker-75079

04/04/2024, 8:49 PM

Thanks a lot for the help! Let me take a look
