# ask-the-community
v
Hi team, we are trying to set up a multicluster deployment of Flyte, and we are wondering whether it is okay to set up an instance of datacatalog per cluster. Would there be any concerns, apart from some cache not being shared between clusters?
p
> Would there be any concerns, apart from some cache not being shared between clusters?
Retrieving stale data on cache hits would be the biggest concern, and cache misses to a lesser extent. Would a given workflow be running in multiple clusters? If you were able to have each workflow run in a dedicated cluster, that could avoid those issues. I don't believe setting up a shared datacatalog instance across multiple clusters is supported out of the box in open source.
v
ah I see. So currently it is expected to have one datacatalog per cluster? The multicluster setup documentation didn't specify anything about setting up a datacatalog per cluster, so I assumed it was shared.
p
Wait, apologies for the confusion, my mistake. By a multicluster setup, you mean a single control plane with multiple data planes (flytepropeller), right?
v
yes
p
Datacatalog is a stateless wrapper over the same Postgres instance that stores execution state, which the control plane uses as its source of truth. All the datacatalog instances would point to the same database, so there isn't a concern about weird cache behavior. Let me look into the multi-cluster deployment really quickly to confirm some things.
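To illustrate the idea, here is a toy sketch in Python. It is not Flyte code: `SharedDatabase` and `CatalogInstance` are hypothetical names, and a plain dict stands in for the Postgres instance. It only shows why several stateless instances backed by one database see the same cache entries.

```python
class SharedDatabase:
    """Hypothetical stand-in for the single Postgres instance."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        # Returns None on a cache miss.
        return self._rows.get(key)


class CatalogInstance:
    """Hypothetical stateless wrapper: it keeps no cache data of its own."""

    def __init__(self, name, db):
        self.name = name
        self._db = db  # every instance points at the same database

    def store_artifact(self, cache_key, output_uri):
        self._db.put(cache_key, output_uri)

    def lookup(self, cache_key):
        return self._db.get(cache_key)


db = SharedDatabase()
catalog_a = CatalogInstance("replica-a", db)
catalog_b = CatalogInstance("replica-b", db)

# An entry written through one instance is visible through the other,
# because the state lives only in the shared database.
catalog_a.store_artifact("task-v1-inputs-hash", "s3://example-bucket/outputs")
hit = catalog_b.lookup("task-v1-inputs-hash")
miss = catalog_b.lookup("unknown-key")
```

Since the wrappers hold no state, adding or removing instances does not change what a lookup returns.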
v
thanks a lot for the input! we were wondering the same thing and just wanted to make sure we don't miss anything
p
> if it was okay to setup an instance of datacatalog per cluster?
You shouldn't need to spin up a datacatalog instance per cluster, as it's separate from execution/propeller. Datacatalog is part of the control plane, similar to Flyteadmin. Running datacatalog with multiple replicas would probably be a better way to approach this.
v
Thanks a lot for the help! Let me take a look