Flyte enables production-grade orchestration for machine learning workflows and data processing, and is designed to accelerate the move from local workflows to production.

Flyte

Hi folks - doing a little research right now on various pipeline implementations (Kubeflow / Argo / Prefect amongst others). I'm digging into the details around multi-cluster setup / configuration and overall architecture. Hopefully I'm in the right place!

I was a little surprised to see so many k8s primitives exposed in the install guide -- <https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html#control-plane-configuration>

Has there been any other feedback on that, or any intention to make the agent config simpler? Are there similar guides for AKS, GKE and on-prem? (I should be able to sort out the infra differences, but prior art would save me time.)

TIA!

Hi <@U065GFDUU4X> and welcome to the Flyte community

Multi-cluster is certainly the most advanced deployment option, and it's true that it currently requires dealing with multiple K8s resources. There's already an <https://github.com/flyteorg/flyte/issues/3970|Issue> open about enabling the `flyte-core` chart (the one used for multicluster) to consume pre-existing secrets, which would allow a bit more "separation of concerns" in that step.

This <https://github.com/unionai-oss/deploy-flyte|repo> is where we're starting to add reference implementations with TF modules (currently for EKS; GKE is soon to be merged).

Also, there's a semi-manual deployment tutorial available <https://github.com/davidmirror-ops/flyte-the-hard-way|here>.

For AKS we don't have a guide just yet. The team at <#C05315T4K5K|> has been actively improving compatibility and sharing knowledge, but we probably need to collect that knowledge in a single place now.

I hope some of this is useful to you.

Thanks a bunch <@U04H6UUE78B> - that's very helpful.

Let me dig into that a bit more -- is there a good place to further discuss / post ideas or concerns about the infra architecture?

I think in my ideal world, the propeller would operate similarly to a self-hosted CI agent like you see in Circle CI, Azure DevOps or GitHub Actions -- that is, the agent uses a PAT to register itself as part of a work pool the server defines. The server never makes inbound connections to agents (it looks like comms are bi-directional in Flyte?) -- agents long poll the server to determine if they need to schedule work. Scheduling happens periodically as agents pick up jobs from the server based on worker metadata matching requests.
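To make the idea concrete, here is a rough in-memory sketch of that pull model. It is not Flyte's API or any CI vendor's; `Server`, `Job`, `register`, `poll`, the token value and the label names are all hypothetical, and a real agent would long poll over HTTP rather than call a method directly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    name: str
    requires: dict  # labels a worker must carry, e.g. {"gpu": "true"}


@dataclass
class Server:
    """Hypothetical control plane: holds a queue and never calls agents."""
    valid_tokens: set
    queue: list = field(default_factory=list)
    agents: dict = field(default_factory=dict)  # agent_id -> labels

    def register(self, token: str, agent_id: str, labels: dict) -> None:
        # The agent presents a token (a PAT in the description above).
        if token not in self.valid_tokens:
            raise PermissionError("unknown token")
        self.agents[agent_id] = labels

    def poll(self, agent_id: str) -> Optional[Job]:
        # Hand out the first queued job whose requirements match the
        # polling agent's labels; return None if nothing matches.
        labels = self.agents[agent_id]
        for i, job in enumerate(self.queue):
            if all(labels.get(k) == v for k, v in job.requires.items()):
                return self.queue.pop(i)
        return None


server = Server(valid_tokens={"pat-123"})
server.queue += [Job("train", {"gpu": "true"}), Job("etl", {})]
server.register("pat-123", "cpu-agent", {"gpu": "false"})
server.register("pat-123", "gpu-agent", {"gpu": "true"})

first = server.poll("cpu-agent")   # skips "train", takes "etl"
second = server.poll("gpu-agent")  # takes "train"
third = server.poll("cpu-agent")   # nothing left to hand out
```

The point of the sketch is the direction of the calls: every interaction starts at the agent, so the server only needs to be reachable from the agents, not the other way around.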

For long-lived posts, GitHub Discussions are probably better than Slack.
For sync discussions we also have the biweekly Contributors meetup, where you can chat with maintainers.

Regarding Flyte architecture: FlyteAdmin submits the Workflow CR create/update operation to the K8s API, and the flytepropeller controller watches and reconciles those resources in K8s, reporting execution status back to FlyteAdmin. So the path from control plane -> data plane is more of an indirect one.

<https://docs.flyte.org/en/latest/concepts/component_architecture/flytepropeller_architecture.html#components>
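The indirect path described above can be pictured with a toy simulation. This is only a sketch under heavy assumptions: `KubeAPI`, `Admin`, `Propeller`, `reconcile_once` and the three phase names are hypothetical stand-ins, the "API server" is a plain dict, and the real flytepropeller uses a Kubernetes watch rather than a loop over a dict.

```python
from dataclasses import dataclass, field


@dataclass
class KubeAPI:
    """Stand-in for the K8s API server: a store of workflow resources."""
    workflows: dict = field(default_factory=dict)  # name -> phase


@dataclass
class Admin:
    """Stand-in for FlyteAdmin: writes CRs and receives status events."""
    kube: KubeAPI
    status: dict = field(default_factory=dict)

    def create_execution(self, name: str) -> None:
        # Admin only writes to the K8s API; it never calls the propeller.
        self.kube.workflows[name] = "QUEUED"

    def report(self, name: str, phase: str) -> None:
        self.status[name] = phase


@dataclass
class Propeller:
    """Stand-in for flytepropeller: reconciles CRs and reports back."""
    kube: KubeAPI
    admin: Admin

    def reconcile_once(self) -> None:
        # Simplified, made-up phase transitions for illustration only.
        transitions = {"QUEUED": "RUNNING", "RUNNING": "SUCCEEDED"}
        for name, phase in list(self.kube.workflows.items()):
            next_phase = transitions.get(phase)
            if next_phase is not None:
                self.kube.workflows[name] = next_phase
                self.admin.report(name, next_phase)


kube = KubeAPI()
admin = Admin(kube)
propeller = Propeller(kube, admin)

admin.create_execution("wf-1")
propeller.reconcile_once()  # QUEUED -> RUNNING, reported to admin
propeller.reconcile_once()  # RUNNING -> SUCCEEDED, reported to admin
```

Note that `Admin` holds no reference to `Propeller`: the only shared object is the API store, which is what makes the control plane to data plane path indirect.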

Great - thanks for that! I understand the model better now