
Hank Fanchiu

12/08/2022, 4:25 PM
👋 What Kubernetes solution do folks use? Anyone use k3s in production? At Stripe, we've prototyped Flyte with k3s but are now evaluating different solutions. Curious to learn about your experiences!

Ketan (kumare3)

12/08/2022, 4:28 PM
Cc @Michael Lujan?

Felix Ruess

12/08/2022, 4:32 PM
I'm running it on k3s with MetalLB, Traefik and Longhorn... and don't see any reason to move to a different distribution.

David Espejo (he/him)

01/19/2023, 4:45 PM
👀 Bumping up @Hank Fanchiu's question ⬆️

Felix Ruess

01/19/2023, 4:50 PM
k3s runs fine here so far. Any specific questions, @David Espejo (he/him)?

David Espejo (he/him)

01/19/2023, 4:55 PM
Thanks for your input @Felix Ruess! No specific questions; I'm more interested in getting a sense of what other orgs are using to run Flyte/ML workloads. From what I can tell so far, EKS/GKE/AKS (in that order) seem like common options, and k3s also makes a ton of sense.

Felix Ruess

01/19/2023, 5:08 PM
Sure. So if it's of interest: we run 3 master nodes in an HA setup in VMs on our Proxmox cluster, and the agent nodes (with various GPUs) are "bare-metal" machines. kube-vip for the control plane and MetalLB for other LoadBalancer services. Longhorn for persistent storage (e.g. for the Flyte Postgres DB), actual ML data in our MinIO cluster...
And Grafana Agent ships metrics and logs to Prometheus and Loki; viewing Flyte task logs via Grafana/Loki works nicely.

Ketan (kumare3)

01/19/2023, 8:11 PM
@Felix Ruess would you be open to sharing your progress with Flyte and your journey so far with the community during a community meeting?

Felix Ruess

01/19/2023, 8:22 PM
Sure, but to be honest we can't run it in prod yet and still run most things outside Flyte (e.g. because I can't assign affinity per task yet).
And it seems that we are somewhat atypical Flyte users, with only a few container tasks in a workflow so far. Not using most of the "fancy" Flyte features... But sure, I would be up for talking about it...

David Espejo (he/him)

01/19/2023, 8:29 PM
That's great @Felix Ruess! I'm sure we can all learn from your journey so far. @Tyler Su 👀

Ketan (kumare3)

01/19/2023, 11:27 PM
Per-task affinity is coming soon.
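
(Editor's sketch, not part of the thread.) For readers wondering what "affinity per task" would mean at the Kubernetes level, here is a minimal sketch in plain Python of a pod spec carrying a node-affinity rule. It only builds dictionaries that mirror the Kubernetes `nodeAffinity` fields; it is not Flyte's API, and the thread does not say how Flyte will expose this. The label key `example.com/gpu-model`, the label values, the image name, and both function names are hypothetical.

```python
from typing import Any


def node_affinity(label_key: str, allowed_values: list[str]) -> dict[str, Any]:
    """Build a Kubernetes affinity stanza that only allows nodes whose
    label `label_key` has one of `allowed_values`."""
    return {
        "nodeAffinity": {
            # Hard requirement at scheduling time; already-running pods are not evicted.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": label_key,
                                "operator": "In",
                                "values": list(allowed_values),
                            }
                        ]
                    }
                ]
            }
        }
    }


def pod_spec_for_task(image: str, affinity: dict[str, Any]) -> dict[str, Any]:
    """Assemble a minimal pod spec for one task with its own affinity rule."""
    return {
        "containers": [{"name": "task", "image": image}],
        "affinity": affinity,
    }


# Hypothetical label and image, for illustration only.
spec = pod_spec_for_task(
    "registry.example.com/train:latest",
    node_affinity("example.com/gpu-model", ["a100", "rtx3090"]),
)
```

In a setup like the one Felix describes, with agent nodes carrying different GPUs, each task would get its own rule of this shape so that it is only scheduled onto nodes with a matching label.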