# flyte-deployment
c
Hi all, I'm venturing into deploying Flyte core on our cloud (Oracle). To begin with, I'm playing around with a local Kubernetes cluster. I noticed there are "AntiAffinity" rules for scheduling the different pods. This has two problems: 1. It means I cannot run more than 1 replica. 2. Upgrade doesn't work, because the new pod won't be scheduled before the existing pod is terminated... so it's a deadlock. What am I missing?
a
1. It means I cannot run more than 1 replica.
That depends on the `topologyKey` used. In this case it is set at the node level, so this is true per worker node but not at the cluster level, where the scheduler will spread the Deployment replicas across nodes.
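For reference, a node-level rule of this kind usually looks roughly like the sketch below. This is generic Kubernetes, not copied from the Flyte chart, and the label in the selector is a hypothetical placeholder.

```yaml
# Fragment of a pod template spec (spec.template.spec in a Deployment).
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: flyteadmin  # hypothetical label
        # Node-level key: at most one matching pod per node.
        # A zone-level key such as topology.kubernetes.io/zone
        # would instead mean at most one matching pod per zone.
        topologyKey: kubernetes.io/hostname
```

With this key, a three-node cluster can hold up to three replicas, while a single-node cluster can hold only one.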
2. Upgrade doesn't work! because the new pod won't be scheduled before the existing pod is terminated... so it's a deadlock.
This can also be the case, especially depending on cluster size and available resources. The flyteadmin Deployment, for example, doesn't set a rollout `strategy`, so it uses the Kubernetes default parameters (`maxSurge: 25%`, `maxUnavailable: 25%`). Maybe making this configurable would help?
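To make the rollout behavior concrete: in a rolling update, a percentage `maxSurge` rounds up and a percentage `maxUnavailable` rounds down, so with a single replica the rollout tries to start the new pod before stopping the old one, which is exactly what a node-level anti-affinity rule blocks on a one-node cluster. A minimal sketch of an explicit strategy that avoids this, assuming you can patch the Deployment yourself (whether the Flyte chart exposes these fields as values is not confirmed here):

```yaml
# Fragment of a Deployment spec using standard Kubernetes fields.
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0        # never create an extra pod during the rollout
      maxUnavailable: 1  # allow the old pod to be stopped first
```

The trade-off is that with one replica the service is briefly unavailable while the replacement pod starts.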
c
thank you @average-finland-92144! Why would you block putting more than one admin per k8s node? what's the rationale? Just for the sake of spreading the instances, or is there another reason? If you have a single k8s node (which isn't that far-fetched), you can only upgrade (given the anti-affinity) if MaxUnavailable==0, right? which isn't ideal...
a
Just for the sake of spreading the instances, or is there another reason?
It's about resiliency, yes. There's not really an admin-specific conflict with having them colocated.
If you have a single k8s node (which isn't that far-fetched), you can only upgrade (given the anti-affinity) if MaxUnavailable==0, right? which isn't ideal.
I think `maxUnavailable: 0` would block you even further. If you have a single k8s node, you'd be better off with flyte-binary, which uses a much simpler `Recreate` strategy, with the drawback of causing temporary downtime during upgrades.
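As a rough illustration of what that strategy means at the Deployment level (standard Kubernetes fields, shown as a generic fragment rather than the actual flyte-binary manifest):

```yaml
# Fragment of a Deployment spec.
spec:
  strategy:
    type: Recreate  # all old pods are stopped before new ones are created
```

Because the old pod is gone before the new one is scheduled, anti-affinity can never deadlock the upgrade, but there is a gap with no running pod.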
c
oh right about the maxUnavailable, I got confused. Well, recreate is problematic in a real production environment because, as you said, there's downtime. What about the other components, specifically propeller? What do I need to configure to have more than one instance (webhook as well)?
a
hey, just returning from vacation. So, to scale out propeller you can use its manager feature.