two questions around deploying flyte. 1. Is it saf...
# ask-the-community
y
Two questions around deploying Flyte. 1. Is it safe to do a rolling restart of Flyte propeller and/or Flyte admin while workflows are running? Are there any bad states that the system can get into? 2. What are the suggested Kubernetes CPU and memory requests for admin and propeller for a medium-size cluster (500 concurrent workflows)? Thanks!
k
Yes, it is absolutely safe. You will see some backoffs, but everything should resume correctly, and it will not interrupt long-running jobs.
500 is low. Propeller is memory hungry, and admin also likes slightly higher memory.
Admin at 2 GB and propeller at 4 GB should keep you comfortable for a while, maybe up to double that load.
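As a rough illustration of how those memory figures could be turned into Kubernetes requests, here is a minimal Python sketch that builds a Deployment patch body for each component. The container names `flyteadmin` and `flytepropeller`, and the reading of "2gb"/"4gb" as `2Gi`/`4Gi`, are assumptions; check them against your own deployment. CPU is left out because no figure was given in the thread.

```python
import json

# Memory requests suggested in the thread. The container names and the
# "Gi" units are assumptions; adjust them to match your deployment.
SUGGESTED_MEMORY = {"flyteadmin": "2Gi", "flytepropeller": "4Gi"}


def memory_request_patch(container_name: str, memory: str) -> dict:
    """Build a Deployment patch body that sets one container's memory request."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            "resources": {"requests": {"memory": memory}},
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Print one JSON patch body per component.
    for name, memory in SUGGESTED_MEMORY.items():
        print(name, json.dumps(memory_request_patch(name, memory)))
```

Each printed JSON body has the shape that `kubectl patch deployment` accepts, but setting the same values through your Helm values or manifests is likely the more maintainable route.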
y
Thanks for the quick response. What does admin store in memory? I assume propeller has some sort of cache for inputs and outputs.
Any pointers regarding CPU consumption? Are they mainly doing I/O?
k
Yes, mainly I/O. It is all written in Go, so it is optimized for high I/O.
Check the propeller performance docs.
y
Got it. So the 2 GB and 4 GB memory suggestions are for running stably at 500 concurrent workflows? Would these numbers roughly double for 1000 concurrent workflows? I assume other things, like API server rate limits, would come into play here as well.
d
@Yi Sheng Ong propeller (Flyte's execution engine) uses goroutines to handle workflow executions. Each goroutine takes 2 kB of memory (source), so the goroutines for 500 concurrent workflows would account for only about 1 MB of memory (500 × 2 kB). As you mentioned, though, there are other sources of overhead, including the KubeAPI rate limits, the potential lag from the Informer cache, and more. The current performance docs have some guidance: https://docs.flyte.org/en/latest/deployment/configuration/performance.html#optimizing-round-latency I'm exploring how all the pieces fit together, and hopefully that page will receive a revamp soon 🙂
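Taking the 2 kB-per-goroutine figure above at face value, the stack arithmetic can be checked with a short Python sketch. It assumes one goroutine per concurrent workflow, which is a simplification for illustration and not a statement about how propeller actually schedules work. It also ignores stack growth, caches, and every other source of memory use.

```python
# Starting stack size per goroutine, using the 2 kB figure quoted in the thread.
GOROUTINE_STACK_BYTES = 2 * 1024


def goroutine_stack_bytes(concurrent_workflows: int,
                          goroutines_per_workflow: int = 1) -> int:
    """Estimate the total starting stack memory for the given workflow count.

    goroutines_per_workflow defaults to 1, which is an assumption made
    only for this estimate.
    """
    return concurrent_workflows * goroutines_per_workflow * GOROUTINE_STACK_BYTES


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for workflows in (500, 1000):
        total = goroutine_stack_bytes(workflows)
        print(f"{workflows} workflows: {total} bytes (~{to_mib(total):.2f} MiB)")
```

Under these assumptions the stacks come to roughly 1 MiB at 500 workflows and 2 MiB at 1000, so goroutine stacks alone would explain very little of a multi-gigabyte memory request.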