Scaling issues with Flyte Binary:
Hi all, we are deploying Flyte Binary on Oracle Cloud Infrastructure (OCI). We should probably switch to a Flyte Core deployment, but that's where we are for now...
We've noticed a few things:
1. The webhook service fails under load, and the 10 retries aren't enough for it to recover.
2. The Flyte container sometimes restarts because its health check fails. I haven't found a correlation between the restarts and memory consumption, but the last time it failed, the Flyte container was using 25 GB. The K8s node it ran on still had memory available, and the limit I set was 32 GB. How can I diagnose why the health check failed? What should I look for?
3. Memory consumption: memory usage seems to increase steadily over time. Could there be a memory leak or something similar?
4. Availability: is it possible to run more than one replica of Flyte Binary? I read somewhere that it won't work, but I can't find the link again. Can you share more details? Is that correct? If so, why, and what can be done about it?
5. If the answer to #4 is that only one replica is supported, does Flyte Core change things? Can we run more replicas to improve Flyte availability, especially during redeployments?
Thanks guys!