# contribute
c
Looking for some eyes on this PR from some of the Union folks: https://github.com/flyteorg/flyte/pull/6521. I think there is a bug in propeller that leads to a significant number of wasted/redundant workflow store updates (~60% of writes at our scale). Depending on how folks use Flyte, this could significantly reduce k8s API calls, which are a factor in scaling Flyte.
❤️ 1
We see some promising results in non-prod. We're soaking it over the weekend and will land it in production next week.
a
cc @flat-area-42876
g
Wow, nice find!
❤️ 1
c
Deployed it to production and it's looking good so far. We've noticed that etcd operations are also about halved.
f
Thank you for finding this and testing it out. I'll take a deeper look at this later today. I believe the initial intention of isDirty was to prevent a node from being evaluated multiple times within the same eval loop. However, we did just do something similar in Flyte v2 to reduce writes. This seems like a promising opportunity to improve scalability.
👍 1
👍🏽 1
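For anyone following along, here is a rough sketch of the general idea of skipping redundant store writes. It is not the code from the PR and does not use propeller's actual types: `DedupingWorkflowStore`, its `update` method, and the dict-shaped workflow state are all hypothetical names made up for illustration. It assumes a "redundant" update means writing a workflow object whose contents have not changed since the last write.

```python
import hashlib
import json


class DedupingWorkflowStore:
    """Hypothetical in-memory store that skips writes when nothing changed."""

    def __init__(self):
        self._digests = {}  # workflow id -> digest of the last written state
        self.writes = 0     # updates that would actually hit the backing store
        self.skipped = 0    # redundant updates that were dropped

    @staticmethod
    def _digest(state):
        # Canonical JSON (sorted keys) so key order does not affect the digest.
        payload = json.dumps(state, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def update(self, workflow_id, state):
        """Write state only if it differs from the last written state.

        Returns True if a write happened, False if it was skipped.
        """
        digest = self._digest(state)
        if self._digests.get(workflow_id) == digest:
            self.skipped += 1
            return False
        self._digests[workflow_id] = digest
        self.writes += 1
        return True


store = DedupingWorkflowStore()
store.update("wf-1", {"phase": "Running", "nodes": {"n0": "Queued"}})
store.update("wf-1", {"phase": "Running", "nodes": {"n0": "Queued"}})  # unchanged, skipped
store.update("wf-1", {"phase": "Running", "nodes": {"n0": "Succeeded"}})
print(store.writes, store.skipped)
```

In this toy version every skipped update is one fewer call to the backing store, which is the kind of saving described above for k8s API and etcd traffic. How the real change decides that an update is redundant may well differ.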
c
With this change we were just able to pretty easily hit 10k concurrent workflows on a single propeller node. You can see the etcd load is minimal, as is the k8s API traffic for workflows. (Some of the graphs are in UTC and others in PST, but it's all the same time frame, I promise lol.) Oh, and 0 redundant workflow updates 🙂