Dan Rammer (hamersaw)

11/07/2022, 11:27 PM
Hey all, as discussed in our OSS Sync last week, we're working on improving performance observability of Flyte components. I drafted an RFC on adding runtime metrics, offering overhead estimates, and orchestration metrics, using OpenTelemetry traces, to improve performance analysis both in production environments and during feature benchmarking. Please take a look at the RFC draft (with a formatted Markdown version here) and leave any comments / questions / concerns. We are very excited to hear from the greater community and incorporate feedback! Also, we're trying a simple GitHub PR formatted in Markdown for this. We are always refining our RFC process, so thoughts on that are welcome too.

Guillaume Perchais

11/12/2022, 8:29 AM
Really nice proposal @Dan Rammer (hamersaw)! We have tried to nail this down (Flyte overhead) with the existing metrics but didn’t get meaningful results (as you explain in the Problem section), so we're really happy to see this happening!