Hey all, as discussed in our OSS Sync last week, we're working on improving performance observability of Flyte components. I drafted an RFC on adding runtime metrics that offer overhead estimates, along with orchestration metrics based on OpenTelemetry traces, to improve performance analysis both in production environments and during feature benchmarking. Please take a look at the RFC draft (with a formatted markdown version here) and leave any comments, questions, or concerns. We are very excited to hear from the greater community and incorporate feedback! Also, we're trying out a simple GitHub PR formatted in markdown for this RFC. We are always refining our RFC process, so thoughts on that are welcome too.