# flyte-github
#3304 [Core feature] Integrate open telemetry into Flyte components

Issue created by hamersaw

## Motivation: Why do you think this is important?

OpenTelemetry is a distributed tracing framework designed to ease performance analysis in distributed systems. In line with our performance observability push, integrating it would give users a more conclusive understanding of Flyte performance. It would also help debug performance issues and serve as a benchmarking utility for new features.

## Goal: What should the final outcome look like, ideally?

OpenTelemetry offers many opportunities for instrumentation. We hope to add support for:

- gRPC connections (e.g. FlyteAdmin, datacatalog, FlytePropeller, etc.)
- blobstore I/O
- k8s API server operations
- many more

## Describe alternatives you've considered

We have considered two main options:

1. Leaving things as they are: the current state may leave users (or developers) frustrated about system performance with no real explanation.
2. Enhancing Prometheus metrics: Flyte currently exposes many metrics through Prometheus; however, these metrics are often aggregations, so fine-grained analysis at the workflow, node, or task level is unavailable.

## Propose: Link/Inline OR Additional context

This work is described as "orchestration metrics" in the performance observability RFC.

## Are you sure this issue hasn't been raised already?

☑︎ Yes

## Have you read the Code of Conduct?

☑︎ Yes

flyteorg/flyte
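To make the contrast between aggregated metrics and traces in the alternatives above more concrete, here is a minimal, standard-library-only sketch of what a span records. It is not the OpenTelemetry API and not Flyte code: the `Tracer` and `Span` classes, the span names, and the identifiers such as `wf-1` and `n0` are all hypothetical and exist only for illustration. The point is that each operation keeps its own duration, attributes, and parent, so timing can be attributed to a specific workflow or node rather than folded into one aggregate.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Span:
    """One timed operation, tagged with attributes and linked to its parent."""
    name: str
    attributes: dict
    parent: "Span | None" = None
    start: float = 0.0
    end: float = 0.0

    @property
    def duration(self) -> float:
        return self.end - self.start


class Tracer:
    """Toy tracer (hypothetical): nests spans with a stack, keeps finished ones."""

    def __init__(self):
        self.finished = []
        self._stack = []

    @contextmanager
    def span(self, name, **attributes):
        # The span currently on top of the stack becomes the parent.
        parent = self._stack[-1] if self._stack else None
        current = Span(name, attributes, parent, start=time.perf_counter())
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(current)


tracer = Tracer()

# Hypothetical workflow execution with two instrumented operations.
with tracer.span("workflow", workflow_id="wf-1"):
    with tracer.span("blobstore.read", node_id="n0"):
        time.sleep(0.01)  # stand-in for blobstore I/O
    with tracer.span("grpc.call", node_id="n0", service="datacatalog"):
        time.sleep(0.005)  # stand-in for a gRPC round trip

by_name = {s.name: s for s in tracer.finished}

# An aggregated metric would expose only something like this total.
aggregate = sum(s.duration for s in tracer.finished if s.parent is not None)

# A trace additionally keeps the per-operation breakdown and its context.
for s in tracer.finished:
    parent = s.parent.name if s.parent else "-"
    print(f"{s.name:15s} parent={parent:9s} {s.duration * 1000:.1f} ms {s.attributes}")
```

In a real integration the spans would come from the OpenTelemetry SDK and be exported to a tracing backend; the sketch only shows the shape of the data that makes per-workflow, per-node analysis possible.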