#3304 [Core feature] Integrate OpenTelemetry into Flyte components
Issue created by
hamersaw
Motivation: Why do you think this is important?
OpenTelemetry is a distributed tracing framework designed to ease performance analysis in distributed systems. In line with our performance observability push, this would give users a clearer understanding of Flyte performance. It would also help debug performance issues and serve as a benchmarking utility for new features.
Goal: What should the final outcome look like, ideally?
OpenTelemetry offers many opportunities for instrumentation. We hope to add support for:
• gRPC connections (e.g. FlyteAdmin, datacatalog, FlytePropeller, etc.)
• blobstore I/O
• k8s API server operations
• many more
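To make the kind of instrumentation listed above concrete, here is a minimal, stdlib-only sketch of nested spans around a gRPC call, a blobstore read, and a k8s operation. It is a toy recorder, not the OpenTelemetry API, and every name in it (`start_span`, `evaluate_node`, the span names, the `wf-example` / `n0` identifiers) is a hypothetical chosen for illustration rather than anything taken from Flyte.

```python
import contextvars
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional

# Toy stand-in for a tracer. This is NOT the OpenTelemetry API; it only
# illustrates how spans nest and carry per-execution attributes.
_current_span = contextvars.ContextVar("current_span", default=None)
finished_spans = []  # spans are appended in the order they end


@dataclass
class Span:
    name: str
    parent: Optional["Span"]
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0

    @property
    def duration(self) -> float:
        return self.end - self.start


@contextmanager
def start_span(name, **attributes):
    """Open a span as a child of whichever span is currently active."""
    span = Span(name=name, parent=_current_span.get(), attributes=attributes)
    token = _current_span.set(span)
    span.start = time.perf_counter()
    try:
        yield span
    finally:
        span.end = time.perf_counter()
        _current_span.reset(token)
        finished_spans.append(span)


def evaluate_node(workflow_id, node_id):
    """Hypothetical unit of work touching the three instrumented areas."""
    with start_span("propeller.evaluate_node", workflow=workflow_id, node=node_id):
        with start_span("grpc.datacatalog.Get"):  # placeholder for a gRPC call
            time.sleep(0.001)
        with start_span("blobstore.read"):        # placeholder for blobstore I/O
            time.sleep(0.001)
        with start_span("k8s.create_pod"):        # placeholder for a k8s API call
            time.sleep(0.001)


evaluate_node("wf-example", "n0")
for span in finished_spans:
    parent = span.parent.name if span.parent else "-"
    print(f"{span.name:<26} parent={parent:<26} {span.duration * 1000:.2f} ms")
```

Each child span records its own latency and its parent, so time spent in one node evaluation can be attributed to the individual gRPC, blobstore, or k8s operation. A real integration would presumably rely on an OpenTelemetry SDK and exporter instead of this in-memory list.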
Describe alternatives you've considered
We have considered two main options:
(1) Leaving things as they are: The current state may leave users (or developers) frustrated about system performance with no real explanation.
(2) Enhancing Prometheus metrics: Flyte currently exposes many metrics through Prometheus; however, these metrics are often aggregations, so fine-grained analysis at the workflow, node, or task level is unavailable.
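The limitation of aggregated metrics can be illustrated with a small worked example. The durations below are made-up values, and the workflow and node identifiers are hypothetical. The point is only that a single aggregate hides which execution was slow, while per-execution records (the kind of data a trace carries) do not.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (workflow, node, seconds) samples; all values are invented
# purely for illustration and do not come from any Flyte deployment.
samples = [
    ("wf-a", "n0", 0.2),
    ("wf-a", "n1", 0.2),
    ("wf-b", "n0", 0.2),
    ("wf-b", "n1", 3.4),
]

# Aggregated view: one number for the whole system, as a summary metric gives.
overall = mean(duration for _, _, duration in samples)

# Fine-grained view: group by (workflow, node) to see where the time went.
by_node = defaultdict(list)
for workflow, node, duration in samples:
    by_node[(workflow, node)].append(duration)
slowest = max(by_node, key=lambda key: mean(by_node[key]))

print(f"aggregate mean: {overall:.2f}s")
print(f"slowest (workflow, node): {slowest}")
```

The aggregate alone suggests uniformly mediocre latency, whereas the grouped view points at a single node in a single workflow.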
Propose: Link/Inline OR Additional context
This work is described as "orchestration metrics" in the performance observability RFC.
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte