#3272 [Core feature] Observability: expose Flyte runtime metrics
Issue created by cosmicBboy
Motivation: Why do you think this is important?
Flyte performance is becoming increasingly important as users scale executions. Currently, Flyte exposes only very elementary metrics (e.g. "started", "updated") without any description. This makes it difficult to analyze workflow execution performance and, as a result, to identify system-level bottlenecks and implement mitigations.
Goal: What should the final outcome look like, ideally?
Flyte should expose fine-grained breakdowns of workflow, node, and task execution details. These should attribute the time spent during execution to categories such as "flyte overhead", "plugin overhead", and "user code".
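To make the desired breakdown concrete, here is a rough sketch of how timed spans could be rolled up into per-category totals. It is only an illustration: `Span` and `breakdown` are hypothetical names, the timings are made up, and none of this reflects an existing Flyte API or the design in the linked PRs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical timed slice of an execution, tagged with a category."""
    category: str
    start_s: float
    end_s: float


def breakdown(spans):
    """Sum the duration of each span into a per-category total (seconds)."""
    totals = defaultdict(float)
    for span in spans:
        totals[span.category] += span.end_s - span.start_s
    return dict(totals)


# Made-up timings for a single task execution (assumption, not real data).
spans = [
    Span("flyte overhead", 0.0, 2.0),
    Span("plugin overhead", 2.0, 3.5),
    Span("user code", 3.5, 13.5),
    Span("flyte overhead", 13.5, 14.0),
]

totals = breakdown(spans)
wall = sum(totals.values())
for category, seconds in totals.items():
    print(f"{category}: {seconds:.1f}s ({seconds / wall:.0%} of total)")
```

A report shaped like this would let a user see at a glance whether time is going to their own code or to the platform, which is the kind of attribution the elementary "started" / "updated" metrics cannot provide.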
Describe alternatives you've considered
Discussion on the RFC link below may contain more information on alternative approaches.
Propose: Link/Inline OR Additional context
This work is described as "runtime metrics" in the performance observability RFC.
PRs:
• flyteorg/flyteadmin#524
• flyteorg/flyteidl#367
• flyteorg/flytekit#1513
• flyteorg/flytepropeller#529
• flyteorg/flyteplugins#307
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte