:wave: Hi everyone, I have a question about metri...
# ask-the-community
j
👋 Hi everyone, I have a question about metrics produced by Flyte 📈 I am trying to debug why all the Workflow stats panels are showing 0 even though we had successful and failed runs in the selected time range (images attached). I have noticed that
flyte:propeller:all:workflow:success_duration_ms_count
comes with a unique
exec_id
which prevents the metric from accumulating the counts properly (each execution always generates a new series with value 1). Therefore, when applying
rate
over time, it always produces 0. Has anyone else run into the same issue? Am I doing something wrong, or is there a way to configure flyte-binary to stop adding
exec_id
? Context: • Flyte-binary version 1.9.1
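The hypothesis in the question can be sketched in a few lines of Python. This is a deliberately simplified model of a per-second rate (no extrapolation and no counter-reset handling, so not Prometheus's actual algorithm), and the series names and sample values are made up for illustration. It only shows why a counter that appears at 1 and never increases contributes nothing to a rate, while a shared counter that keeps incrementing does.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> Optional[float]:
    """Simplified per-second increase between the first and last sample.

    Assumption: this ignores the extrapolation and counter-reset handling
    that a real Prometheus rate() performs.
    """
    if len(samples) < 2:
        return None  # a single sample gives nothing to compare against
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical scrape data: one series per exec_id, each stuck at 1.
per_exec_series: Dict[str, List[Sample]] = {
    "exec-a": [(0.0, 1.0), (15.0, 1.0), (30.0, 1.0)],
    "exec-b": [(15.0, 1.0), (30.0, 1.0)],
}
per_exec_rates = {k: simple_rate(v) for k, v in per_exec_series.items()}
total_per_exec = sum(r for r in per_exec_rates.values() if r is not None)

# Hypothetical shared series (no exec_id label): the counter accumulates.
shared_series: List[Sample] = [(0.0, 1.0), (15.0, 2.0), (30.0, 2.0)]
shared_rate = simple_rate(shared_series)

print(total_per_exec, shared_rate)
```

In this model the per-execution series sum to a rate of zero, while the shared series yields a positive rate, which matches the symptom described for the dashboard panels.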
s
Hi @Thomas Newton! Do you have any immediate insights since you worked on a PR for fixing the Grafana dashboards?
j
I found that
sum(rate(flyte:propeller:all:workflow:event_recording:success_duration_ms_count[5m])) by (wf)
did what I wanted in the end. It seems like
flyte:propeller:all:workflow:event_recording:success_duration_ms_count
is able to accumulate the counts correctly
Even with a unique
exec_id
. My previous assumption of
exec_id
being the issue was wrong. Therefore, something else must be causing the problem.
t
@Jose Navarro and I did discuss this briefly in DMs. It seems like Jose's issue is actually about how the metrics are collected rather than the Grafana dashboards. That is potentially related to flytepropeller. Jose's metrics are separated by the unique
exec_id
. But it looks like mine aren't. I'm using
flyte-core
with flytepropeller
v1.10.4
.
j
and I installed
flyte-binary
1.9.1
Jose's metrics are separated by the unique exec_id.
My original thought that this was the problem might be wrong, since
flyte:propeller:all:workflow:event_recording:success_duration_ms_count
also has a unique
exec_id
but it is still aggregated (screenshot above)
d
Just a note: you can control which metric labels are included in the Prometheus metrics that Flyte exports. This can be useful for reducing cardinality, which can hurt performance when it gets extremely large.
j
@Dan Rammer (hamersaw) that’d be useful. I will have a dig in the documentation
d
It should just be the
metrics-keys
option at the top level of the propeller configuration, and I believe the possible values are defined here.
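To illustrate why the choice of label keys matters for cardinality, here is a minimal Python sketch. It does not use Flyte or Prometheus at all: the `count_by_labels` helper, the event data, and the key tuples are hypothetical, and the sketch only models the general idea that each distinct combination of label values becomes its own series.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def count_by_labels(
    events: Iterable[Dict[str, str]], keys: Sequence[str]
) -> "Counter[Tuple[str, ...]]":
    """Count events per distinct combination of the selected label keys.

    Hypothetical helper: each distinct tuple stands in for one metric series.
    """
    counts: "Counter[Tuple[str, ...]]" = Counter()
    for labels in events:
        counts[tuple(labels.get(k, "") for k in keys)] += 1
    return counts


# Made-up successful workflow executions, for illustration only.
events = [
    {"wf": "train", "exec_id": "e1"},
    {"wf": "train", "exec_id": "e2"},
    {"wf": "train", "exec_id": "e3"},
    {"wf": "report", "exec_id": "e4"},
]

with_exec_id = count_by_labels(events, ("wf", "exec_id"))
without_exec_id = count_by_labels(events, ("wf",))

print(len(with_exec_id), dict(without_exec_id))
```

With `exec_id` among the keys, every execution gets its own series whose count never exceeds 1. Without it, there is one series per workflow and the counts accumulate.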