Hi team, we are trying to compute the overhead time of the platform. Meaning:
1 - (All workflows transition time / all workflows duration)
We use this PromQL query:
1 - 
  ( flyte:propeller:all:workflow:completion_latency_unlabeled_ms_sum
    + flyte:propeller:all:node:transition_latency_unlabeled_ms_sum
    + flyte:propeller:all:node:queueing_latency_unlabeled_ms_sum
    + flyte:propeller:all:workflow:acceptance_latency_unlabeled_ms_sum)
  / (flyte:propeller:all:workflow:failure_duration_unlabeled_ms_sum + flyte:propeller:all:workflow:success_duration_unlabeled_ms_sum)
The numbers we get seem a bit off (around 50%). Is the PromQL query correct? What do you recommend we use to measure that overhead time?
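One thing that may explain odd numbers: the `_sum` series look like Prometheus summary sums, which are cumulative counters, and arithmetic between un-aggregated vectors only pairs series whose labels match exactly. Below is a minimal sketch that compares increases over the same window and aggregates each term before combining. It assumes these metrics are counters and that a `1d` window suits the workload; neither is confirmed in this thread.

```promql
# Sketch only: the 1d window and the counter semantics are assumptions.
# Each term is aggregated with sum() so label mismatches cannot drop series.
(
    sum(increase(flyte:propeller:all:workflow:completion_latency_unlabeled_ms_sum[1d]))
  + sum(increase(flyte:propeller:all:node:transition_latency_unlabeled_ms_sum[1d]))
  + sum(increase(flyte:propeller:all:node:queueing_latency_unlabeled_ms_sum[1d]))
  + sum(increase(flyte:propeller:all:workflow:acceptance_latency_unlabeled_ms_sum[1d]))
)
/
(
    sum(increase(flyte:propeller:all:workflow:failure_duration_unlabeled_ms_sum[1d]))
  + sum(increase(flyte:propeller:all:workflow:success_duration_unlabeled_ms_sum[1d]))
)
```

This ratio is the latency share of total workflow duration; prefixing it with `1 - `, as in the query above, gives the remaining share instead. If any term has no samples in the window, the whole expression returns an empty result.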
@Julien Bisconti overhead is always measured relative to the total runtime, and that depends on the workload.
Also happy to connect and help optimize knobs
Yes please. We are struggling to understand the numbers we see.
What is the metric for the total runtime of a workflow?
Should we do a video call? Are you available? https://meet.google.com/xxh-cbzt-kkd @Ketan (kumare3)
We can, but it's a little too early for me right now.
Cc @Dan Rammer (hamersaw) if you are around
But it's a holiday in the US today.
Let me DM you.