# ask-the-community
Hi! I have Flyte deployed on a K8s cluster and I also have Prometheus set to scrape FlyteAdmin and FlytePropeller metrics. The thing is, I currently have 3 replicas of the FlytePropeller pod, so I get 3 values for each metric. I'm worried this might negatively impact the usefulness of these metrics. As an example, I get 3 different workflow average durations, one for each pod, but I would like to know the average duration of all executions. Is there a way to solve this? For some metrics I think I could just sum them by the instance label (which is the pod), but I don't know if this will work every time, and it will also make the queries more complex.
A secondary point: for FlyteAdmin there is a Kubernetes Service in my cluster, which is also scraped by Prometheus (so for each metric I get 4 time series: 1 from the service and 3 more, one from each pod). How do the pods organize themselves to send the metrics to the service endpoint? For instance, the metric flyteadminadmincreate executionerrors has a non-zero value for only 1 pod. But the same metric, when coming from the service, alternates over time between 0 and the non-zero value, making it look like the service /metrics endpoint offers the metrics from 1 pod at a time, alternating randomly between them.
For the second point, Prometheus shouldn't scrape metrics from the service. "making it look like the service /metrics endpoint offers the metrics from 1 pod at a time" is exactly what's happening. The k8s Service exists to proxy requests to the FlyteAdmin pods. This is useful for the Flyte API, but it produces nonsense metrics.
Ensure you're adding
prometheus.io/scrape: "true"
only to
and not to
. Having said that, I don't know why flyteadmin service has the
port to load-balance the metrics endpoint. Maybe there is a situation where scraping the service makes sense, but I can't see it.
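To illustrate why scraping through the Service gives an alternating series, here is a minimal Python sketch. It assumes three hypothetical pod names, an invented error count, and a strict round-robin choice of backend (a simplification; the real selection is closer to random). Each scrape through the Service returns whatever single pod happened to answer, while scraping the pods directly yields one stable series per pod.

```python
from itertools import cycle
from typing import Dict, List

# Hypothetical current value of an error counter on each FlyteAdmin pod.
# Only one pod has seen errors, as in the situation described above.
POD_VALUES: Dict[str, int] = {
    "flyteadmin-0": 0,
    "flyteadmin-1": 7,
    "flyteadmin-2": 0,
}


def scrape_via_service(pod_values: Dict[str, int], n_scrapes: int) -> List[int]:
    """Simulate scraping /metrics through a load-balanced Service.

    Assumption: backends are picked round-robin. Each scrape sees the
    counter of exactly one pod, never a combination of all pods.
    """
    backends = cycle(pod_values)
    return [pod_values[next(backends)] for _ in range(n_scrapes)]


def scrape_pods_directly(pod_values: Dict[str, int]) -> Dict[str, int]:
    """Simulate scraping every pod: one independent series per pod."""
    return dict(pod_values)


if __name__ == "__main__":
    print("via service:", scrape_via_service(POD_VALUES, 6))
    print("per pod:    ", scrape_pods_directly(POD_VALUES))
    print("total:      ", sum(scrape_pods_directly(POD_VALUES).values()))
```

The series seen through the Service jumps between 0 and the non-zero value, so it looks like a counter that keeps resetting, whereas the per-pod series can be aggregated safely at query time.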
Hi @Vinícius Sosnowski For point #1, could you share the
you're using in Prometheus? If using the
label (which by default is set to the Pod's name), that would probably be inconsistent if a Pod gets redeployed, so in that case using relabeling plus meta labels could help. Then you could query for the metric's average across the three pods.
Hi guys! @Geoff Salmon thanks for the explanation, I will be asking the people responsible for this to change it to only scrape from pods. @David Espejo (he/him) sorry I don't have access to the
at the moment but indeed we are using the
label defaulting to the Pod's name. I checked this article https://medium.com/kubehells/kubernetes-pod-monitors-re-labeling-managing-cardinality-6d38eea748d6 to see an example but I guess he only has one pod inside the node? Because I'm wondering what the result would be if 3 pods
label be renamed to a single thing. Will the metrics be aggregated?
@Vinícius Sosnowski for the example you sent, the selector will match any number of pods that match the label and, in that case, will proceed with relabelling. Inferring from the Prometheus docs, every relabelled Pod would be an instantiation of the same metric and every instance (or time series) could be aggregated using the supported PromQL operators
I think using relabelling to try to combine the metrics is the wrong approach. Prometheus won't automatically aggregate them in a useful way if the same time series (same metric name and label set) is scraped from multiple endpoints. The pods can have different lifetimes, and counter metrics start at zero when a pod starts running, so simply summing the counters of all active pods will produce a series that is no longer non-decreasing. Prometheus' ways of aggregating metrics, computing rates and averages, assume non-decreasing counters, so you're in for an uphill slog trying to do this. If you haven't seen them, check out the rate and increase functions. Generally you'll want to use rate on the individual counter metrics produced by each pod and then sum over all the pods. https://prometheus.io/docs/practices/histograms/#count-and-sum-of-observations shows computing average rates from the sum or bucket time series of a histogram and the count time series.
Something like this would give you a time series containing average workflow durations over the past hour
(The metric names are invented in this example)
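As a hedged sketch of the rate-then-sum approach: following the linked Prometheus histogram guidance, a query of this kind would have roughly the shape sum(rate(workflow_duration_seconds_sum[1h])) / sum(rate(workflow_duration_seconds_count[1h])), where the metric names are hypothetical. The Python below mimics that arithmetic on invented sample data, using a simplified increase without the extrapolation Prometheus applies, to show why the per-pod step has to come before the sum.

```python
from typing import Dict, List

# Invented scrape samples of two cumulative counters per pod (hypothetical names).
# "pod-b" restarts before the last scrape, so its counters drop back near zero.
DURATION_SUM: Dict[str, List[float]] = {
    "pod-a": [100.0, 160.0, 220.0],
    "pod-b": [300.0, 330.0, 20.0],
    "pod-c": [50.0, 50.0, 90.0],
}
DURATION_COUNT: Dict[str, List[float]] = {
    "pod-a": [10.0, 16.0, 22.0],
    "pod-b": [30.0, 33.0, 2.0],
    "pod-c": [5.0, 5.0, 9.0],
}


def counter_increase(samples: List[float]) -> float:
    """Increase of one counter over the window, treating any drop as a reset.

    Simplified stand-in for PromQL's increase(): no extrapolation.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        # After a reset the counter restarted from zero, so count the new value.
        total += curr - prev if curr >= prev else curr
    return total


def average_duration(sum_by_pod: Dict[str, List[float]],
                     count_by_pod: Dict[str, List[float]]) -> float:
    """Per-pod increase first, then sum over pods, then divide."""
    total_sum = sum(counter_increase(s) for s in sum_by_pod.values())
    total_count = sum(counter_increase(s) for s in count_by_pod.values())
    return total_sum / total_count


def summed_raw(series_by_pod: Dict[str, List[float]]) -> List[float]:
    """The naive alternative: add the raw counters of all pods at each scrape."""
    return [sum(values) for values in zip(*series_by_pod.values())]


if __name__ == "__main__":
    print("average duration:", average_duration(DURATION_SUM, DURATION_COUNT))
    print("naively summed counter:", summed_raw(DURATION_SUM))
```

The naively summed series goes down when pod-b restarts, so it is no longer a valid counter. Handling the reset per pod before summing avoids that problem.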
Hi guys! Yep, it seems indeed that using the proper functions and then summing over all the pods is an easy and straightforward way to obtain the results I want. Thank you for your time helping me through that =)