# flyte-deployment
f
I see 3 types of Prometheus metrics: admin, propeller and console. I also see that one is meant to rely on the kube_state_metrics pod metrics to get other information about the actual execution (like CPU usage etc.). Are there some metrics exposed directly by the running pods that we should also collect? Or which other ways are there to collect metrics on running executions?
For instance, this documentation talks about the stats handler: https://www.union.ai/docs/flyte/deployment/flyte-configuration/monitoring/
User Stats With Flyte
The workflow parameters object that the SDK injects into various tasks has a statsd handle that users should call to emit stats of their workflows not captured by the default metrics. The usual caveats around cardinality apply, of course.
When using that handle, where would the metrics be found? Are they collected by propeller, or do they need to be scraped from the nodes?
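For context on the question above: StatsD in general is push-based, meaning a client sends small UDP datagrams to a StatsD daemon rather than exposing an endpoint to be scraped. The sketch below shows only the generic StatsD wire format using the Python standard library. It is not the flytekit API; the function names `format_statsd` and `emit`, the metric name, and the host are hypothetical, and 8125 is simply the conventional StatsD port.

```python
import socket


def format_statsd(name: str, value, metric_type: str) -> str:
    """Build one StatsD datagram of the form <name>:<value>|<type>."""
    # "c" = counter, "g" = gauge, "ms" = timer in milliseconds
    if metric_type not in {"c", "g", "ms"}:
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def emit(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send the datagram over UDP; fire-and-forget, no reply is expected."""
    # host and port are assumptions; point them at wherever a StatsD
    # daemon (or a StatsD-to-Prometheus exporter) is actually listening.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric name, for illustration only.
    emit(format_statsd("my_experiment.epochs_done", 3, "g"))
```

Under this model the metrics end up wherever the receiving daemon forwards them, so whether they reach Prometheus depends on how that receiver is deployed.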
a
hey @fierce-farmer-40956 any specific metric you're looking for?
f
I have 2 cases:
1. Create a dashboard that, for a given execution, gives detailed information about the resources used by each task/workflow.
2. Allow users to create their own metrics to see the progress of their experiments.
w
We used GKE managed Prometheus + Grafana. Here are some queries we used to get CPU and memory usage for tasks in our Grafana dash.
kubernetes_io:container_cpu_request_cores{project_id="$project_id", pod_name="$pod_name"}

kubernetes_io:container_cpu_request_utilization{project_id="$project_id", pod_name="$pod_name"}

kubernetes_io:container_memory_request_bytes{project_id="$project_id", pod_name="$pod_name"}

sum(kubernetes_io:container_memory_used_bytes{project_id="$project_id", pod_name="$pod_name"})
You can then use this to create a link to the Grafana dash in the FlyteConsole: https://www.union.ai/docs/flyte/deployment/flyte-configuration/configuring-logging-links-in-the-ui/
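The queries above use Grafana-style `$project_id` and `$pod_name` dashboard variables. As a rough sketch of how those placeholders get filled in, for example to test a query outside Grafana, the snippet below substitutes them with Python's `string.Template`, which happens to use the same `$name` syntax. The dictionary keys, the `render_queries` helper, and the sample project and pod names are made up for illustration.

```python
from string import Template

# The four queries from the message above, keyed by hypothetical short names.
QUERIES = {
    "cpu_request_cores": 'kubernetes_io:container_cpu_request_cores{project_id="$project_id", pod_name="$pod_name"}',
    "cpu_request_utilization": 'kubernetes_io:container_cpu_request_utilization{project_id="$project_id", pod_name="$pod_name"}',
    "memory_request_bytes": 'kubernetes_io:container_memory_request_bytes{project_id="$project_id", pod_name="$pod_name"}',
    "memory_used_bytes": 'sum(kubernetes_io:container_memory_used_bytes{project_id="$project_id", pod_name="$pod_name"})',
}


def render_queries(project_id: str, pod_name: str) -> dict:
    """Return each query with the dashboard variables replaced by real values."""
    return {
        name: Template(query).substitute(project_id=project_id, pod_name=pod_name)
        for name, query in QUERIES.items()
    }


if __name__ == "__main__":
    # Sample values are placeholders, not real identifiers.
    for name, query in render_queries("my-project", "abc123-n0-0").items():
        print(f"{name}: {query}")
```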
a
nice, thanks for sharing @fierce-farmer-40956
f
Thank you, yes, I have that as well. What about exposing metrics from within the task itself? What do people use? Is there an example of how to use this statsd handle? Or shall I just implement my own using the standard Prometheus stack?
specifically asking about this
User Stats With Flyte
The workflow parameters object that the SDK injects into various tasks has a statsd handle that users should call to emit stats of their workflows not captured by the default metrics. The usual caveats around cardinality apply, of course.
Users are encouraged to avoid creating their own stats handlers. If not done correctly, these can pollute the general namespace and accidentally interfere with the production stats of live services, causing pages and wreaking havoc. If you’re using any libraries that emit stats, it’s best to turn them off if possible.
w
Our users didn't find exposing metrics within the task that useful so we haven't put the engineering effort into it.
f
Ok, so I'll look into implementing my solution then. Thank you