# flyte-deployment
n
We are deploying Flyte using the flyte-binary Helm chart. This document says that "Flyte Backend is written in Golang and exposes stats using Prometheus." We don't see the Prometheus annotations (prometheus.io/scrape, prometheus.io/port, prometheus.io/path) being applied automatically. Also, we can't find any config setting to enable this. Should this happen automatically on deploy, or do we need to change some config setting? Also, found this where a port number is mentioned, but the flyte-binary pod only exposes 8088, 8089 and 9443, not 10254. Should that be exposed somehow as well?
j
yes 10254 looks like it should be exposed @Nandakumar Raghu
as for the annotations, that's a bit of an implementation detail based on how your prometheus scrape config is set up. you can add it if your prometheus is configured to discover workloads based on those annotations. another way to do it if you are using the prometheus-operator is with pod/service monitors: https://github.com/prometheus-operator/prometheus-operator/blob/main/example/user-guides/getting-started/example-app-pod-monitor.yaml
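for reference, a minimal sketch of both options. assumptions, none confirmed in this thread: the metrics path is /metrics, the pods carry an app.kubernetes.io/name: flyte-binary label, and the metrics containerPort is named profiler. check these against your own deployment before using them.

```yaml
# Option 1: annotation-based discovery (fragment of the pod template metadata).
# Only has an effect if your scraper is configured to discover pods by these annotations.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "10254"     # metrics port discussed in this thread
    prometheus.io/path: "/metrics"  # assumed path
---
# Option 2: prometheus-operator PodMonitor.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: flyte-binary                # hypothetical name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: flyte-binary  # assumed label; verify with `kubectl get pods --show-labels`
  podMetricsEndpoints:
    - port: profiler                # must match the *name* of a containerPort on the pod (hypothetical name)
      path: /metrics                # assumed path
```

note that the PodMonitor endpoint refers to a port by name, so it only works once the port is declared as a named containerPort on the pod.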
n
Any idea why port 10254 is not exposed from the pod? Is there a config or setting somewhere we can check?
j
you mean as a containerPort?
n
yes
If a port is exposed, I should be able to see it when I describe the pod, correct?
j
it needs to be explicitly added here
n
Ok, thanks, I'll try that out.
Hi @jeev, I was able to expose the metrics port and see metrics in datadog (the datadog agent is set to scrape metrics from pods that are annotated with prometheus annotations). I am trying to set up the core metrics dashboard shown here. However, I am not able to find the metrics referenced in the grafana json on datadog. The metrics referenced in the json are of the format -
flyte:propeller:all:free_workers_count
flyte:propeller:all:round:abort_error[5m]
flyte:propeller:all:round:system_error_unlabeled[5m]
flyte:propeller:all:node:plugin:.*_failure_unlabeled
flyte:propeller:all:node:plugin:.*_success_unlabeled
flyte:propeller:all:round:raw_unlabeled_ms[5m]
flyte:propeller:all:round:raw_ms[5m]
flyte:propeller:all:round:panic_unlabeled[5m]
flyte:propeller:all:collector:flyteworkflow
flyte:propeller:all:metastore:cache_hit
flyte:propeller:all:metastore:cache_miss
flyte:propeller:all:metastore:head_failure_unlabeled
But on datadog, when I search metrics for "Propeller", I can only see these -
flyte_admin_admin_builder_flytepropeller_build_failures.count
flyte_admin_admin_builder_flytepropeller_build_successes.count
flyte_admin_admin_execution_manager_propeller_failures.count
1. Why aren't we seeing all the metrics for propeller (for metastore, plugin, etc.)?
2. Why are the metric names different? Is there some aggregation happening somewhere?
j
hmm it’s possible that there is a bug in flyte-binary that isn’t properly exposing all the metrics. this isn’t an issue in flyte-core since the services are separate.
can you open an issue @Nandakumar Raghu
k
@jeev I think you are right, maybe we have to register multiple handles
j
or merge?
b
@Nandakumar Raghu did you add it under deployment.annotations?
it needs to be explicitly added here
@jeev - we installed flyte-binary using helm in eks, utilising flyteorg’s repo. if this needs to be explicitly added, does that mean we’ll need to manage the chart on our own, since ports isn’t exposed via values.yaml?
j
@Brian Tang: ideally we can just fix the chart, but we haven't gotten to this unfortunately.
b
@jeev - i can create a PR if that’s something that would be desired? something like this:
# deployment.yaml
ports:
  - name: http
    containerPort: 8088
  - name: grpc
    containerPort: 8089
  - name: webhook
    containerPort: 9443
  {{- if .Values.deployment.profilerPort }}
  - name: profiler
    containerPort: {{ .Values.deployment.profilerPort }}
  {{- end }}

# values.yaml
  deployment:
    ...
    profilerPort: 10254
or just remove the if and hardcode it like the other ports in deployment.yaml
j
hardcoding is probably fine. the issue though is that not all the metrics are currently available through a single port.