Flyte enables production-grade orchestration for machine learning workflows and data processing created to accelerate local workflows to production.

Hello! Is there a way to add custom tags to the Flyte-provided metrics mentioned <https://docs.flyte.org/en/latest/deployment/cluster_config/monitoring.html#flyte-provided-metrics|here>? The only available tag is currently the workflow name (`wf`), but we would like to add dimensions to the relevant metrics depending on the workflow inputs or the custom labels provided when submitting the workflow. As far as I can see, those labels are only added to the relevant pods, but it would be great if they could somehow be added to the metrics as well.

Hi <@U03J0QPPAKU>! Currently, the Prometheus metric labels are configurable in FlytePropeller with <https://github.com/flyteorg/flytepropeller/blob/43101764c9ba23f5eb08bb7cf95aa61ecd22c16e/pkg/controller/config/config.go#L134|this configuration option>. However, the only supported options are listed <https://github.com/flyteorg/flytestdlib/blob/c9998d369fa9bec310e3db6711a401ee410d6f57/contextutils/context.go#L13-L26|here>. We have these restrictions because Prometheus best practice is to keep metric label cardinality very low. Because of the way things are tracked internally, memory utilization can <https://blackpine.io/posts/2022.07.07-on-memory-utilization-of-prometheus-exporters/|grow very quickly> if we add metric labels for things like workflow inputs. What kind of custom labels are you looking for?
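
To make the cardinality concern concrete, here is a rough sketch of how the number of time series for a single metric can grow as labels are added. It assumes the worst case, where every combination of label values occurs. The label names other than `wf` and all the counts are made up for illustration; they are not taken from Flyte.

```python
from __future__ import annotations

from math import prod


def series_count(distinct_values_per_label: dict[str, int]) -> int:
    """Worst-case number of time series for one metric.

    Each unique combination of label values is its own series, so the
    upper bound is the product of the distinct-value counts per label.
    """
    return prod(distinct_values_per_label.values())


# Hypothetical numbers, for illustration only.
only_wf = {"wf": 50}
with_small_label = {**only_wf, "small_label": 12}    # a bounded dimension
with_raw_input = {**only_wf, "raw_input": 10_000}    # an unbounded workflow input

print(series_count(only_wf))
print(series_count(with_small_label))
print(series_count(with_raw_input))
```

Under these assumptions a bounded label multiplies the series count by a small factor, while a label derived from a free-form input multiplies it by however many distinct values that input ever takes.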

Hi, thanks for your answer!
The Prometheus best practices regarding labels make sense, and in our case we would not end up with an unreasonable number of labels.

A basic use case would be executing a given workflow several times for different tenants. The inputs would change ofc, but there is no need to add all inputs as labels. We would be fine with simply adding a `tenant_id` label to the metrics, which we could provide along with the workflow inputs when submitting the execution

i.e. something that behaves more or less the same as the Kubernetes pod labels/annotations we can add through <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.remote.remote.Options.html#flytekit.remote.remote.Options>

<@U03J0QPPAKU> you can add labels to an execution and those will get added to the pods

That should then be available to the pods

Hi <@UNZB4NW3S>, I know. We would actually need the same for the metrics

Custom metrics on the workflow sound like a dangerous recipe. We recently found some users had bloated cardinality, which becomes expensive and needs more memory

Can you not use project as the tenant ID?

No, we can expect to have dozens of tenants. We would like to stick with one project (and its related domains) and publish/run workflows from there.
Only the inputs (and labels) would change from one tenant to another.
Additionally, the tenant ID was a simple example, but some other dimensions are needed as well.

The concern you both mentioned is understandable. Did you consider limiting the number of custom metric labels at runtime, so an execution could fail fast if it exceeds that limit?
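
A minimal sketch of what such a fail-fast guard might look like, purely as an illustration of the idea. Nothing here is an existing Flyte or flytestdlib API: `LabelBudget`, `LabelLimitExceeded`, and both limits are hypothetical names and values. It caps the number of label keys and also the number of distinct values per key, since the values are what drive cardinality.

```python
from __future__ import annotations


class LabelLimitExceeded(ValueError):
    """Raised when a submission would exceed the configured label budget."""


class LabelBudget:
    """Hypothetical guard: caps label keys and distinct values per key."""

    def __init__(self, max_keys: int = 3, max_values_per_key: int = 50) -> None:
        self.max_keys = max_keys
        self.max_values_per_key = max_values_per_key
        self._seen: dict[str, set[str]] = {}

    def check(self, labels: dict[str, str]) -> None:
        """Validate labels for one execution; raise before recording anything."""
        if len(labels) > self.max_keys:
            raise LabelLimitExceeded(
                f"{len(labels)} custom metric labels requested, limit is {self.max_keys}"
            )
        for key, value in labels.items():
            seen = self._seen.setdefault(key, set())
            if value not in seen and len(seen) >= self.max_values_per_key:
                raise LabelLimitExceeded(
                    f"label {key!r} already has {len(seen)} distinct values"
                )
        # Only record the values once the whole label set has been accepted.
        for key, value in labels.items():
            self._seen[key].add(value)


budget = LabelBudget(max_keys=3, max_values_per_key=2)
budget.check({"tenant_id": "tenant-a"})
budget.check({"tenant_id": "tenant-b"})
try:
    budget.check({"tenant_id": "tenant-c"})  # third distinct value, over the cap
except LabelLimitExceeded as err:
    print(f"rejected: {err}")
```

Checking at submission time means a rejected execution never emits a series, so the exporter's memory stays bounded by the configured limits.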

<@U03J0QPPAKU> you can use one project to publish but execute in a separate project

I guess those projects would need to be created beforehand, is that right? I don't think we would want to take on that additional maintenance each time a tenant needs to be added.

Regardless, as I mentioned, that would not cover the actual use case, since the tenant ID is not the only dimension we need to add. In our specific case, we have 3 business dimensions we need to track

Why don't you propose a change? All the code is in flytestdlib