hi folks, I'm looking at the available Flyte metrics to help us monitor the platform as we run large workloads (and stress test the cluster). Are there good docs/samples available? I found this but it's very high level.
One specific question: is the metric `flyte-propeller-all-workflow-acceptance.latency.ms.sum` cumulative? Sample chart below. Or should I be looking only at the percentiles (`.ms.0.9` and `.ms.0.99`)?
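For context, here's how I'm currently reading it. This assumes the metric is exported as a standard Prometheus summary, which I haven't confirmed for propeller. If so, `.sum` is a running total of every observed latency since the process started. Only the difference between two scrapes is meaningful, and you'd divide that by the difference of a matching `.count` series (also an assumption on my part). Below is a minimal sketch of that calculation. `SummarySample` and `window_mean_latency_ms` are my own hypothetical names, not Flyte APIs:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummarySample:
    """One scrape of a summary's cumulative series (hypothetical container)."""
    timestamp_s: float
    sum_ms: float  # value of ...latency.ms.sum at scrape time (assumed cumulative)
    count: float   # value of an assumed matching ...latency.ms.count series


def window_mean_latency_ms(earlier: SummarySample, later: SummarySample) -> Optional[float]:
    """Mean latency of the observations recorded between two scrapes.

    Assumes sum/count only ever grow, so a decrease means the counters
    were reset (e.g. the propeller pod restarted).
    """
    d_sum = later.sum_ms - earlier.sum_ms
    d_count = later.count - earlier.count
    if d_sum < 0 or d_count < 0:
        raise ValueError("counter reset detected between samples")
    if d_count == 0:
        return None  # no workflows accepted in this window
    return d_sum / d_count


if __name__ == "__main__":
    a = SummarySample(timestamp_s=0, sum_ms=12000.0, count=40)
    b = SummarySample(timestamp_s=60, sum_ms=18000.0, count=52)
    print(window_mean_latency_ms(a, b))  # 6000 ms over 12 acceptances -> 500.0
```

If that's right, plotting the raw `.sum` would just show an ever-increasing line, while the quantile series (`.ms.0.9`, `.ms.0.99`) are already point-in-time values.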