Flyte enables production-grade orchestration for machine learning workflows and data processing created to accelerate local workflows to production.

Flyte

hey! is there a metric in propeller to measure failures from plugins? We use these recording rules currently, but each time we add a new plugin we need to go and add it to the rules :thread:

```- record: computed:propeller:all:node:plugin:failure_unlabeled
  expr: |    label_replace(flyte:propeller:all:node:plugin:bigquery_ddl_create_dataset_task_bigquery_api_failure_unlabeled, "plugin", "bigquery_ddl_create_dataset_task_bigquery_api", "", "")

- record: computed:propeller:all:node:plugin:failure_unlabeled
  expr: |
    label_replace(flyte:propeller:all:node:plugin:flink_flink_failure_unlabeled, "plugin", "flink_flink", "", "")```

hey <@U01KKNMC47J> what do you mean by failures from plugins?

I guess if a reconcile failure is due to any specific plugin having problems 

<https://github.com/flyteorg/flytepropeller/blob/b5b5e0944272a70894aceac35a6fc779d1e1efbc/pkg/controller/nodes/task/handler.go#L209,L335|this> is where we do the plugin to task type resolution but it looks like we only log if we encounter issues. are you mostly trying to make sure if you configure a new plugin it will actually be used for a specific task type?

We would like to understand if problems in propeller or if a set of workflows are not progressing is due to a specific plugin. For example when handle is called on a specific plugin, if some exception comes from the part of the code of the plugin.

Are the existing error logs sufficient in this case?

yes it works! it would be awesome though if it was possible. we had some issues recently where all our metrics looked fine, but one plugin was misbehaving due to some change in the general propeller config, so it would have been nice to be able to alert on it