Metric "elasticsearch_cluster_health_timed_out" should be 1 and not absent when querying cluster API fails.

elasticsearch_cluster_health_timed_out is a gauge metric.
I was under the impression that its value would switch between 0 and 1 depending on whether the exporter can query the cluster health API.
So when the Elasticsearch service goes down, I expected the metric to become 1. Instead, it disappears entirely.

I could configure alert rules using something like absent(elasticsearch_cluster_health_timed_out), but I don't think that is the right way to do this.

Even the official Prometheus docs recommend avoiding missing metrics:
https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics
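
To make the requested behavior concrete, here is a minimal sketch in Python. It is not the exporter's actual code; the function names (`fetch_cluster_health`, `timed_out_value`, `render_metric`) are hypothetical and chosen only for illustration. The one thing taken from Elasticsearch itself is that the `_cluster/health` response contains a boolean `timed_out` field. The idea is simply that a failed query maps to the value 1 instead of the series being dropped.

```python
import json
import urllib.request

METRIC = "elasticsearch_cluster_health_timed_out"


def fetch_cluster_health(base_url, timeout=5.0):
    """Query the cluster health API and return the parsed JSON body."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def timed_out_value(fetch):
    """Return 1.0 if the health query fails or reports a timeout, else 0.0.

    `fetch` is any zero-argument callable that returns the parsed
    cluster health response (hypothetical interface for this sketch).
    """
    try:
        health = fetch()
    except (OSError, ValueError):
        # Connection refused, socket timeout, HTTP error, or unparsable
        # body: report 1 rather than letting the series go absent.
        return 1.0
    return 1.0 if health.get("timed_out") else 0.0


def render_metric(value):
    """Render the gauge in the Prometheus text exposition format."""
    return f"# TYPE {METRIC} gauge\n{METRIC} {value:g}\n"


if __name__ == "__main__":
    def service_down():
        # Simulates Elasticsearch being unreachable.
        raise ConnectionRefusedError("connection refused")

    print(render_metric(timed_out_value(service_down)), end="")
```

With something along these lines, an alert could match on the metric being equal to 1 rather than relying on absent().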


Metric "elasticsearch_cluster_health_timed_out" should be 1 and not absent when querying cluster API fails. #212
