Implement the Brier score and its decomposition into resolution, reliability and uncertainty.

## 🚀 Feature

I would like to contribute to torchmetrics by implementing the Brier score and its associated decomposition.

### Motivation

The Brier score is widely used to measure the calibration of machine learning methods; see:

https://arxiv.org/abs/2302.04019  
https://arxiv.org/abs/2002.06470

It is also a proper scoring rule, as opposed to the Expected Calibration Error (ECE) and the Thresholded Adaptive Calibration Error (TACE). Because they are not proper scoring rules, the ECE and the TACE have trivial minima where the classifier has **zero** test accuracy **while being perfectly calibrated** (https://arxiv.org/abs/1906.02530). The Brier score, being a proper scoring rule, does not have this pathological behaviour.

The Brier score coincides with the mean squared error for common use cases. However, its decomposition into resolution, reliability and uncertainty (see https://en.wikipedia.org/wiki/Brier_score) is a unique and useful feature. Roughly speaking, `resolution` captures a notion of accuracy and `reliability` a notion of calibration, so both have to be optimized for the Brier score to be low.
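To make the link with the mean squared error concrete, here is a minimal pure-Python sketch. It assumes the multi-class form of the Brier score that sums the squared errors over classes and averages over samples; dividing the result by the number of classes would give the mean squared error against one-hot targets. The name `brier_score` and its signature are hypothetical and are not an existing torchmetrics API.

```python
def brier_score(probs, labels):
    """Mean over samples of the squared distance between the predicted
    probability vector and the one-hot encoding of the true label.

    probs:  list of per-sample probability vectors (one entry per class)
    labels: list of integer class indices, 0-based
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Squared error against the one-hot target, summed over classes.
        total += sum((p_c - (1.0 if c == y else 0.0)) ** 2
                     for c, p_c in enumerate(p))
    return total / len(probs)


# Two three-class predictions whose true label is the second class (index 1).
score = brier_score([[0.0, 0.9, 0.1], [0.2, 0.6, 0.2]], [1, 1])
print(score)
```

A confident correct prediction contributes little to the score, while a less confident one is penalised more, even though both predict the correct class.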

Finally, to the best of my knowledge, no standard implementation exists in common packages.

### Pitch

I plan to follow the original paper describing the decomposition of the Brier score into resolution, reliability and uncertainty

https://journals.ametsoc.org/view/journals/apme/12/4/1520-0450_1973_012_0595_anvpot_2_0_co_2.xml

and specifically the implementation found in

https://github.com/google-research/google-research/blob/master/uq_benchmark_2019/metrics_lib.py

and the paper

https://arxiv.org/abs/1906.02530

The decomposition into uncertainty, resolution and reliability was originally formulated for predictions that take a finite set of values. In contrast, most deep neural network classifiers output a vector of class probabilities, which take continuous values. We therefore need to bin the output vectors. In this implementation, each prediction is assigned to the bin of its most probable class, which gives C bins, where C is the number of classes. For example, the two prediction vectors [0, 0.9, 0.1] and [0.2, 0.6, 0.2] fall in the same bin, the bin of class 2. The derivation of resolution, reliability and uncertainty is then relatively straightforward.
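The binning described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not a copy of the referenced implementation: it assumes the Brier score is the per-sample sum of squared errors over classes, so that uncertainty is `sum_c b_c * (1 - b_c)` for class base rates `b_c`. The function name `brier_decomposition` is hypothetical, and a torchmetrics version would operate on tensors instead of lists.

```python
def brier_decomposition(probs, labels):
    """Return (uncertainty, resolution, reliability), using one bin per
    predicted (most probable) class. Labels are 0-based class indices."""
    n = len(probs)
    num_classes = len(probs[0])

    # Bin of each sample: its most probable class (ties go to the lowest index).
    bins = [max(range(num_classes), key=lambda c: p[c]) for p in probs]

    # Base rate of each class over the whole dataset.
    base_rate = [sum(1 for y in labels if y == c) / n for c in range(num_classes)]

    # counts[k][c]: number of samples in bin k whose true label is c.
    counts = [[0] * num_classes for _ in range(num_classes)]
    for k, y in zip(bins, labels):
        counts[k][y] += 1
    bin_sizes = [sum(row) for row in counts]

    # Observed label distribution inside each non-empty bin (None if empty).
    bin_freq = [[cnt / size for cnt in row] if size else None
                for row, size in zip(counts, bin_sizes)]

    # Uncertainty: spread of the labels themselves, independent of the model.
    uncertainty = sum(b * (1.0 - b) for b in base_rate)

    # Resolution: how far each bin's label distribution is from the base rate.
    resolution = sum(
        size / n * sum((f - b) ** 2 for f, b in zip(freq, base_rate))
        for freq, size in zip(bin_freq, bin_sizes) if size
    )

    # Reliability: how far each prediction is from its bin's label distribution.
    reliability = sum(
        sum((p_c - f_c) ** 2 for p_c, f_c in zip(p, bin_freq[k]))
        for p, k in zip(probs, bins)
    ) / n

    return uncertainty, resolution, reliability


# Toy example with identical predictions inside each bin.
probs = [[0.8, 0.2], [0.8, 0.2], [0.3, 0.7], [0.3, 0.7]]
labels = [0, 1, 1, 1]
unc, res, rel = brier_decomposition(probs, labels)
print(unc, res, rel, rel - res + unc)
```

In this toy example the predictions inside each bin are identical, so `reliability - resolution + uncertainty` reproduces the Brier score exactly. When predictions vary within a bin, as they generally do for neural network outputs, the identity is only approximate, because the original decomposition assumes a finite set of forecast values.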






