---
title: Balanced Accuracy
emoji: 🤗 
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  Balanced Accuracy is the average of the recall obtained on each class. It can be computed as:
  Balanced Accuracy = (Recall_1 + Recall_2 + ... + Recall_N) / N
  Where:
  Recall_i: Recall obtained on class i
  N: Number of classes
  For binary classification this reduces to (TPR + TNR) / 2, the mean of the true positive and true negative rates.
---

# Metric Card for Balanced Accuracy

## Metric Description

Balanced Accuracy is the average of the recall obtained on each class. It can be computed as:

Balanced Accuracy = (Recall_1 + Recall_2 + ... + Recall_N) / N

Where:
- Recall_i: Recall obtained on class i
- N: Number of classes

For binary classification this reduces to (TPR + TNR) / 2, i.e. the mean of the true positive rate and the true negative rate.
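
As a rough illustration (this is not the metric's own code, just a hand computation of the definition above), the score can be reproduced by averaging per-class recall directly:

```python
>>> # Illustrative hand computation of balanced accuracy.
>>> refs, preds = [0, 1, 2, 0, 1, 2], [0, 1, 1, 2, 1, 0]
>>> classes = sorted(set(refs))
>>> per_class_recall = [
...     sum(p == r == c for p, r in zip(preds, refs)) / refs.count(c)
...     for c in classes
... ]
>>> print(sum(per_class_recall) / len(classes))
0.5
```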

## How to Use

At minimum, this metric requires predictions and references as inputs.

```python
>>> import evaluate
>>> balanced_accuracy_metric = evaluate.load("hyperml/balanced_accuracy")
>>> results = balanced_accuracy_metric.compute(references=[0, 1], predictions=[0, 1])
>>> print(results)
{'balanced_accuracy': 1.0}
```

### Inputs

- **predictions** (list of int): Predicted labels.
- **references** (list of int): Ground truth labels.
- **sample_weight** (list of float): Sample weights. Defaults to None.
- **adjusted** (boolean): If set to True, adjusts the score by accounting for chance, which is useful for imbalanced datasets (see the sketch after this list). Defaults to False.
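
As a rough sketch of what `adjusted=True` does (assuming the scikit-learn definition cited below), the raw score is rescaled so that chance-level performance maps to 0 and a perfect score stays at 1:

```python
>>> # Illustration of the chance correction applied when adjusted=True;
>>> # raw_score and n_classes below are made-up values, not API calls.
>>> raw_score, n_classes = 0.75, 2
>>> chance = 1 / n_classes
>>> print((raw_score - chance) / (1 - chance))
0.5
```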

### Output Values

- **balanced_accuracy** (float): Balanced Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0. A higher score means higher balanced accuracy.

Output Example(s):
```python
{'balanced_accuracy': 1.0}
```

This metric outputs a dictionary containing the balanced accuracy score.

#### Values from Popular Papers

Balanced accuracy is often used to report performance on supervised classification tasks such as sentiment analysis or fraud detection, where there is a severe imbalance in the classes.

### Examples

Example 1: A simple example
```python
>>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
>>> print(results)
{'balanced_accuracy': 0.5}
```

Example 2: The same as Example 1, except with `sample_weight` set.
```python
>>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
>>> print(results)
{'balanced_accuracy': 0.5}
```

Example 3: The same as Example 1, except with `adjusted` set to `True`.
```python
>>> balanced_accuracy_metric = evaluate.load("balanced_accuracy")
>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], adjusted=True)
>>> print(results)
{'balanced_accuracy': 0.25}
```

## Limitations and Bias

The balanced accuracy metric has limitations in extreme cases. On a perfectly balanced dataset it reduces to standard accuracy and offers no additional information. On a highly imbalanced dataset where a class has very few samples, a single changed prediction for that class can cause a large swing in the balanced accuracy score.
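
A small sketch of the second point, using scikit-learn's `balanced_accuracy_score` (which follows the same definition as this metric): with a single minority sample out of 100, flipping that one prediction moves plain accuracy from 1.0 to 0.99, but moves balanced accuracy from 1.0 to 0.5.

```python
>>> # Illustration only: extreme class imbalance (99 vs. 1 samples).
>>> from sklearn.metrics import balanced_accuracy_score
>>> references = [0] * 99 + [1]
>>> # The lone class-1 sample is predicted correctly ...
>>> print(balanced_accuracy_score(references, [0] * 99 + [1]))
1.0
>>> # ... versus incorrectly: plain accuracy only drops to 0.99,
>>> # but balanced accuracy drops to 0.5.
>>> print(balanced_accuracy_score(references, [0] * 100))
0.5
```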

## Citation(s)
```bibtex
@article{scikit-learn,
  title={Scikit-learn: Machine Learning in {P}ython},
  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
         and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
         and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
  journal={Journal of Machine Learning Research},
  volume={12},
  pages={2825--2830},
  year={2011}
}
```

## Further References
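
- [scikit-learn documentation for `balanced_accuracy_score`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html)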