---
title: Balanced Accuracy
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
  - evaluate
  - metric
description: >-
  Balanced Accuracy is the average of the recall obtained on each class. For
  binary classification it can be computed as: Balanced Accuracy = (TPR + TNR) / 2,
  where TPR is the true positive rate (sensitivity) and TNR is the true negative
  rate (specificity). For N classes it is the mean of the N per-class recalls.
---

# Metric Card for Balanced Accuracy

## Metric Description

Balanced accuracy is the average of the recall obtained on each class, which makes it robust to class imbalance. For binary classification it can be computed as: Balanced Accuracy = (TPR + TNR) / 2, where TPR is the true positive rate (sensitivity, i.e. recall on the positive class) and TNR is the true negative rate (specificity, i.e. recall on the negative class). For N classes it is the mean of the N per-class recalls.
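As a minimal sketch of the underlying computation (assuming the metric follows scikit-learn's `balanced_accuracy_score`, which the citation below points to), the score is simply the mean of the per-class recalls:

```python
from sklearn.metrics import balanced_accuracy_score, recall_score

references = [0, 1, 2, 0, 1, 2]
predictions = [0, 1, 1, 2, 1, 0]

# Recall computed separately for each class, then averaged.
per_class_recall = recall_score(references, predictions, average=None)
print(per_class_recall)         # [0.5 1.  0. ]
print(per_class_recall.mean())  # 0.5

# scikit-learn's reference implementation gives the same value.
print(balanced_accuracy_score(references, predictions))  # 0.5
```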

## How to Use

At minimum, this metric requires predictions and references as inputs.

```python
>>> import evaluate
>>> balanced_accuracy_metric = evaluate.load("hyperml/balanced_accuracy")
>>> results = balanced_accuracy_metric.compute(references=[0, 1], predictions=[0, 1])
>>> print(results)
{'balanced_accuracy': 1.0}
```
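Predictions can also be accumulated incrementally, for example inside an evaluation loop (a sketch assuming the standard `evaluate` metric API, i.e. `add_batch` followed by a final `compute`):

```python
import evaluate

balanced_accuracy_metric = evaluate.load("hyperml/balanced_accuracy")

# Feed predictions batch by batch, then compute the score once at the end.
batches = [([0, 1, 2], [0, 1, 1]), ([0, 1, 2], [2, 1, 0])]
for refs, preds in batches:
    balanced_accuracy_metric.add_batch(references=refs, predictions=preds)

print(balanced_accuracy_metric.compute())  # {'balanced_accuracy': 0.5}
```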

### Inputs

- **predictions** (`list` of `int`): Predicted labels.
- **references** (`list` of `int`): Ground truth labels.
- **sample_weight** (`list` of `float`): Sample weights. Defaults to None.
- **adjusted** (`bool`): If set to True, adjusts the score by accounting for chance (see the sketch below). Useful when handling imbalanced datasets. Defaults to False.
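As an illustration of the chance adjustment (assuming the scikit-learn convention, where the chance level for N classes is 1/N and the score is rescaled so that chance maps to 0 while perfect performance stays at 1), a hypothetical helper:

```python
def adjust_for_chance(score: float, n_classes: int) -> float:
    # Illustrative helper (not part of the metric's API): rescales a raw
    # balanced accuracy so that chance-level performance maps to 0 and
    # perfect performance stays at 1, mirroring adjusted=True.
    chance = 1.0 / n_classes
    return (score - chance) / (1.0 - chance)

# With 3 classes, a raw score of 0.5 becomes 0.25 (cf. Example 3 below).
print(adjust_for_chance(0.5, n_classes=3))  # 0.25
```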

### Output Values

- **balanced_accuracy** (`float`): Balanced accuracy score. Minimum possible value is 0.0, maximum possible value is 1.0. A higher score means higher balanced accuracy.

Output Example(s):

```python
{'balanced_accuracy': 1.0}
```

This metric outputs a dictionary containing the balanced accuracy score.

#### Values from Popular Papers

Balanced accuracy is often used to report performance on supervised classification tasks such as sentiment analysis or fraud detection, where there is a severe imbalance in the classes.

### Examples

Example 1 - A simple example:

```python
>>> balanced_accuracy_metric = evaluate.load("hyperml/balanced_accuracy")
>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
>>> print(results)
{'balanced_accuracy': 0.5}
```
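Here the per-class recalls are 1/2 for class 0 (one of its two samples is predicted correctly), 2/2 for class 1, and 0/2 for class 2, so the balanced accuracy is (0.5 + 1.0 + 0.0) / 3 = 0.5.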

Example 2 - The same as Example 1, except with sample_weight set:

```python
>>> balanced_accuracy_metric = evaluate.load("hyperml/balanced_accuracy")
>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
>>> print(results)
{'balanced_accuracy': 0.5}
```
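With these particular weights the score is unchanged: the two class-0 samples carry equal weight, so the class-0 recall stays at 0.5, and the class-1 and class-2 recalls remain 1.0 and 0.0 regardless of weighting. Sample weights only move the score when they change the weighted fraction of correctly predicted samples within some class.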

Example 3 - The same as Example 1, except with adjusted set to True:

```python
>>> balanced_accuracy_metric = evaluate.load("hyperml/balanced_accuracy")
>>> results = balanced_accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], adjusted=True)
>>> print(results)
{'balanced_accuracy': 0.25}
```

## Limitations and Bias

The balanced accuracy metric behaves differently at the extremes of class balance. On a perfectly balanced dataset it coincides with standard accuracy, so it adds no information there. On a highly imbalanced dataset, where a class has very few samples, a small change in the predictions for that class can cause a large change in the balanced accuracy score, because every class contributes equally to the average regardless of its size.
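The second point can be made concrete with a small sketch (using scikit-learn's `balanced_accuracy_score`, which implements the same definition): flipping a single prediction on a two-sample minority class barely moves standard accuracy but shifts balanced accuracy by 0.25.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# 98 samples of the majority class, 2 samples of the minority class.
references = [0] * 98 + [1] * 2

# The majority class is always predicted correctly; only the two
# minority-class predictions differ between the two runs.
predictions_a = [0] * 98 + [1, 0]  # one of the two minority samples correct
predictions_b = [0] * 98 + [0, 0]  # both minority samples missed

print(accuracy_score(references, predictions_a))           # 0.99
print(accuracy_score(references, predictions_b))           # 0.98
print(balanced_accuracy_score(references, predictions_a))  # 0.75
print(balanced_accuracy_score(references, predictions_b))  # 0.5
```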

## Citation(s)

```bibtex
@article{scikit-learn,
  title={Scikit-learn: Machine Learning in {P}ython},
  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
         and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
         and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
  journal={Journal of Machine Learning Research},
  volume={12},
  pages={2825--2830},
  year={2011}
}
```

## Further References