---
title: Average Precision
tags:
- evaluate
- metric
description: "Average precision score."
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
---
# Metric Card for Average Precision
## How to Use
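```python
import evaluate

metric = evaluate.load("chanelcolgate/average_precision")
# Illustrative inputs: binary ground-truth labels and one score per sample
references = [0, 0, 1, 1]
prediction_scores = [0.1, 0.4, 0.35, 0.8]
results = metric.compute(references=references, prediction_scores=prediction_scores)
```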
### Inputs
The parameter names below appear to follow scikit-learn's `average_precision_score`; in the snippet above, the labels and scores are passed to `compute` as `references` and `prediction_scores`.
- **y_true** (`ndarray` of shape (n_samples,) or (n_samples, n_classes)): True binary labels or binary label indicators.
- **y_score** (`ndarray` of shape (n_samples,) or (n_samples, n_classes)): Target scores. These can be probability estimates of the positive class, confidence values, or non-thresholded decision values (as returned by `decision_function` on some classifiers).
- **average** ({`'micro'`, `'samples'`, `'weighted'`, `'macro'`} or `None`, default=`'macro'`): If `None`, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data. Ignored when `y_true` is binary.
    - `'micro'`: Calculate metrics globally by considering each element of the label indicator matrix as a label.
    - `'macro'`: Calculate metrics for each label and take their unweighted mean. This does not take label imbalance into account.
    - `'weighted'`: Calculate metrics for each label and take their average, weighted by support (the number of true instances for each label).
    - `'samples'`: Calculate metrics for each instance and take their average.
- **pos_label** (`int` or `str`, default=1): The label of the positive class. Only applied to binary `y_true`. For multilabel-indicator `y_true`, `pos_label` is fixed to 1.
- **sample_weight** (`array-like` of shape (n_samples,), default=None): Sample weights.
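
To make the `average` options concrete, here is a minimal pure-Python sketch of the four averaging modes on a small multilabel input. It assumes the common step-wise definition of average precision, AP = Σ (R_n − R_{n−1}) · P_n, which is not stated in this card; the helper names `average_precision` and `averaged_ap` and the data are hypothetical, and the hosted module may compute things differently.

```python
def average_precision(y_true, y_score):
    """Binary AP as the step-wise sum of (R_n - R_{n-1}) * P_n (assumed definition)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("y_true contains no positive samples")
    # Rank samples from highest to lowest score
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    ap, tp, prev_recall = 0.0, 0, 0.0
    for i, (score, label) in enumerate(ranked):
        tp += label
        # Tied scores share one threshold: evaluate only at the end of the tie group
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        precision, recall = tp / (i + 1), tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def averaged_ap(Y_true, Y_score, average="macro"):
    """Combine per-label (or per-sample) AP values according to `average`."""
    n_labels = len(Y_true[0])
    cols_true = [[row[j] for row in Y_true] for j in range(n_labels)]
    cols_score = [[row[j] for row in Y_score] for j in range(n_labels)]
    if average == "micro":
        # Treat every cell of the label indicator matrix as one binary decision
        flat_true = [y for row in Y_true for y in row]
        flat_score = [s for row in Y_score for s in row]
        return average_precision(flat_true, flat_score)
    if average == "samples":
        # One AP per row (instance), then a plain mean
        per_sample = [average_precision(t, s) for t, s in zip(Y_true, Y_score)]
        return sum(per_sample) / len(per_sample)
    per_label = [average_precision(t, s) for t, s in zip(cols_true, cols_score)]
    if average is None:
        return per_label
    if average == "macro":
        return sum(per_label) / n_labels
    if average == "weighted":
        support = [sum(t) for t in cols_true]  # number of positives per label
        return sum(a * w for a, w in zip(per_label, support)) / sum(support)
    raise ValueError(f"unknown average: {average!r}")


# Hypothetical data: 4 samples, 2 labels (label 0 has 3 positives, label 1 has 2)
Y_true = [[1, 0], [1, 1], [0, 1], [1, 0]]
Y_score = [[0.9, 0.2], [0.4, 0.75], [0.6, 0.8], [0.7, 0.1]]

for mode in (None, "macro", "weighted", "micro", "samples"):
    print(mode, averaged_ap(Y_true, Y_score, average=mode))
```

Because the two labels have different numbers of positives, `'macro'` and `'weighted'` give different results here, and `'micro'` differs from both because it pools all label decisions before ranking.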
### Output Values
*Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
*State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*
#### Values from Popular Papers
*Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
### Examples
*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
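
As one illustration of typical and atypical results in the binary case, the sketch below computes average precision by hand rather than through the hosted module. It assumes the step-wise definition AP = Σ (R_n − R_{n−1}) · P_n, which this card does not state; `average_precision` is a hypothetical helper and the inputs are made up.

```python
def average_precision(y_true, y_score, pos_label=1):
    """Binary AP as the step-wise sum of (R_n - R_{n-1}) * P_n (assumed definition)."""
    hits = [int(y == pos_label) for y in y_true]
    n_pos = sum(hits)
    if n_pos == 0:
        raise ValueError("y_true contains no samples of the positive class")
    # Rank samples from highest to lowest score
    ranked = sorted(zip(y_score, hits), key=lambda pair: pair[0], reverse=True)
    ap, tp, prev_recall = 0.0, 0, 0.0
    for i, (score, hit) in enumerate(ranked):
        tp += hit
        # Tied scores share one threshold: evaluate only at the end of the tie group
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        precision, recall = tp / (i + 1), tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Typical case: one negative (score 0.4) is ranked above one positive (score 0.35)
typical = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Perfect ranking: every positive outscores every negative
perfect = average_precision([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])

# Non-default positive class, selected through pos_label
spam = average_precision(["ham", "spam", "spam"], [0.9, 0.8, 0.1], pos_label="spam")

print(typical, perfect, spam)
```

Under this definition the perfect ranking scores 1.0, and the score drops as negatives move ahead of positives in the ranking.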
## Limitations and Bias
*Note any known limitations or biases that the metric has, with links and references if possible.*
## Citation
*Cite the source where this metric was introduced.*
## Further References
*Add any useful further references.*