jplu committed on
Commit 5fba0d7
1 Parent(s): 05e76bd

Update README

Files changed (1)
  1. README.md +58 -19
README.md CHANGED
@@ -4,8 +4,7 @@ tags:
  - evaluate
  - metric
  description: >-
- a classification report is a simple tool to compute multiple metrics such as:
- accuracy precision/recall/f1-score by class. mean/weighted average.
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
@@ -15,37 +14,77 @@ license: apache-2.0

  # Metric Card for classification_report

- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*

  ## Metric Description
- *Give a brief overview of this metric, including what task(s) it is usually used for, if any.*

  ## How to Use
- *Give general statement of how to use the metric*

- *Provide simplest possible example for using the metric*

- ### Inputs
- *List all input arguments in the format below*
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*

- ### Output Values

- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*

- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*

- #### Values from Popular Papers
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*

  ### Examples
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*

- ## Limitations and Bias
- *Note any known limitations or biases that the metric has, with links and references if possible.*

  ## Citation
- *Cite the source where this metric was introduced.*

  ## Further References
- *Add any useful further references.*
 
@@ -4,8 +4,7 @@ tags:
  - evaluate
  - metric
  description: >-
+ Build a text report showing the main classification metrics: accuracy, precision, recall, and F1.
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
 
@@ -15,37 +14,77 @@ license: apache-2.0

  # Metric Card for classification_report

  ## Metric Description
+
+ Build a text report showing the main classification metrics: accuracy, precision, recall, and F1.
+

  ## How to Use

+ At minimum, this metric requires predictions and references as inputs.

+ ```python
+ >>> classification_report_metric = evaluate.load("bstrai/classification_report")
+ >>> results = classification_report_metric.compute(references=[0, 1], predictions=[0, 1])
+ >>> print(results)
+ {'0': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 1}, '1': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 1}, 'accuracy': 1.0, 'macro avg': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 2}, 'weighted avg': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 2}}
+ ```
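Predictions and references can also be accumulated incrementally, for example from an evaluation loop, before the report is computed. Below is a minimal sketch using the `evaluate` library's standard `add_batch`/`compute` pattern; the batches and variable names are illustrative, not part of this card:

```python
>>> import evaluate
>>> classification_report_metric = evaluate.load("bstrai/classification_report")
>>> # Feed predictions and references batch by batch.
>>> for refs, preds in [([0, 1], [0, 1]), ([1, 0], [1, 1])]:
...     classification_report_metric.add_batch(references=refs, predictions=preds)
>>> # compute() consumes everything added so far and returns the report dictionary.
>>> results = classification_report_metric.compute()
```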

+ ### Inputs
+ - **predictions** (`list` of `int`): Predicted labels.
+ - **references** (`list` of `int`): Ground truth labels.
+ - **labels** (`list` of `int`): Optional list of label indices to include in the report. Defaults to `None`.
+ - **target_names** (`list` of `str`): Optional display names matching the labels, in the same order. Defaults to `None`.
+ - **sample_weight** (`list` of `float`): Sample weights. Defaults to `None`.
+ - **digits** (`int`): Number of digits used to format output floating-point values. Ignored when the report is returned as a dictionary, in which case the values are not rounded. Defaults to 2.
+ - **zero_division** (`"warn"`, `0` or `1`): Value to return when there is a zero division. If set to `"warn"`, this acts as 0, but a warning is also raised. Defaults to `"warn"`.
+
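The optional arguments above mirror scikit-learn's `classification_report`. The sketch below assumes they are forwarded as keyword arguments to `compute()`; the label indices and display names are purely illustrative:

```python
>>> import evaluate
>>> classification_report_metric = evaluate.load("bstrai/classification_report")
>>> # Restrict the report to labels 0-2 and display them under readable names.
>>> results = classification_report_metric.compute(
...     references=[0, 1, 2, 0, 1, 2],
...     predictions=[0, 1, 1, 2, 1, 0],
...     labels=[0, 1, 2],
...     target_names=["ant", "bird", "cat"],
... )
```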

+ ### Output Values
+ - **report** (`str` or `dict`): Summary of the precision, recall, and F1 score for each class. A dictionary is returned if `output_dict` is `True`; it has the following structure:
+ ```
+ {'label 1': {'precision': 0.5,
+              'recall': 1.0,
+              'f1-score': 0.67,
+              'support': 1},
+  'label 2': { ... },
+  ...
+ }
+ ```
+ The reported averages include macro average (the unweighted mean per label), weighted average (the support-weighted mean per label), and sample average (multilabel classification only). Micro average (averaging the total true positives, false negatives and false positives) is only shown for multi-label classification or multi-class classification with a subset of classes, because otherwise it corresponds to accuracy and would be the same for all metrics. See also `precision_recall_fscore_support` for more details on the averages.
+ Note that in binary classification, recall of the positive class is also known as "sensitivity"; recall of the negative class is "specificity".

+ Output example:
+ ```python
+ {'0': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 1}, '1': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 1}, 'accuracy': 1.0, 'macro avg': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 2}, 'weighted avg': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 2}}
+ ```
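Because the report is a nested dictionary, individual scores can be read by indexing first by class name (or average) and then by metric. For instance, with `results` being the dictionary from the output example above:

```python
>>> results['1']['f1-score']           # per-class F1 score for class "1"
1.0
>>> results['macro avg']['precision']  # unweighted mean precision over classes
1.0
>>> results['accuracy']                # overall accuracy
1.0
```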

  ### Examples

+ Simple example:
+ ```python
+ >>> classification_report_metric = evaluate.load("bstrai/classification_report")
+ >>> results = classification_report_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
+ >>> print(results)
+ {'0': {'precision': 0.5, 'recall': 0.5, 'f1-score': 0.5, 'support': 2}, '1': {'precision': 0.6666666666666666, 'recall': 1.0, 'f1-score': 0.8, 'support': 2}, '2': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 2}, 'accuracy': 0.5, 'macro avg': {'precision': 0.38888888888888884, 'recall': 0.5, 'f1-score': 0.43333333333333335, 'support': 6}, 'weighted avg': {'precision': 0.38888888888888884, 'recall': 0.5, 'f1-score': 0.43333333333333335, 'support': 6}}
+ ```
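A class that never occurs among the predictions leads to a division by zero when its precision is computed; the `zero_division` input controls how that case is reported. A minimal sketch with illustrative labels, assuming `zero_division` is forwarded to the underlying report:

```python
>>> import evaluate
>>> classification_report_metric = evaluate.load("bstrai/classification_report")
>>> # Class 1 is never predicted, so its precision would be 0/0;
>>> # zero_division=0 reports it as 0.0 instead of emitting a warning.
>>> results = classification_report_metric.compute(
...     references=[0, 0, 1],
...     predictions=[0, 0, 0],
...     zero_division=0,
... )
```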

  ## Citation
+ ```bibtex
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+           and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+           and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+           Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ ```
+

  ## Further References