lvwerra (HF staff) committed
Commit cc1c035 · 1 Parent(s): 74b04a7

Update Space (evaluate main: 828c6327)

Files changed (4):
  1. README.md +119 -5
  2. app.py +6 -0
  3. mae.py +113 -0
  4. requirements.txt +4 -0
README.md CHANGED
@@ -1,12 +1,126 @@
  ---
- title: Mae
- emoji: 📈
- colorFrom: gray
- colorTo: pink
+ title: MAE
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
+ # Metric Card for MAE
+
+
+ ## Metric Description
+
+ Mean Absolute Error (MAE) is the mean of the absolute differences between the predicted and the actual numeric values:
+ ![image](https://user-images.githubusercontent.com/14205986/165824243-e1078dfd-489d-456c-a0da-cbaa28726220.png)
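+
+ In symbols, for predictions ŷ₁, …, ŷₙ and references y₁, …, yₙ, this is the usual formulation: MAE = (1/n) · Σᵢ |yᵢ − ŷᵢ|.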
+
+
+ ## How to Use
+
+ At minimum, this metric requires predictions and references as inputs.
+
+ ```python
+ >>> mae_metric = evaluate.load("mae")
+ >>> predictions = [2.5, 0.0, 2, 8]
+ >>> references = [3, -0.5, 2, 7]
+ >>> results = mae_metric.compute(predictions=predictions, references=references)
+ ```
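+
+ The return value is a dictionary keyed by the metric name; for the inputs above it is `{'mae': 0.5}` (see the Examples section below).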
+
+ ### Inputs
+
+ Mandatory inputs:
+ - `predictions`: numeric array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the estimated target values.
+ - `references`: numeric array-like of shape (`n_samples,`) or (`n_samples`, `n_outputs`), representing the ground truth (correct) target values.
+
+ Optional arguments (illustrated in the sketch below):
+ - `sample_weight`: numeric array-like of shape (`n_samples,`) representing sample weights. The default is `None`.
+ - `multioutput`: `raw_values`, `uniform_average` or numeric array-like of shape (`n_outputs,`), which defines the aggregation of multiple output values. The default value is `uniform_average`.
+   - `raw_values` returns a full set of errors in case of multioutput input.
+   - `uniform_average` means that the errors of all outputs are averaged with uniform weight.
+   - an array-like value defines the weights used to average the errors.
+
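+ A minimal sketch of `sample_weight`, reusing the inputs from the snippet above (the weight values are arbitrary, chosen only for illustration):
+
+ ```python
+ >>> mae_metric = evaluate.load("mae")
+ >>> predictions = [2.5, 0.0, 2, 8]
+ >>> references = [3, -0.5, 2, 7]
+ >>> # down-weight the last sample; the score is then the weighted average of the absolute errors
+ >>> results = mae_metric.compute(predictions=predictions, references=references, sample_weight=[1, 1, 1, 0.5])
+ ```
+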
+ ### Output Values
+
+ This metric outputs a dictionary containing the mean absolute error score, which is one of:
+ - `float`: if multioutput is `uniform_average` or an ndarray of weights, then the weighted average of all output errors is returned.
+ - numeric array-like of shape (`n_outputs,`): if multioutput is `raw_values`, then the score is returned for each output separately.
+
+ Each MAE `float` value is non-negative, with `0.0` being the best value (lower values mean smaller error).
+
+ Output Example(s):
+ ```python
+ {'mae': 0.5}
+ ```
+
+ If `multioutput="raw_values"`:
+ ```python
+ {'mae': array([0.5, 1. ])}
+ ```
+
+ #### Values from Popular Papers
+
+
+ ### Examples
+
+ Example with the `uniform_average` config:
+ ```python
+ >>> mae_metric = evaluate.load("mae")
+ >>> predictions = [2.5, 0.0, 2, 8]
+ >>> references = [3, -0.5, 2, 7]
+ >>> results = mae_metric.compute(predictions=predictions, references=references)
+ >>> print(results)
+ {'mae': 0.5}
+ ```
+
+ Example with multi-dimensional lists and the `raw_values` config:
+ ```python
+ >>> mae_metric = evaluate.load("mae", "multilist")
+ >>> predictions = [[0.5, 1], [-1, 1], [7, -6]]
+ >>> references = [[0, 2], [-1, 2], [8, -5]]
+ >>> results = mae_metric.compute(predictions=predictions, references=references)
+ >>> print(results)
+ {'mae': 0.75}
+ >>> results = mae_metric.compute(predictions=predictions, references=references, multioutput='raw_values')
+ >>> print(results)
+ {'mae': array([0.5, 1. ])}
+ ```
+
+ ## Limitations and Bias
+
+ One limitation of MAE is that the relative size of the error is not always obvious, meaning that it can be difficult to tell a big error from a small one -- metrics such as Mean Absolute Percentage Error (MAPE) have been proposed to express the error in percentage terms instead.
+
+ Also, since it averages the errors, MAE may understate the impact of large but infrequent errors -- metrics such as the Root Mean Square Error (RMSE) compensate for this by squaring the errors before averaging them.
+
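+ As an illustration of that second point (plain NumPy, not part of this metric's API), a single large error moves RMSE much more than MAE:
+
+ ```python
+ import numpy as np
+
+ errors = np.array([1.0, 1.0, 1.0, 1.0, 10.0])   # absolute errors with one large outlier
+ mae = errors.mean()                              # 2.8  -- the outlier enters linearly
+ rmse = np.sqrt((errors ** 2).mean())             # ~4.56 -- squaring amplifies the outlier
+ ```
+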
+ ## Citation(s)
+
+ ```bibtex
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+           and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+           and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+           Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ ```
+
+ ```bibtex
+ @article{willmott2005advantages,
+   title={Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance},
+   author={Willmott, Cort J and Matsuura, Kenji},
+   journal={Climate research},
+   volume={30},
+   number={1},
+   pages={79--82},
+   year={2005}
+ }
+ ```
+
+ ## Further References
+
+ - [Mean Absolute Error - Wikipedia](https://en.wikipedia.org/wiki/Mean_absolute_error)
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("mae")
+ launch_gradio_widget(module)
mae.py ADDED
@@ -0,0 +1,113 @@
+ # Copyright 2022 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """MAE - Mean Absolute Error Metric"""
+
+ import datasets
+ from sklearn.metrics import mean_absolute_error
+
+ import evaluate
+
+
+ _CITATION = """\
+ @article{scikit-learn,
+   title={Scikit-learn: Machine Learning in {P}ython},
+   author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+           and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+           and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+           Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+   journal={Journal of Machine Learning Research},
+   volume={12},
+   pages={2825--2830},
+   year={2011}
+ }
+ """
+
+ _DESCRIPTION = """\
+ Mean Absolute Error (MAE) is the mean of the magnitude of difference between the predicted and actual
+ values.
+ """
+
+
+ _KWARGS_DESCRIPTION = """
+ Args:
+     predictions: array-like of shape (n_samples,) or (n_samples, n_outputs)
+         Estimated target values.
+     references: array-like of shape (n_samples,) or (n_samples, n_outputs)
+         Ground truth (correct) target values.
+     sample_weight: array-like of shape (n_samples,), default=None
+         Sample weights.
+     multioutput: {"raw_values", "uniform_average"} or array-like of shape (n_outputs,), default="uniform_average"
+         Defines aggregating of multiple output values. Array-like value defines weights used to average errors.
+
+         "raw_values" : Returns a full set of errors in case of multioutput input.
+
+         "uniform_average" : Errors of all outputs are averaged with uniform weight.
+
+ Returns:
+     mae : mean absolute error.
+         If multioutput is "raw_values", then mean absolute error is returned for each output separately. If multioutput is "uniform_average" or an ndarray of weights, then the weighted average of all output errors is returned.
+         MAE output is non-negative floating point. The best value is 0.0.
+ Examples:
+
+     >>> mae_metric = evaluate.load("mae")
+     >>> predictions = [2.5, 0.0, 2, 8]
+     >>> references = [3, -0.5, 2, 7]
+     >>> results = mae_metric.compute(predictions=predictions, references=references)
+     >>> print(results)
+     {'mae': 0.5}
+
+     If you're using multi-dimensional lists, then set the config as follows:
+
+     >>> mae_metric = evaluate.load("mae", "multilist")
+     >>> predictions = [[0.5, 1], [-1, 1], [7, -6]]
+     >>> references = [[0, 2], [-1, 2], [8, -5]]
+     >>> results = mae_metric.compute(predictions=predictions, references=references)
+     >>> print(results)
+     {'mae': 0.75}
+     >>> results = mae_metric.compute(predictions=predictions, references=references, multioutput='raw_values')
+     >>> print(results)
+     {'mae': array([0.5, 1. ])}
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class Mae(evaluate.EvaluationModule):
+     def _info(self):
+         return evaluate.EvaluationModuleInfo(
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(self._get_feature_types()),
+             reference_urls=[
+                 "https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html"
+             ],
+         )
+
+     def _get_feature_types(self):
+         if self.config_name == "multilist":
+             return {
+                 "predictions": datasets.Sequence(datasets.Value("float")),
+                 "references": datasets.Sequence(datasets.Value("float")),
+             }
+         else:
+             return {
+                 "predictions": datasets.Value("float"),
+                 "references": datasets.Value("float"),
+             }
+
+     def _compute(self, predictions, references, sample_weight=None, multioutput="uniform_average"):
+
+         mae_score = mean_absolute_error(references, predictions, sample_weight=sample_weight, multioutput=multioutput)
+
+         return {"mae": mae_score}
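
As a quick local smoke test of the script above (a sketch: it assumes the file is saved as `./mae.py` and that `evaluate.load` accepts local script paths, which is how metric scripts are typically developed):

```python
import evaluate

# load the module script from a local path (illustrative path, see assumption above)
mae = evaluate.load("./mae.py")
print(mae.compute(predictions=[2.5, 0.0, 2, 8], references=[3, -0.5, 2, 7]))  # {'mae': 0.5}
```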
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ # TODO: fix github to release
+ git+https://github.com/huggingface/evaluate.git@b6e6ed7f3e6844b297bff1b43a1b4be0709b9671
+ datasets~=2.0
+ sklearn