lvwerra (HF staff) committed
Commit 440466a · 1 Parent(s): e21940e

Update Space (evaluate main: 828c6327)

Files changed (4)
  1. README.md +77 -5
  2. app.py +6 -0
  3. mahalanobis.py +100 -0
  4. requirements.txt +3 -0
README.md CHANGED
@@ -1,12 +1,84 @@
  ---
- title: Mahalanobis
- emoji: 📉
- colorFrom: purple
- colorTo: yellow
+ title: Mahalanobis Distance
+ emoji: 🤗
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
+ # Metric Card for Mahalanobis Distance
+
+ ## Metric Description
+ Mahalanobis distance is the distance between a point and a distribution (as opposed to the distance between two points), making it the multivariate equivalent of the Euclidean distance.
+
+ It is often used in multivariate anomaly detection, classification on highly imbalanced datasets, and one-class classification.
+
+ ## How to Use
+ At minimum, this metric requires two `list`s of data points:
+
+ ```python
+ >>> mahalanobis_metric = evaluate.load("mahalanobis")
+ >>> results = mahalanobis_metric.compute(reference_distribution=[[0, 1], [1, 0]], X=[[0, 1]])
+ ```
+
+ ### Inputs
+ - `X` (`list`): data points to be compared with the `reference_distribution`.
+ - `reference_distribution` (`list`): data points from the reference distribution that we want to compare to.
+
+ ### Output Values
+ `mahalanobis` (`array`): the Mahalanobis distance for each data point in `X`.
+
+ ```python
+ >>> print(results)
+ {'mahalanobis': array([0.5])}
+ ```
+
+ #### Values from Popular Papers
+ *N/A*
+
+ ### Example
+
+ ```python
+ >>> mahalanobis_metric = evaluate.load("mahalanobis")
+ >>> results = mahalanobis_metric.compute(reference_distribution=[[0, 1], [1, 0]], X=[[0, 1]])
+ >>> print(results)
+ {'mahalanobis': array([0.5])}
+ ```
+
+ ## Limitations and Bias
+
+ The Mahalanobis distance can only capture linear relationships between the variables, which means it cannot capture all types of outliers. It also fails to faithfully represent data that is highly skewed or multimodal.
+
+ ## Citation
+ ```bibtex
+ @inproceedings{mahalanobis1936generalized,
+   title={On the generalized distance in statistics},
+   author={Mahalanobis, Prasanta Chandra},
+   year={1936},
+   organization={National Institute of Science of India}
+ }
+ ```
+
+ ```bibtex
+ @article{de2000mahalanobis,
+   title={The Mahalanobis distance},
+   author={De Maesschalck, Roy and Jouan-Rimbaud, Delphine and Massart, D{\'e}sir{\'e} L},
+   journal={Chemometrics and intelligent laboratory systems},
+   volume={50},
+   number={1},
+   pages={1--18},
+   year={2000},
+   publisher={Elsevier}
+ }
+ ```
+
+ ## Further References
+ - [Wikipedia -- Mahalanobis Distance](https://en.wikipedia.org/wiki/Mahalanobis_distance)
+
+ - [Machine Learning Plus -- Mahalanobis Distance](https://www.machinelearningplus.com/statistics/mahalanobis-distance/)
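
The metric card above mentions multivariate anomaly detection as a typical use. The following is a minimal sketch of that workflow using this module; the reference data, query points, and cutoff value are illustrative assumptions, not values taken from the card:

```python
import evaluate
import numpy as np

# Hypothetical "normal" reference points and two query points to score.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=(200, 2)).tolist()
queries = [[0.1, -0.2], [5.0, 5.0]]  # the second point lies far outside the reference cloud

mahalanobis_metric = evaluate.load("mahalanobis")
results = mahalanobis_metric.compute(reference_distribution=reference, X=queries)

scores = np.asarray(results["mahalanobis"])
threshold = 9.0  # illustrative cutoff chosen by the user, not prescribed by the module
print(scores)              # the outlying second point gets a much larger score
print(scores > threshold)  # expected: [False  True] for this seed
```

Because each row of `X` is scored against the reference distribution independently, thresholding the returned array is all that is needed to flag outliers.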
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("mahalanobis")
+ launch_gradio_widget(module)
mahalanobis.py ADDED
@@ -0,0 +1,100 @@
+ # Copyright 2021 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """Mahalanobis metric."""
+
+ import datasets
+ import numpy as np
+
+ import evaluate
+
+
+ _DESCRIPTION = """
+ Compute the Mahalanobis Distance
+
+ Mahalanobis distance is the distance between a point and a distribution,
+ not between two distinct points. It is effectively a multivariate equivalent of the Euclidean distance.
+ It was introduced by Prof. P. C. Mahalanobis in 1936
+ and has been used in various statistical applications ever since.
+ [source: https://www.machinelearningplus.com/statistics/mahalanobis-distance/]
+ """
+
+ _CITATION = """\
+ @article{de2000mahalanobis,
+   title={The Mahalanobis distance},
+   author={De Maesschalck, Roy and Jouan-Rimbaud, Delphine and Massart, D{\'e}sir{\'e} L},
+   journal={Chemometrics and intelligent laboratory systems},
+   volume={50},
+   number={1},
+   pages={1--18},
+   year={2000},
+   publisher={Elsevier}
+ }
+ """
+
+ _KWARGS_DESCRIPTION = """
+ Args:
+     X: List of datapoints to be compared with the `reference_distribution`.
+     reference_distribution: List of datapoints from the reference distribution we want to compare to.
+ Returns:
+     mahalanobis: The Mahalanobis distance for each datapoint in `X`.
+ Examples:
+
+     >>> mahalanobis_metric = evaluate.load("mahalanobis")
+     >>> results = mahalanobis_metric.compute(reference_distribution=[[0, 1], [1, 0]], X=[[0, 1]])
+     >>> print(results)
+     {'mahalanobis': array([0.5])}
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class Mahalanobis(evaluate.EvaluationModule):
+     def _info(self):
+         return evaluate.EvaluationModuleInfo(
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             features=datasets.Features(
+                 {
+                     "X": datasets.Sequence(datasets.Value("float", id="sequence"), id="X"),
+                 }
+             ),
+         )
+
+     def _compute(self, X, reference_distribution):
+
+         # convert to numpy arrays
+         X = np.array(X)
+         reference_distribution = np.array(reference_distribution)
+
+         # assert that arrays are 2D
+         if len(X.shape) != 2:
+             raise ValueError("Expected `X` to be a 2D vector")
+         if len(reference_distribution.shape) != 2:
+             raise ValueError("Expected `reference_distribution` to be a 2D vector")
+         if reference_distribution.shape[0] < 2:
+             raise ValueError(
+                 "Expected `reference_distribution` to be a 2D vector with more than one element in the first dimension"
+             )
+
+         # get the Mahalanobis distance for each prediction
+         X_minus_mu = X - np.mean(reference_distribution)
+         cov = np.cov(reference_distribution.T)
+         try:
+             inv_covmat = np.linalg.inv(cov)
+         except np.linalg.LinAlgError:
+             inv_covmat = np.linalg.pinv(cov)
+         left_term = np.dot(X_minus_mu, inv_covmat)
+         mahal_dist = np.dot(left_term, X_minus_mu.T).diagonal()
+
+         return {"mahalanobis": mahal_dist}
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ # TODO: fix github to release
+ git+https://github.com/huggingface/evaluate.git@b6e6ed7f3e6844b297bff1b43a1b4be0709b9671
+ datasets~=2.0