Commit 619e946 · Parent(s): 5199800
Update README.md

README.md CHANGED
@@ -20,7 +20,7 @@ This metric is used for evaluating the quality of relation extraction output. By


## Metric Description
-This metric can be used in relation extraction evaluation.
+This metric can be used in relation extraction evaluation.

## How to Use
This metric takes 2 inputs, predictions and references (ground truth). Both are a list of lists of dictionaries giving each entity's name and type:
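To make the input shape above concrete, here is a minimal illustrative sketch; it simply reuses the load path, field names, and compute call that appear in the examples below and assumes the `evaluate` package is installed:

```python
import evaluate

# Load the metric the same way the examples below do.
module = evaluate.load("Ikala-allen/relation_extraction")

# One inner list per document; each relation is a dictionary of entity names and types.
references = [[
    {"head": "phipigments", "head_type": "brand", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]
predictions = [[
    {"head": "phipigments", "head_type": "brand", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]

scores = module.compute(predictions=predictions, references=references, mode="strict")
print(scores)  # tp/fp/fn counts plus p, r, f1 and macro scores, as in the outputs below
```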
@@ -79,7 +79,7 @@ Output Example:
Reminder: Macro_f1, Macro_p, Macro_r, p, r, and f1 are always numbers between 0 and 100, while tp, fp, and fn depend on how many data inputs there are.

### Examples
-Example1 : only one prediction and reference, mode = strict,
+Example1 : only one prediction and reference, mode = strict, only output the ALL relation score
```python
metric_path = "Ikala-allen/relation_extraction"
module = evaluate.load(metric_path)
@@ -133,7 +133,7 @@ print(evaluation_scores)
>>> {'tp': 2, 'fp': 0, 'fn': 1, 'p': 100.0, 'r': 66.66666666666667, 'f1': 80.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}
```

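For orientation, the scores in this output follow the usual precision/recall/F1 definitions reported as percentages; the helper below is a minimal sketch that reproduces them from the tp/fp/fn counts (it is illustrative only, not part of the metric's API):

```python
# Illustrative helper, not part of the metric's API: derive p, r, f1 (as percentages)
# from the tp/fp/fn counts shown in the output above.
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    p = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    r = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(prf(tp=2, fp=0, fn=1))
# ≈ (100.0, 66.67, 80.0), matching p, r, f1 above up to floating-point rounding.
# Macro_p, Macro_r, Macro_f1 appear to be the same quantities averaged over relation types.
```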
-Example3 :
+Example3 : two or more predictions and references, mode = boundaries, only_all = False, output scores for every relation type
```python
metric_path = "Ikala-allen/relation_extraction"
module = evaluate.load(metric_path)
@@ -168,34 +168,40 @@ print(evaluation_scores)
>>> {'sell': {'tp': 3, 'fp': 1, 'fn': 0, 'p': 75.0, 'r': 100.0, 'f1': 85.71428571428571}, 'belongs_to': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0}, 'ALL': {'tp': 3, 'fp': 1, 'fn': 1, 'p': 75.0, 'r': 75.0, 'f1': 75.0, 'Macro_f1': 42.857142857142854, 'Macro_p': 37.5, 'Macro_r': 50.0}}
```

-Example 4
+Example 4 : two or more predictions and references, mode = boundaries, only_all = False, relation_types = ["belongs_to"], so only the belongs_to type (plus the ALL summary) is scored
```python
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-{'
+metric_path = "Ikala-allen/relation_extraction"
+module = evaluate.load(metric_path)
+# Example references (ground truth)
+references = [
+    [
+        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+    ],
+    [
+        {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
+        {'head': 'A醛賦活緊緻精華', 'tail': 'Serum', 'head_type': 'product', 'tail_type': 'category', 'type': 'belongs_to'},
+    ]
+]
+
+# Example predictions
+predictions = [
+    [
+        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
+    ],
+    [
+        {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
+        {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
+    ]
+]
+evaluation_scores = module.compute(predictions=predictions, references=references, mode="boundaries", only_all=False, relation_types=["belongs_to"])
+print(evaluation_scores)
+>>> {'belongs_to': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0}, 'ALL': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0, 'Macro_f1': 0.0, 'Macro_p': 0.0, 'Macro_r': 0.0}}
```
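As a sanity check on this result, here is a rough sketch of where the belongs_to counts come from, assuming (as the boundaries mode and the outputs above suggest) that relations are matched per document on head, tail, and relation type, and that relation_types=["belongs_to"] restricts scoring to belongs_to relations; the references contain exactly one such relation and the predictions contain none:

```python
# Illustrative recount of Example 4's belongs_to scores; this mirrors the output
# above but is not the metric's own code.
references = [
    [("phipigments", "國際認證之色乳", "sell"), ("tinadaviespigments", "國際認證之色乳", "sell")],
    [("SABONTAIWAN", "大馬士革玫瑰有機光燦系列", "sell"), ("A醛賦活緊緻精華", "Serum", "belongs_to")],
]
predictions = [
    [("phipigments", "國際認證之色乳", "sell"), ("tinadaviespigments", "國際認證之色乳", "sell")],
    [("SABONTAIWAN", "大馬士革玫瑰有機光燦系列", "sell"), ("SNTAIWAN", "大馬士革玫瑰有機光燦系列", "sell")],
]

tp = fp = fn = 0
for ref_doc, pred_doc in zip(references, predictions):
    ref_b = {r for r in ref_doc if r[2] == "belongs_to"}
    pred_b = {p for p in pred_doc if p[2] == "belongs_to"}
    tp += len(ref_b & pred_b)   # predicted belongs_to relations that are in the reference
    fp += len(pred_b - ref_b)   # predicted belongs_to relations that are not
    fn += len(ref_b - pred_b)   # reference belongs_to relations that were missed
print(tp, fp, fn)  # 0 0 1
```

With no true positives, p, r, and f1 all come out as 0, which is what both the belongs_to and ALL blocks report.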

## Limitations and Bias
-This metric has strict
+This metric has strict and boundaries modes, and relation_types can be selected to evaluate specific relation types. Make sure to choose suitable evaluation parameters, since the resulting F1 scores may be totally different.
+Prediction and reference entity names should be exactly the same, regardless of case and spaces. If a prediction is not exactly the same as its reference, it is counted as a fp or fn.

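Since the choice of mode can change the scores substantially, a small comparison makes this concrete. The sketch below is hypothetical (this diff does not spell out the strict/boundaries semantics): it scores a single relation whose entity names match the reference but whose head_type does not, once per mode, using the same compute call as the examples above. If strict mode also checks entity types, as its name suggests and as the Example 3/4 data hint, the two runs will report very different F1 scores:

```python
import evaluate

# Hypothetical comparison, not taken from the README: same prediction, two modes.
# Entity names match the reference but head_type differs ("product" vs "brand"),
# exactly like the first prediction in Example 4.
module = evaluate.load("Ikala-allen/relation_extraction")

references = [[
    {"head": "phipigments", "head_type": "brand", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]
predictions = [[
    {"head": "phipigments", "head_type": "product", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]

for mode in ("strict", "boundaries"):
    scores = module.compute(predictions=predictions, references=references, mode=mode)
    print(mode, scores)
# Expectation (not verified here): the boundaries run counts the pair as a true
# positive, while the strict run does not, so f1 drops to 0 under strict.
```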
## Citation
```bibtex