Update Space (evaluate main: 05209ece)
README.md
CHANGED
@@ -10,6 +10,39 @@ pinned: false
|
|
10 |
tags:
|
11 |
- evaluate
|
12 |
- metric
13 |
---
|
14 |
|
15 |
## Metric description
|
|
|
10 |
tags:
|
11 |
- evaluate
|
12 |
- metric
|
13 |
+
description: >-
|
14 |
+
CoVal is a coreference evaluation tool for the CoNLL and ARRAU datasets which
|
15 |
+
implements the common evaluation metrics, including MUC [Vilain et al., 1995],
|
16 |
+
B-cubed [Bagga and Baldwin, 1998], CEAFe [Luo et al., 2005],
|
17 |
+
LEA [Moosavi and Strube, 2016] and the averaged CoNLL score
|
18 |
+
(the average of the F1 values of MUC, B-cubed and CEAFe)
|
19 |
+
[Denis and Baldridge, 2009a; Pradhan et al., 2011].
|
20 |
+
|
21 |
+
This wrapper of CoVal currently works only with the CoNLL line format:
|
22 |
+
The CoNLL format has one word per line, with all the annotations for that word in columns separated by spaces:
|
23 |
+
Column Type Description
|
24 |
+
1 Document ID This is a variation on the document filename
|
25 |
+
2 Part number Some files are divided into multiple parts, numbered 000, 001, 002, etc.
|
26 |
+
3 Word number
|
27 |
+
4 Word itself This is the token as segmented/tokenized in the Treebank. Initially the *_skel files contain the placeholder [WORD], which gets replaced by the actual token from the Treebank, which is part of the OntoNotes release.
|
28 |
+
5 Part-of-Speech
|
29 |
+
6 Parse bit This is the bracketed structure broken before the first open parenthesis in the parse, with the word/part-of-speech leaf replaced by a *. The full parse can be created by substituting the asterisk with the "([pos] [word])" string (or leaf) and concatenating the items in the rows of that column.
|
30 |
+
7 Predicate lemma The predicate lemma is mentioned for the rows for which we have semantic role information. All other rows are marked with a "-"
|
31 |
+
8 Predicate Frameset ID This is the PropBank frameset ID of the predicate in Column 7.
|
32 |
+
9 Word sense This is the word sense of the word in Column 4.
|
33 |
+
10 Speaker/Author This is the speaker or author name where available. Mostly in Broadcast Conversation and Web Log data.
|
34 |
+
11 Named Entities This column identifies the spans representing various named entities.
|
35 |
+
12:N Predicate Arguments There is one column of predicate argument structure information for each predicate mentioned in Column 7.
|
36 |
+
N Coreference Coreference chain information encoded in a parenthesis structure.
|
37 |
+
More information on the format can be found here (section "*_conll File Format"): http://www.conll.cemantix.org/2012/data.html
|
38 |
+
|
39 |
+
Details on the evaluation on CoNLL can be found here: https://github.com/ns-moosavi/coval/blob/master/conll/README.md
|
40 |
+
|
41 |
+
CoVal code was written by @ns-moosavi.
|
42 |
+
Some parts are borrowed from https://github.com/clarkkev/deep-coref/blob/master/evaluation.py
|
43 |
+
The test suite is taken from https://github.com/conll/reference-coreference-scorers/
|
44 |
+
Mention evaluation and the test suite were added by @andreasvc.
|
45 |
+
CoNLL file parsing was developed by Leo Born.
|
46 |
---
|
47 |
|
48 |
## Metric description
|
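To make the coreference column from the format description concrete, here is a minimal sketch of how its parenthesis structure could be read. It is not CoVal's own parser: the function name `extract_clusters` and the sample lines are hypothetical, and the sketch assumes well-formed input in which the coreference annotation is the last space-separated column, `-` means no annotation, `|` separates several annotations on one token, `(N` opens a mention of chain N, `N)` closes it, and `(N)` marks a single-token mention.

```python
import re
from collections import defaultdict


def extract_clusters(conll_lines):
    """Collect mention spans per coreference chain (hypothetical helper).

    Returns a dict mapping chain id -> list of (start, end) token indices,
    both inclusive. Assumes every closing bracket has a matching opener.
    """
    clusters = defaultdict(list)     # chain id -> finished (start, end) spans
    open_starts = defaultdict(list)  # chain id -> stack of still-open starts
    token_index = 0
    for line in conll_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "#begin document" style markers
        coref = line.split()[-1]  # coreference is the last column
        if coref != "-":
            for part in coref.split("|"):
                chain_id = int(re.sub(r"[()]", "", part))
                if part.startswith("("):
                    open_starts[chain_id].append(token_index)
                if part.endswith(")"):
                    start = open_starts[chain_id].pop()
                    clusters[chain_id].append((start, token_index))
        token_index += 1
    return dict(clusters)


# Made-up sample with 13 columns; only the last column matters here.
sample = [
    "doc1 0 0 Mary NNP * - - - - * * (0)",
    "doc1 0 1 met VBD * - - - - * * -",
    "doc1 0 2 her PRP$ * - - - - * * (1|(0)",
    "doc1 0 3 friend NN * - - - - * * 1)",
]
print(extract_clusters(sample))  # {0: [(0, 0), (2, 2)], 1: [(2, 3)]}
```

In this sample, chain 0 groups the single-token mentions "Mary" and "her", while chain 1 covers the two-token span "her friend". A per-chain stack of open positions is used so that a mention can contain another mention of the same chain.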