mtool
=====
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Flag_of_Switzerland.svg/240px-Flag_of_Switzerland.svg.png" width=20> **The Swiss Army Knife of Meaning Representation**
This repository provides software to support participants in the
shared tasks on [Meaning Representation Parsing (MRP)](http://mrp.nlpl.eu)
at the
[2019](http://www.conll.org/2019) and
[2020 Conference on Computational Natural Language Learning](http://www.conll.org/2020) (CoNLL).
Please see the above task web site for additional background.
Scoring
-------
`mtool` implements the official MRP 2019 cross-framework metric, as well as
a range of framework-specific graph similarity metrics, viz.
+ MRP (Maximum Common Edge Subgraph Isomorphism);
+ EDM (Elementary Dependency Match; [Dridan & Oepen, 2011](http://aclweb.org/anthology/W/W11/W11-2927.pdf));
+ SDP Labeled and Unlabeled Dependency F1 ([Oepen et al., 2015](http://aclweb.org/anthology/S/S14/S14-2008.pdf));
+ SMATCH Precision, Recall, and F1 ([Cai & Knight, 2013](http://www.aclweb.org/anthology/P13-2131));
+ UCCA Labeled and Unlabeled Dependency F1 ([Hershcovich et al., 2019](https://www.aclweb.org/anthology/S19-2001)).
The ‘official’ cross-framework metric for the MRP 2019 shared task is a generalization
of the framework-specific metrics, considering all applicable ‘pieces of information’ (i.e.
tuples representing basic structural elements) for each framework:
1. top nodes;
2. node labels;
3. node properties;
4. node anchoring;
5. directed edges;
6. edge labels; and
7. edge attributes.
When comparing two graphs, node-to-node correspondences need to be established (via a
potentially approximative search) to maximize the aggregate, unweighted score of all of the tuple
types that apply for each specific framework.
Directed edges and edge labels, however, are always considered in conjunction during
this search.
```
./main.py --read mrp --score mrp --gold data/sample/eds/wsj.mrp data/score/eds/wsj.pet.mrp
{"n": 87,
"tops": {"g": 87, "s": 87, "c": 85, "p": 0.9770114942528736, "r": 0.9770114942528736, "f": 0.9770114942528736},
"labels": {"g": 2500, "s": 2508, "c": 2455, "p": 0.9788676236044657, "r": 0.982, "f": 0.9804313099041533},
"properties": {"g": 262, "s": 261, "c": 257, "p": 0.9846743295019157, "r": 0.9809160305343512, "f": 0.982791586998088},
"anchors": {"g": 2500, "s": 2508, "c": 2430, "p": 0.9688995215311005, "r": 0.972, "f": 0.9704472843450479},
"edges": {"g": 2432, "s": 2439, "c": 2319, "p": 0.95079950799508, "r": 0.9535361842105263, "f": 0.952165879696161},
"attributes": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
"all": {"g": 7781, "s": 7803, "c": 7546, "p": 0.9670639497629117, "r": 0.9697982264490426, "f": 0.9684291581108829}}
```
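The relation between the counts and the scores in this output can be sketched as follows. This is a minimal illustration rather than the `mtool` implementation: it assumes that `"g"`, `"s"`, and `"c"` are the gold, system, and correct tuple counts (which is consistent with the sample figures), and the helper name `prf` is hypothetical.

```python
def prf(gold, system, correct):
    """Precision, recall, and F1 from tuple counts; 0.0 where undefined."""
    p = correct / system if system else 0.0
    r = correct / gold if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# (gold, system, correct) counts copied from the sample output above.
counts = {
    "tops": (87, 87, 85),
    "labels": (2500, 2508, 2455),
    "properties": (262, 261, 257),
    "anchors": (2500, 2508, 2430),
    "edges": (2432, 2439, 2319),
    "attributes": (0, 0, 0),
}

# The "all" entry behaves like a micro-average: counts are summed over
# tuple types first, and the scores are computed from the totals.
total = tuple(sum(column) for column in zip(*counts.values()))
print(total)        # (7781, 7803, 7546)
print(prf(*total))
```

Summing counts before dividing means that frequent tuple types (labels, anchors, and edges in this sample) dominate the overall score, while an inapplicable type such as `"attributes"` contributes nothing.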
Although each was originally defined for one specific framework (EDS, DM and PSD, AMR, or UCCA, respectively),
the pre-MRP metrics are to some degree applicable to other frameworks too: the unified MRP representation
of semantic graphs enables such cross-framework application, in principle, but this functionality
remains largely untested (as of June 2019).
The `Makefile` in the `data/score/` sub-directory shows some example calls for the MRP scorer.
As appropriate (e.g. for comparison to third-party results), it is possible to score graphs in
each framework using its ‘own’ metric, for example (for AMR and UCCA, respectively):
```
./main.py --read mrp --score smatch --gold data/score/amr/test1.mrp data/score/amr/test2.mrp
{"n": 3, "g": 30, "s": 29, "c": 24, "p": 0.8, "r": 0.8275862068965517, "f": 0.8135593220338982}
```
```
./main.py --read mrp --score ucca --gold data/score/ucca/ewt.gold.mrp data/score/ucca/ewt.tupa.mrp
{"n": 3757,
"labeled":
{"primary": {"g": 63720, "s": 62876, "c": 38195,
"p": 0.6074654876264394, "r": 0.5994193345888261, "f": 0.6034155897500711},
"remote": {"g": 2673, "s": 1259, "c": 581,
"p": 0.4614773629864972, "r": 0.21735877291432848, "f": 0.2955239064089522}},
"unlabeled":
{"primary": {"g": 56114, "s": 55761, "c": 52522,
"p": 0.9419128064417783, "r": 0.9359874541112735, "f": 0.938940782122905},
"remote": {"g": 2629, "s": 1248, "c": 595,
"p": 0.47676282051282054, "r": 0.22632179535945227, "f": 0.3069383543977302}}}
```
For all scorers, the `--trace` command-line option will enable per-item scores in the result
(indexed by frameworks and graph identifiers).
For MRP and SMATCH, the `--limit` option controls the maximum node pairing steps or
hill-climbing iterations, respectively, to attempt during the search (with defaults `500000`
and `20`, respectively).
As of early July, 2019, the search for node-to-node correspondences in the MRP metric can be
initialized from the result of the random-restart hill-climbing (RRHC) search from SMATCH.
This initialization is on by default; it increases running time of the MRP scorer but yields
a guarantee that the `"all"` counts of matching tuples in MRP will always be at least as
high as the number of `"c"`(orrect) tuples identified by SMATCH.
To control the two search steps in MRP computation separately, the `--limit` option can
take a colon-separated pair of integers, for example `5:100000` for five hill-climbing
iterations and up to 100,000 node pairing steps.
Note that multi-valued use of the `--limit` option is only meaningful in conjunction
with the MRP metric, and that setting either of the two values to `0` will disable the
corresponding search component.
Finally, the MRP scorer can parallelize evaluation: an option like `--cores 8` (on
suitable hardware) will run eight `mtool` processes in parallel, which should reduce
scoring time substantially.
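To make the two-valued form of `--limit` concrete, the following sketch splits such an argument into its two components. It is not taken from the `mtool` sources: the function name `parse_limit` is hypothetical, and the treatment of a single value (as a bound on node pairing steps only) is an assumption based on the description above.

```python
def parse_limit(value, rrhc_default=20, pairing_default=500000):
    """Return (hill-climbing iterations, node pairing steps) for the MRP metric."""
    if value is None:
        # No option given: fall back to the documented defaults.
        return rrhc_default, pairing_default
    if ":" in value:
        # Colon-separated pair, e.g. "5:100000".
        rrhc, pairing = value.split(":", 1)
        return int(rrhc), int(pairing)
    # Assumption: a single value only bounds the node pairing steps.
    return rrhc_default, int(value)

print(parse_limit("5:100000"))  # (5, 100000)
print(parse_limit("0:100000"))  # (0, 100000): no hill-climbing initialization
```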
Analytics
---------
[Kuhlmann & Oepen (2016)](http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00268) discuss a range of structural graph statistics; `mtool` integrates their original code, e.g.
```
./main.py --read mrp --analyze data/sample/amr/wsj.mrp
(01) number of graphs 87
(02) number of edge labels 52
(03) \percentgraph\ trees 51.72
(04) \percentgraph\ treewidth one 51.72
(05) average treewidth 1.494
(06) maximal treewidth 3
(07) average edge density 1.050
(08) \percentnode\ reentrant 4.24
(09) \percentgraph\ cyclic 13.79
(10) \percentgraph\ not connected 0.00
(11) \percentgraph\ multi-rooted 0.00
(12) percentage of non-top roots 0.00
(13) average edge length --
(14) \percentgraph\ noncrossing --
(15) \percentgraph\ pagenumber two --
```
Validation
----------
`mtool` can test high-level wellformedness and (superficial) plausibility of MRP
graphs through its emerging `--validate` option.
The MRP validator continues to evolve, but the following is indicative of its
functionality:
```
./main.py --read mrp --validate all data/validate/eds/wsj.mrp
validate(): graph ‘20001001’: missing or invalid ‘input’ property
validate(): graph ‘20001001’; node #0: missing or invalid label
validate(): graph ‘20001001’; node #1: missing or invalid label
validate(): graph ‘20001001’; node #3: missing or invalid anchoring
validate(): graph ‘20001001’; node #6: invalid ‘anchors’ value: [{'from': 15, 'to': 23}, {'from': 15, 'to': 23}]
validate(): graph ‘20001001’; node #7: invalid ‘anchors’ value: [{'form': 15, 'to': 17}]
```
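The last two messages suggest the kind of check applied to `anchors` values. The sketch below is an approximation inferred from those two messages only; the actual rules of the validator are not spelled out here, and the function name `valid_anchors` is hypothetical.

```python
def valid_anchors(anchors):
    """Accept a list of distinct spans, each with integer 'from' and 'to' keys."""
    if not isinstance(anchors, list):
        return False
    seen = set()
    for anchor in anchors:
        # Each anchor must be an object with exactly the two expected keys.
        if not isinstance(anchor, dict) or set(anchor) != {"from", "to"}:
            return False
        start, end = anchor["from"], anchor["to"]
        if not (isinstance(start, int) and isinstance(end, int)):
            return False
        # Reject repeated spans (assumed to be the problem with node #6 above).
        if (start, end) in seen:
            return False
        seen.add((start, end))
    return True

print(valid_anchors([{"from": 15, "to": 23}, {"from": 15, "to": 23}]))  # False
print(valid_anchors([{"form": 15, "to": 17}]))                          # False
print(valid_anchors([{"from": 15, "to": 17}]))                          # True
```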
Conversion
----------
Among its options for format conversion, `mtool` supports output of graphs to the
[DOT language](https://www.graphviz.org/documentation/) for graph visualization, e.g.
```
./main.py --id 20001001 --read mrp --write dot data/sample/eds/wsj.mrp 20001001.dot
dot -Tpdf 20001001.dot > 20001001.pdf
```
When converting from token-based file formats that may lack either the underlying
‘raw’ input string, character-based anchoring, or both, the `--text` command-line
option will enable recovery of inputs and attempt to determine anchoring.
Its argument must be a file containing pairs of identifiers and input strings, one
per line, separated by a tabulator, e.g.
```
./main.py --id 20012005 --text data/sample/wsj.txt --read dm --write dot data/sample/psd/wsj.sdp 20012005.dot
```
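The file format expected by `--text` is simple enough to read or produce with a few lines of Python. The sketch below is illustrative only: the function name `read_text_file` is hypothetical, and the two sample sentences are invented placeholders, not actual WSJ text.

```python
import io

def read_text_file(stream):
    """Map graph identifiers to input strings, one tab-separated pair per line."""
    inputs = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate empty lines
        # Split on the first tabulator only, in case the string contains more.
        identifier, text = line.split("\t", 1)
        inputs[identifier] = text
    return inputs

# Invented sample content in the expected layout.
sample = io.StringIO(
    "20012005\tA hypothetical first sentence.\n"
    "20012006\tA hypothetical second sentence.\n"
)
inputs = read_text_file(sample)
print(inputs["20012005"])  # A hypothetical first sentence.
```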
For increased readability, the `--ids` option will include MRP node identifiers
in graph rendering, and the `--strings` option can replace character-based
anchors with the corresponding sub-string from the `input` field of the graph
(currently only for the DOT output format), e.g.
```
./main.py --n 1 --strings --read mrp --write dot data/sample/ucca/wsj.mrp vinken.dot
```
Diagnostics
--------------
When scoring with the MRP metric, `mtool` can optionally provide a per-item
breakdown of differences between the gold and the system graphs, i.e. record
false negatives (‘missing’ tuples) and false positives (‘surplus’ ones).
This functionality is activated via the `--errors` command-line option, and
tuple mismatches between the two graphs are recorded as a hierarchically
nested JSON object, indexed (in order) by framework, item identifier, and tuple
type.
For example:
```
./main.py --read mrp --score mrp --framework eds --gold data/score/lpps.mrp --errors errors.json data/score/eds/lpps.peking.mrp
```
For the first EDS item (`#102990`) in this comparison, `errors.json` will
contain a sub-structure like the following:
```
{"correspondences": [[0, 0], [1, 1], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 11],
[11, 12], [12, 13], [13, 15], [14, 16], [15, 17], [16, 14], [17, 18], [18, 19], [19, 20]],
"labels": {"missing": [[2, "_very+much_a_1"]],
"surplus": [[3, "_much_x_deg"], [2, "_very_x_deg"]]},
"anchors": {"missing": [[2, [6, 7, 8, 9, 11, 12, 13, 14]]],
"surplus": [[2, [6, 7, 8, 9]], [3, [11, 12, 13, 14]]]},
"edges": {"surplus": [[2, 3, "arg1"]]}}
```
When interpreting this structure, there are (of course) two separate spaces of
node identifiers; the `correspondences` vector records the (optimal)
node-to-node relation found by the MRP scorer, pairing identifiers from the
*gold* graph with corresponding identifiers in the *system* graph.
In the above, for example, gold node `#2` corresponds to system node `#3`,
and there is a spurious node `#2` in the example system graph, which
does not correspond to any of the gold nodes.
Node identifiers in `"missing"` entries refer to gold nodes, whereas
identifiers in `"surplus"` entries refer to the system graph, and they may
or may not stand in a correspondence relation to a gold node.
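A short sketch of how this structure might be inspected programmatically, using the values from the example above. The variable names are illustrative and the code is not part of `mtool`.

```python
# Excerpt of the sub-structure shown above for item #102990.
item = {
    "correspondences": [[0, 0], [1, 1], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7],
                        [7, 8], [8, 9], [9, 10], [10, 11], [11, 12], [12, 13],
                        [13, 15], [14, 16], [15, 17], [16, 14], [17, 18],
                        [18, 19], [19, 20]],
    "labels": {"missing": [[2, "_very+much_a_1"]],
               "surplus": [[3, "_much_x_deg"], [2, "_very_x_deg"]]},
}

# Each pair is [gold identifier, system identifier].
gold_to_system = {gold: system for gold, system in item["correspondences"]}
aligned_system = set(gold_to_system.values())

# System nodes with surplus labels that no gold node corresponds to.
surplus_nodes = {node for node, _ in item["labels"].get("surplus", [])}
spurious = sorted(surplus_nodes - aligned_system)

print(gold_to_system[2])  # 3
print(spurious)           # [2]
```

Here system node `#3` carries a surplus label although it is aligned to gold node `#2`, whereas system node `#2` has no gold counterpart at all.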
The differences between these two graphs can be visualized as follows, color-coding
false negatives in red, and false positives in blue
(and using gold identifiers, where available).
![sample visualization](https://github.com/cfmrp/mtool/blob/master/data/score/eds/lpps.102990.png)
Common Options
--------------
The `--read` and `--write` command-line options determine the input and output
codecs to use.
Valid input arguments include `mrp`, `amr`, `ccd`, `dm`, `eds`, `pas`, `psd`, `ud`, `eud`,
and `ucca`; note that some of these formats are only [partially supported](https://github.com/cfmrp/mtool/issues).
The range of supported output codecs includes `mrp`, `dot`, and `txt`.
The optional `--id`, `--i`, or `--n` options control which graph(s)
from the input file(s) to process, selecting either by identifier, by (zero-based)
position into the sequence of graphs read from the file, or using the first _n_
graphs.
These options cannot be combined with each other and take precedence over each
other in the above order.
Another way of selecting only a subset of graphs (from both the gold and
system inputs) is the `--framework` option, which will limit the selection
to graphs with matching `"framework"` values.
Finally, the `--unique` option will discard graphs with multiple occurrences
of the same identifier, keeping only the first occurrence from the input stream.
Most top-level graph properties (`"id"`, `"time"`, `"source"`, `"provenance"`,
`"language"`, `"flavor"`, `"framework"`, `"targets"`, `"input"`) can be set
(or destructively overwritten, upon completion of input processing) using the
`--inject` option, which takes as its argument a JSON object, e.g.
```
./main.py --text wsj.txt --read eds \
--inject '{"source": "wsj", "provenance": "Redwoods Ninth Growth (ERG 1214)"}' \
--write mrp wsj.eds wsj.mrp
```
Installation
------------
You can install `mtool` via `pip` with the following command:
```
pip install git+https://github.com/cfmrp/mtool.git#egg=mtool
```
Authors
-------
+ Daniel Hershcovich (@danielhers)
+ Marco Kuhlmann (@khlmnn)
+ Stephan Oepen (@oepen)
+ Tim O'Gorman (@timjogorman)
Contributors
------------
+ Yuta Koreeda (@koreyou)
+ Matthias Lindemann (@namednil)
+ Hiroaki Ozaki (@taryou)
+ Milan Straka (@foxik)
[![Build Status (Travis CI)](https://travis-ci.org/cfmrp/mtool.svg?branch=master)](https://travis-ci.org/cfmrp/mtool)
[![Build Status (AppVeyor)](https://ci.appveyor.com/api/projects/status/github/cfmrp/mtool?svg=true)](https://ci.appveyor.com/project/danielh/mtool)