Commit 98ea530 by lvwerra (HF staff) · 1 Parent(s): 120a2bb

Update Space (evaluate main: 83129c0f)

Files changed (4):
  1. README.md +71 -6
  2. app.py +6 -0
  3. charcut_mt.py +89 -0
  4. requirements.txt +2 -0
README.md CHANGED
@@ -1,12 +1,77 @@
  ---
- title: Charcut Mt
- emoji: 🌖
- colorFrom: gray
- colorTo: green
  sdk: gradio
- sdk_version: 3.12.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: CharCut
+ emoji: 🔤
+ colorFrom: blue
+ colorTo: red
  sdk: gradio
+ sdk_version: 3.0.2
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
+ - machine-translation
+ description: >-
+   CharCut is a character-based machine translation evaluation metric.
  ---

+ # Metric Card for CharCut
+
+ ## Metric Description
+ CharCut compares outputs of MT systems with reference translations. The matching algorithm is based on an iterative
+ search for longest common substrings, combined with a length-based threshold that limits short and noisy character
+ matches. As a similarity metric this is not new, but to the best of our knowledge it was never applied to highlighting
+ and scoring of MT outputs. It has the neat effect of keeping character-based differences readable by humans.
+
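+ To make the description above concrete, here is a minimal, illustrative Python sketch of greedy
+ longest-common-substring matching with a minimum-length threshold. It is **not** the actual CharCut implementation
+ and does not reproduce CharCut scores; it only shows the kind of matching the metric is built on.
+
+ ```python
+ # Illustrative sketch only: greedy longest-common-substring matching with a
+ # length threshold, as described above. Not the real CharCut algorithm.
+ from difflib import SequenceMatcher
+
+ def greedy_char_matches(cand: str, ref: str, min_len: int = 3) -> int:
+     """Count candidate characters covered by common substrings of length >= min_len."""
+     matched = 0
+     spans = [(cand, ref)]
+     while spans:
+         c, r = spans.pop()
+         m = SequenceMatcher(None, c, r, autojunk=False).find_longest_match(0, len(c), 0, len(r))
+         if m.size < min_len:  # length-based threshold limits short, noisy matches
+             continue
+         matched += m.size
+         # Recurse on the unmatched material to the left and right of the match.
+         spans.append((c[:m.a], r[:m.b]))
+         spans.append((c[m.a + m.size:], r[m.b + m.size:]))
+     return matched
+
+ print(greedy_char_matches("this is in fact an estimate", "this is actually an estimate"))
+ ```
+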
+ ## Intended Uses
+ CharCut was developed for machine translation evaluation.
+
+ ## How to Use
+
+ ```python
+ import evaluate
+ charcut = evaluate.load("charcut_mt")
+ preds = ["this week the saudis denied information published in the new york times",
+          "this is in fact an estimate"]
+ refs = ["saudi arabia denied this week information published in the american new york times",
+         "this is actually an estimate"]
+ results = charcut.compute(references=refs, predictions=preds)
+ print(results)
+ # {'charcut_mt': 0.1971153846153846}
+ ```
+
+ ### Inputs
+ - **predictions**: a single prediction or a list of predictions to score. Each prediction should be a string with
+   tokens separated by spaces.
+ - **references**: a single reference or a list of references, one for each prediction. Each reference should be a
+   string with tokens separated by spaces.
+
+ ### Output Values
+ - **charcut_mt**: the CharCut evaluation score (lower is better)
+
+ ### Output Example
+ ```python
+ {'charcut_mt': 0.1971153846153846}
+ ```
+
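+ As a quick sanity check on the score direction (a sketch, not taken from the original documentation): identical
+ predictions and references should yield a score of 0.0, and the score grows as the strings diverge.
+
+ ```python
+ import evaluate
+
+ charcut = evaluate.load("charcut_mt")
+ # Identical strings: expect {'charcut_mt': 0.0}, since there is nothing to edit.
+ print(charcut.compute(predictions=["an exact copy"], references=["an exact copy"]))
+ ```
+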
+ ## Citation
+ ```bibtex
+ @inproceedings{lardilleux-lepage-2017-charcut,
+     title = "{CHARCUT}: Human-Targeted Character-Based {MT} Evaluation with Loose Differences",
+     author = "Lardilleux, Adrien  and
+       Lepage, Yves",
+     booktitle = "Proceedings of the 14th International Conference on Spoken Language Translation",
+     month = dec # " 14-15",
+     year = "2017",
+     address = "Tokyo, Japan",
+     publisher = "International Workshop on Spoken Language Translation",
+     url = "https://aclanthology.org/2017.iwslt-1.20",
+     pages = "146--153",
+     abstract = "We present CHARCUT, a character-based machine translation evaluation metric derived from a human-targeted segment difference visualisation algorithm. It combines an iterative search for longest common substrings between the candidate and the reference translation with a simple length-based threshold, enabling loose differences that limit noisy character matches. Its main advantage is to produce scores that directly reflect human-readable string differences, making it a useful support tool for the manual analysis of MT output and its display to end users. Experiments on WMT16 metrics task data show that it is on par with the best {``}un-trained{''} metrics in terms of correlation with human judgement, well above BLEU and TER baselines, on both system and segment tasks.",
+ }
+ ```
+
+ ## Further References
+ - Repackaged version that is used in this HF implementation: [https://github.com/BramVanroy/CharCut](https://github.com/BramVanroy/CharCut)
+ - Original version: [https://github.com/alardill/CharCut](https://github.com/alardill/CharCut)
app.py ADDED
@@ -0,0 +1,6 @@
+ import evaluate
+ from evaluate.utils import launch_gradio_widget
+
+
+ module = evaluate.load("charcut_mt")
+ launch_gradio_widget(module)
charcut_mt.py ADDED
@@ -0,0 +1,89 @@
+ # Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """An implementation for calculating CharCut, a character-based machine translation evaluation metric."""
+ from typing import Iterable, Union
+
+ import datasets
+ from charcut import calculate_charcut
+ from datasets import Sequence, Value
+
+ import evaluate
+
+
+ _CITATION = """\
+ @inproceedings{lardilleux-lepage-2017-charcut,
+     title = "{CHARCUT}: Human-Targeted Character-Based {MT} Evaluation with Loose Differences",
+     author = "Lardilleux, Adrien  and
+       Lepage, Yves",
+     booktitle = "Proceedings of the 14th International Conference on Spoken Language Translation",
+     month = dec # " 14-15",
+     year = "2017",
+     address = "Tokyo, Japan",
+     publisher = "International Workshop on Spoken Language Translation",
+     url = "https://aclanthology.org/2017.iwslt-1.20",
+     pages = "146--153"
+ }
+ """
+
+ _DESCRIPTION = """\
+ CharCut compares outputs of MT systems with reference translations. The matching algorithm is based on an iterative
+ search for longest common substrings, combined with a length-based threshold that limits short and noisy character
+ matches. As a similarity metric this is not new, but to the best of our knowledge it was never applied to highlighting
+ and scoring of MT outputs. It has the neat effect of keeping character-based differences readable by humans."""
+
+ _KWARGS_DESCRIPTION = """
+ Computes the CharCut score (lower is better) for predictions against references.
+ Args:
+     predictions: a list of predictions to score. Each prediction should be a string with
+         tokens separated by spaces.
+     references: a list of references, one for each prediction. Each reference should be a string with
+         tokens separated by spaces.
+ Returns:
+     charcut_mt: the CharCut score
+ Examples:
+     >>> charcut_mt = evaluate.load("charcut_mt")
+     >>> preds = ["this week the saudis denied information published in the new york times",
+     ...          "this is in fact an estimate"]
+     >>> refs = ["saudi arabia denied this week information published in the american new york times",
+     ...         "this is actually an estimate"]
+     >>> charcut_mt.compute(references=refs, predictions=preds)
+     {'charcut_mt': 0.1971153846153846}
+ """
+
+
+ @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+ class Charcut(evaluate.Metric):
+     """Character-based MT evaluation."""
+
+     def _info(self):
+         return evaluate.MetricInfo(
+             # This is the description that will appear on the modules page.
+             module_type="metric",
+             description=_DESCRIPTION,
+             citation=_CITATION,
+             inputs_description=_KWARGS_DESCRIPTION,
+             # This defines the format of each prediction and reference
+             features=[
+                 datasets.Features(
+                     {"predictions": Value("string", id="prediction"), "references": Value("string", id="reference")}
+                 ),
+             ],
+             # Homepage of the module for documentation
+             homepage="https://github.com/BramVanroy/CharCut",
+             # Additional links to the codebase or references
+             codebase_urls=["https://github.com/BramVanroy/CharCut", "https://github.com/alardill/CharCut"],
+         )
+
+     def _compute(self, predictions: Iterable[str], references: Iterable[str]):
+         # calculate_charcut returns a tuple; its first element is the corpus-level score.
+         return {"charcut_mt": calculate_charcut(predictions, references)[0]}
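
For reference, the wrapper above delegates to the `charcut` package's `calculate_charcut` function and reports the
first element of its return value. A minimal sketch of calling that function directly, mirroring `_compute` (the
contents of the returned tuple beyond the score are not documented here):

```python
# Sketch: use the repackaged charcut library directly, as _compute does above.
from charcut import calculate_charcut

preds = ["this week the saudis denied information published in the new york times"]
refs = ["saudi arabia denied this week information published in the american new york times"]

result = calculate_charcut(preds, refs)
print(result[0])  # first element: the score reported as "charcut_mt" (lower is better)
```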
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ git+https://github.com/huggingface/evaluate@83129c0ff9053422f1031aa12d0c837ec6ff9b56
+ charcut>=1.1.1
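
The requirements pin `evaluate` to the commit referenced in the commit message and pull in the repackaged `charcut`
library. A small sketch (an assumption, not part of the Space) to check that an environment built from this file can
load the metric:

```python
# Assumes the dependencies from requirements.txt are installed.
import charcut   # repackaged CharCut library (provides calculate_charcut)
import evaluate  # pinned to the evaluate commit above

metric = evaluate.load("charcut_mt")  # loads this Space's metric module
print(metric.compute(predictions=["a small test"], references=["a small test"]))
```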