Ubuntu committed on
Commit
840e204
·
2 Parent(s): bba257f 460c3b9

Merge branch 'main' of https://huggingface.co/Unbabel/wmt20-comet-da into main

Files changed (1)
  1. README.md +105 -98
README.md CHANGED
@@ -1,101 +1,104 @@
  ---
  pipeline_tag: translation
  language:
- - multilingual
- - af
- - am
- - ar
- - as
- - az
- - be
- - bg
- - bn
- - br
- - bs
- - ca
- - cs
- - cy
- - da
- - de
- - el
- - en
- - eo
- - es
- - et
- - eu
- - fa
- - fi
- - fr
- - fy
- - ga
- - gd
- - gl
- - gu
- - ha
- - he
- - hi
- - hr
- - hu
- - hy
- - id
- - is
- - it
- - ja
- - jv
- - ka
- - kk
- - km
- - kn
- - ko
- - ku
- - ky
- - la
- - lo
- - lt
- - lv
- - mg
- - mk
- - ml
- - mn
- - mr
- - ms
- - my
- - ne
- - nl
- - 'no'
- - om
- - or
- - pa
- - pl
- - ps
- - pt
- - ro
- - ru
- - sa
- - sd
- - si
- - sk
- - sl
- - so
- - sq
- - sr
- - su
- - sv
- - sw
- - ta
- - te
- - th
- - tl
- - tr
- - ug
- - uk
- - ur
- - uz
- - vi
- - xh
- - yi
- - zh
  license: apache-2.0
  ---

  This is a [COMET](https://github.com/Unbabel/COMET) evaluation model: It receives a triplet with (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation compared to both source and reference.
@@ -106,7 +109,11 @@ This is a [COMET](https://github.com/Unbabel/COMET) evaluation model: It receive

  [Unbabel’s Participation in the WMT20 Metrics Shared Task](https://aclanthology.org/2020.wmt-1.101) (Rei et al., WMT 2020)

- ## Usage (unbabel-comet)

  Using this model requires unbabel-comet to be installed:

@@ -138,16 +145,16 @@ model_output = model.predict(data, batch_size=8, gpus=1)
  print(model_output)
  ```

- ## Intended uses

  Our model is intended to be used for **MT evaluation**.

  Given a triplet (source sentence, translation, reference translation), the model outputs a single score. This score is unbounded but typically falls between -1 and 1, where 1 reflects a perfect translation.

- ## Languages Covered:

  This model builds on top of XLM-R, which covers the following languages:

  Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.

- Thus, results for language pairs containing uncovered languages are unreliable!
 
  ---
  pipeline_tag: translation
  language:
+ - multilingual
+ - af
+ - am
+ - ar
+ - as
+ - az
+ - be
+ - bg
+ - bn
+ - br
+ - bs
+ - ca
+ - cs
+ - cy
+ - da
+ - de
+ - el
+ - en
+ - eo
+ - es
+ - et
+ - eu
+ - fa
+ - fi
+ - fr
+ - fy
+ - ga
+ - gd
+ - gl
+ - gu
+ - ha
+ - he
+ - hi
+ - hr
+ - hu
+ - hy
+ - id
+ - is
+ - it
+ - ja
+ - jv
+ - ka
+ - kk
+ - km
+ - kn
+ - ko
+ - ku
+ - ky
+ - la
+ - lo
+ - lt
+ - lv
+ - mg
+ - mk
+ - ml
+ - mn
+ - mr
+ - ms
+ - my
+ - ne
+ - nl
+ - 'no'
+ - om
+ - or
+ - pa
+ - pl
+ - ps
+ - pt
+ - ro
+ - ru
+ - sa
+ - sd
+ - si
+ - sk
+ - sl
+ - so
+ - sq
+ - sr
+ - su
+ - sv
+ - sw
+ - ta
+ - te
+ - th
+ - tl
+ - tr
+ - ug
+ - uk
+ - ur
+ - uz
+ - vi
+ - xh
+ - yi
+ - zh
  license: apache-2.0
+ tags:
+ - arXiv:2010.15535
+ - PyTorch
  ---

  This is a [COMET](https://github.com/Unbabel/COMET) evaluation model: It receives a triplet with (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation compared to both source and reference.
 

  [Unbabel’s Participation in the WMT20 Metrics Shared Task](https://aclanthology.org/2020.wmt-1.101) (Rei et al., WMT 2020)

+ # License
+
+ Apache-2.0
+
+ # Usage (unbabel-comet)

  Using this model requires unbabel-comet to be installed:

 
  print(model_output)
  ```

+ # Intended uses

  Our model is intended to be used for **MT evaluation**.

  Given a triplet (source sentence, translation, reference translation), the model outputs a single score. This score is unbounded but typically falls between -1 and 1, where 1 reflects a perfect translation.

+ # Languages Covered:

  This model builds on top of XLM-R, which covers the following languages:

  Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.

+ Thus, results for language pairs containing uncovered languages are unreliable!
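
The coverage caveat in the README's closing line can be checked before any scoring is run. The following is a minimal sketch, assuming language pairs are identified by the same two-letter codes that appear in the README's `language:` front matter. `COVERED`, `is_covered`, and `uncovered` are hypothetical names introduced here for illustration; they are not part of unbabel-comet.

```python
# Language codes copied from the `language:` front matter of this README.
# The `multilingual` entry is omitted because it is not a language code.
COVERED = frozenset(
    """
    af am ar as az be bg bn br bs ca cs cy da de el en eo es et eu fa fi fr fy
    ga gd gl gu ha he hi hr hu hy id is it ja jv ka kk km kn ko ku ky la lo lt
    lv mg mk ml mn mr ms my ne nl no om or pa pl ps pt ro ru sa sd si sk sl so
    sq sr su sv sw ta te th tl tr ug uk ur uz vi xh yi zh
    """.split()
)


def is_covered(src_lang: str, tgt_lang: str) -> bool:
    """Return True only if both languages appear in the README's list."""
    return src_lang.lower() in COVERED and tgt_lang.lower() in COVERED


def uncovered(pairs):
    """Return the (source, target) pairs whose scores should be treated as unreliable."""
    return [pair for pair in pairs if not is_covered(*pair)]
```

For example, `uncovered([("en", "de"), ("en", "zu")])` keeps only `("en", "zu")`, because `zu` is absent from the front-matter list. Scores for such pairs would still be produced by the model, so a check like this only flags them for caution rather than blocking evaluation.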