tiedeman committed on
Commit
23221e0
·
1 Parent(s): a2d4eb7

Initial commit

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ *.spm filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,1085 @@
1
+ ---
2
+ library_name: transformers
3
+ language:
4
+ - aa
5
+ - am
6
+ - ar
7
+ - arc
8
+ - bcw
9
+ - byn
10
+ - cop
11
+ - daa
12
+ - de
13
+ - dsh
14
+ - en
15
+ - es
16
+ - fr
17
+ - gde
18
+ - gnd
19
+ - ha
20
+ - hbo
21
+ - he
22
+ - hig
23
+ - irk
24
+ - jpa
25
+ - kab
26
+ - ker
27
+ - kqp
28
+ - ktb
29
+ - kxc
30
+ - lln
31
+ - lme
32
+ - meq
33
+ - mfh
34
+ - mfi
35
+ - mfk
36
+ - mif
37
+ - mpg
38
+ - mqb
39
+ - mt
40
+ - muy
41
+ - oar
42
+ - om
43
+ - pbi
44
+ - phn
45
+ - pt
46
+ - rif
47
+ - sgw
48
+ - shi
49
+ - shy
50
+ - so
51
+ - sur
52
+ - syc
53
+ - syr
54
+ - taq
55
+ - thv
56
+ - ti
57
+ - tig
58
+ - tmc
59
+ - tmh
60
+ - tmr
61
+ - ttr
62
+ - tzm
63
+ - wal
64
+ - xed
65
+ - zgh
66
+
67
+ tags:
68
+ - translation
69
+ - opus-mt-tc-bible
70
+
71
+ license: apache-2.0
72
+ model-index:
73
+ - name: opus-mt-tc-bible-big-deu_eng_fra_por_spa-afa
74
+ results:
75
+ - task:
76
+ name: Translation deu-hau
77
+ type: translation
78
+ args: deu-hau
79
+ dataset:
80
+ name: flores200-devtest
81
+ type: flores200-devtest
82
+ args: deu-hau
83
+ metrics:
84
+ - name: BLEU
85
+ type: bleu
86
+ value: 11.4
87
+ - name: chr-F
88
+ type: chrf
89
+ value: 0.40471
90
+ - task:
91
+ name: Translation deu-heb
92
+ type: translation
93
+ args: deu-heb
94
+ dataset:
95
+ name: flores200-devtest
96
+ type: flores200-devtest
97
+ args: deu-heb
98
+ metrics:
99
+ - name: BLEU
100
+ type: bleu
101
+ value: 18.1
102
+ - name: chr-F
103
+ type: chrf
104
+ value: 0.48645
105
+ - task:
106
+ name: Translation deu-mlt
107
+ type: translation
108
+ args: deu-mlt
109
+ dataset:
110
+ name: flores200-devtest
111
+ type: flores200-devtest
112
+ args: deu-mlt
113
+ metrics:
114
+ - name: BLEU
115
+ type: bleu
116
+ value: 17.5
117
+ - name: chr-F
118
+ type: chrf
119
+ value: 0.54079
120
+ - task:
121
+ name: Translation eng-arz
122
+ type: translation
123
+ args: eng-arz
124
+ dataset:
125
+ name: flores200-devtest
126
+ type: flores200-devtest
127
+ args: eng-arz
128
+ metrics:
129
+ - name: BLEU
130
+ type: bleu
131
+ value: 11.1
132
+ - name: chr-F
133
+ type: chrf
134
+ value: 0.42804
135
+ - task:
136
+ name: Translation eng-hau
137
+ type: translation
138
+ args: eng-hau
139
+ dataset:
140
+ name: flores200-devtest
141
+ type: flores200-devtest
142
+ args: eng-hau
143
+ metrics:
144
+ - name: BLEU
145
+ type: bleu
146
+ value: 20.4
147
+ - name: chr-F
148
+ type: chrf
149
+ value: 0.49023
150
+ - task:
151
+ name: Translation eng-heb
152
+ type: translation
153
+ args: eng-heb
154
+ dataset:
155
+ name: flores200-devtest
156
+ type: flores200-devtest
157
+ args: eng-heb
158
+ metrics:
159
+ - name: BLEU
160
+ type: bleu
161
+ value: 27.1
162
+ - name: chr-F
163
+ type: chrf
164
+ value: 0.56635
165
+ - task:
166
+ name: Translation eng-mlt
167
+ type: translation
168
+ args: eng-mlt
169
+ dataset:
170
+ name: flores200-devtest
171
+ type: flores200-devtest
172
+ args: eng-mlt
173
+ metrics:
174
+ - name: BLEU
175
+ type: bleu
176
+ value: 34.9
177
+ - name: chr-F
178
+ type: chrf
179
+ value: 0.68334
180
+ - task:
181
+ name: Translation fra-hau
182
+ type: translation
183
+ args: fra-hau
184
+ dataset:
185
+ name: flores200-devtest
186
+ type: flores200-devtest
187
+ args: fra-hau
188
+ metrics:
189
+ - name: BLEU
190
+ type: bleu
191
+ value: 13.2
192
+ - name: chr-F
193
+ type: chrf
194
+ value: 0.42731
195
+ - task:
196
+ name: Translation fra-heb
197
+ type: translation
198
+ args: fra-heb
199
+ dataset:
200
+ name: flores200-devtest
201
+ type: flores200-devtest
202
+ args: fra-heb
203
+ metrics:
204
+ - name: BLEU
205
+ type: bleu
206
+ value: 19.1
207
+ - name: chr-F
208
+ type: chrf
209
+ value: 0.49683
210
+ - task:
211
+ name: Translation fra-mlt
212
+ type: translation
213
+ args: fra-mlt
214
+ dataset:
215
+ name: flores200-devtest
216
+ type: flores200-devtest
217
+ args: fra-mlt
218
+ metrics:
219
+ - name: BLEU
220
+ type: bleu
221
+ value: 20.4
222
+ - name: chr-F
223
+ type: chrf
224
+ value: 0.56844
225
+ - task:
226
+ name: Translation por-hau
227
+ type: translation
228
+ args: por-hau
229
+ dataset:
230
+ name: flores200-devtest
231
+ type: flores200-devtest
232
+ args: por-hau
233
+ metrics:
234
+ - name: BLEU
235
+ type: bleu
236
+ value: 13.6
237
+ - name: chr-F
238
+ type: chrf
239
+ value: 0.42593
240
+ - task:
241
+ name: Translation por-heb
242
+ type: translation
243
+ args: por-heb
244
+ dataset:
245
+ name: flores200-devtest
246
+ type: flores200-devtest
247
+ args: por-heb
248
+ metrics:
249
+ - name: BLEU
250
+ type: bleu
251
+ value: 19.7
252
+ - name: chr-F
253
+ type: chrf
254
+ value: 0.50345
255
+ - task:
256
+ name: Translation por-mlt
257
+ type: translation
258
+ args: por-mlt
259
+ dataset:
260
+ name: flores200-devtest
261
+ type: flores200-devtest
262
+ args: por-mlt
263
+ metrics:
264
+ - name: BLEU
265
+ type: bleu
266
+ value: 21.5
267
+ - name: chr-F
268
+ type: chrf
269
+ value: 0.58913
270
+ - task:
271
+ name: Translation spa-heb
272
+ type: translation
273
+ args: spa-heb
274
+ dataset:
275
+ name: flores200-devtest
276
+ type: flores200-devtest
277
+ args: spa-heb
278
+ metrics:
279
+ - name: BLEU
280
+ type: bleu
281
+ value: 13.5
282
+ - name: chr-F
283
+ type: chrf
284
+ value: 0.45249
285
+ - task:
286
+ name: Translation spa-mlt
287
+ type: translation
288
+ args: spa-mlt
289
+ dataset:
290
+ name: flores200-devtest
291
+ type: flores200-devtest
292
+ args: spa-mlt
293
+ metrics:
294
+ - name: BLEU
295
+ type: bleu
296
+ value: 12.7
297
+ - name: chr-F
298
+ type: chrf
299
+ value: 0.51077
300
+ - task:
301
+ name: Translation deu-ara
302
+ type: translation
303
+ args: deu-ara
304
+ dataset:
305
+ name: flores101-devtest
306
+ type: flores_101
307
+ args: deu ara devtest
308
+ metrics:
309
+ - name: BLEU
310
+ type: bleu
311
+ value: 15.7
312
+ - name: chr-F
313
+ type: chrf
314
+ value: 0.47927
315
+ - task:
316
+ name: Translation deu-hau
317
+ type: translation
318
+ args: deu-hau
319
+ dataset:
320
+ name: flores101-devtest
321
+ type: flores_101
322
+ args: deu hau devtest
323
+ metrics:
324
+ - name: BLEU
325
+ type: bleu
326
+ value: 10.6
327
+ - name: chr-F
328
+ type: chrf
329
+ value: 0.39583
330
+ - task:
331
+ name: Translation eng-hau
332
+ type: translation
333
+ args: eng-hau
334
+ dataset:
335
+ name: flores101-devtest
336
+ type: flores_101
337
+ args: eng hau devtest
338
+ metrics:
339
+ - name: BLEU
340
+ type: bleu
341
+ value: 19.0
342
+ - name: chr-F
343
+ type: chrf
344
+ value: 0.47807
345
+ - task:
346
+ name: Translation eng-mlt
347
+ type: translation
348
+ args: eng-mlt
349
+ dataset:
350
+ name: flores101-devtest
351
+ type: flores_101
352
+ args: eng mlt devtest
353
+ metrics:
354
+ - name: BLEU
355
+ type: bleu
356
+ value: 32.9
357
+ - name: chr-F
358
+ type: chrf
359
+ value: 0.67196
360
+ - task:
361
+ name: Translation fra-mlt
362
+ type: translation
363
+ args: fra-mlt
364
+ dataset:
365
+ name: flores101-devtest
366
+ type: flores_101
367
+ args: fra mlt devtest
368
+ metrics:
369
+ - name: BLEU
370
+ type: bleu
371
+ value: 19.9
372
+ - name: chr-F
373
+ type: chrf
374
+ value: 0.56271
375
+ - task:
376
+ name: Translation por-heb
377
+ type: translation
378
+ args: por-heb
379
+ dataset:
380
+ name: flores101-devtest
381
+ type: flores_101
382
+ args: por heb devtest
383
+ metrics:
384
+ - name: BLEU
385
+ type: bleu
386
+ value: 19.6
387
+ - name: chr-F
388
+ type: chrf
389
+ value: 0.49378
390
+ - task:
391
+ name: Translation spa-ara
392
+ type: translation
393
+ args: spa-ara
394
+ dataset:
395
+ name: flores101-devtest
396
+ type: flores_101
397
+ args: spa ara devtest
398
+ metrics:
399
+ - name: BLEU
400
+ type: bleu
401
+ value: 11.7
402
+ - name: chr-F
403
+ type: chrf
404
+ value: 0.44988
405
+ - task:
406
+ name: Translation deu-hau
407
+ type: translation
408
+ args: deu-hau
409
+ dataset:
410
+ name: ntrex128
411
+ type: ntrex128
412
+ args: deu-hau
413
+ metrics:
414
+ - name: BLEU
415
+ type: bleu
416
+ value: 12.5
417
+ - name: chr-F
418
+ type: chrf
419
+ value: 0.41931
420
+ - task:
421
+ name: Translation deu-heb
422
+ type: translation
423
+ args: deu-heb
424
+ dataset:
425
+ name: ntrex128
426
+ type: ntrex128
427
+ args: deu-heb
428
+ metrics:
429
+ - name: BLEU
430
+ type: bleu
431
+ value: 13.3
432
+ - name: chr-F
433
+ type: chrf
434
+ value: 0.43961
435
+ - task:
436
+ name: Translation deu-mlt
437
+ type: translation
438
+ args: deu-mlt
439
+ dataset:
440
+ name: ntrex128
441
+ type: ntrex128
442
+ args: deu-mlt
443
+ metrics:
444
+ - name: BLEU
445
+ type: bleu
446
+ value: 15.1
447
+ - name: chr-F
448
+ type: chrf
449
+ value: 0.49871
450
+ - task:
451
+ name: Translation eng-hau
452
+ type: translation
453
+ args: eng-hau
454
+ dataset:
455
+ name: ntrex128
456
+ type: ntrex128
457
+ args: eng-hau
458
+ metrics:
459
+ - name: BLEU
460
+ type: bleu
461
+ value: 23.2
462
+ - name: chr-F
463
+ type: chrf
464
+ value: 0.51601
465
+ - task:
466
+ name: Translation eng-heb
467
+ type: translation
468
+ args: eng-heb
469
+ dataset:
470
+ name: ntrex128
471
+ type: ntrex128
472
+ args: eng-heb
473
+ metrics:
474
+ - name: BLEU
475
+ type: bleu
476
+ value: 20.3
477
+ - name: chr-F
478
+ type: chrf
479
+ value: 0.50625
480
+ - task:
481
+ name: Translation eng-mlt
482
+ type: translation
483
+ args: eng-mlt
484
+ dataset:
485
+ name: ntrex128
486
+ type: ntrex128
487
+ args: eng-mlt
488
+ metrics:
489
+ - name: BLEU
490
+ type: bleu
491
+ value: 29.0
492
+ - name: chr-F
493
+ type: chrf
494
+ value: 0.62552
495
+ - task:
496
+ name: Translation eng-som
497
+ type: translation
498
+ args: eng-som
499
+ dataset:
500
+ name: ntrex128
501
+ type: ntrex128
502
+ args: eng-som
503
+ metrics:
504
+ - name: BLEU
505
+ type: bleu
506
+ value: 13.5
507
+ - name: chr-F
508
+ type: chrf
509
+ value: 0.46845
510
+ - task:
511
+ name: Translation fra-hau
512
+ type: translation
513
+ args: fra-hau
514
+ dataset:
515
+ name: ntrex128
516
+ type: ntrex128
517
+ args: fra-hau
518
+ metrics:
519
+ - name: BLEU
520
+ type: bleu
521
+ value: 14.5
522
+ - name: chr-F
523
+ type: chrf
524
+ value: 0.43729
525
+ - task:
526
+ name: Translation fra-heb
527
+ type: translation
528
+ args: fra-heb
529
+ dataset:
530
+ name: ntrex128
531
+ type: ntrex128
532
+ args: fra-heb
533
+ metrics:
534
+ - name: BLEU
535
+ type: bleu
536
+ value: 13.9
537
+ - name: chr-F
538
+ type: chrf
539
+ value: 0.43855
540
+ - task:
541
+ name: Translation fra-mlt
542
+ type: translation
543
+ args: fra-mlt
544
+ dataset:
545
+ name: ntrex128
546
+ type: ntrex128
547
+ args: fra-mlt
548
+ metrics:
549
+ - name: BLEU
550
+ type: bleu
551
+ value: 17.3
552
+ - name: chr-F
553
+ type: chrf
554
+ value: 0.51640
555
+ - task:
556
+ name: Translation por-hau
557
+ type: translation
558
+ args: por-hau
559
+ dataset:
560
+ name: ntrex128
561
+ type: ntrex128
562
+ args: por-hau
563
+ metrics:
564
+ - name: BLEU
565
+ type: bleu
566
+ value: 15.1
567
+ - name: chr-F
568
+ type: chrf
569
+ value: 0.44408
570
+ - task:
571
+ name: Translation por-heb
572
+ type: translation
573
+ args: por-heb
574
+ dataset:
575
+ name: ntrex128
576
+ type: ntrex128
577
+ args: por-heb
578
+ metrics:
579
+ - name: BLEU
580
+ type: bleu
581
+ value: 15.0
582
+ - name: chr-F
583
+ type: chrf
584
+ value: 0.45739
585
+ - task:
586
+ name: Translation por-mlt
587
+ type: translation
588
+ args: por-mlt
589
+ dataset:
590
+ name: ntrex128
591
+ type: ntrex128
592
+ args: por-mlt
593
+ metrics:
594
+ - name: BLEU
595
+ type: bleu
596
+ value: 18.2
597
+ - name: chr-F
598
+ type: chrf
599
+ value: 0.53719
600
+ - task:
601
+ name: Translation spa-hau
602
+ type: translation
603
+ args: spa-hau
604
+ dataset:
605
+ name: ntrex128
606
+ type: ntrex128
607
+ args: spa-hau
608
+ metrics:
609
+ - name: BLEU
610
+ type: bleu
611
+ value: 14.8
612
+ - name: chr-F
613
+ type: chrf
614
+ value: 0.44695
615
+ - task:
616
+ name: Translation spa-heb
617
+ type: translation
618
+ args: spa-heb
619
+ dataset:
620
+ name: ntrex128
621
+ type: ntrex128
622
+ args: spa-heb
623
+ metrics:
624
+ - name: BLEU
625
+ type: bleu
626
+ value: 14.5
627
+ - name: chr-F
628
+ type: chrf
629
+ value: 0.45509
630
+ - task:
631
+ name: Translation spa-mlt
632
+ type: translation
633
+ args: spa-mlt
634
+ dataset:
635
+ name: ntrex128
636
+ type: ntrex128
637
+ args: spa-mlt
638
+ metrics:
639
+ - name: BLEU
640
+ type: bleu
641
+ value: 17.7
642
+ - name: chr-F
643
+ type: chrf
644
+ value: 0.53631
645
+ - task:
646
+ name: Translation deu-ara
647
+ type: translation
648
+ args: deu-ara
649
+ dataset:
650
+ name: tatoeba-test-v2021-08-07
651
+ type: tatoeba_mt
652
+ args: deu-ara
653
+ metrics:
654
+ - name: BLEU
655
+ type: bleu
656
+ value: 20.2
657
+ - name: chr-F
658
+ type: chrf
659
+ value: 0.49517
660
+ - task:
661
+ name: Translation deu-heb
662
+ type: translation
663
+ args: deu-heb
664
+ dataset:
665
+ name: tatoeba-test-v2021-08-07
666
+ type: tatoeba_mt
667
+ args: deu-heb
668
+ metrics:
669
+ - name: BLEU
670
+ type: bleu
671
+ value: 35.8
672
+ - name: chr-F
673
+ type: chrf
674
+ value: 0.56943
675
+ - task:
676
+ name: Translation eng-heb
677
+ type: translation
678
+ args: eng-heb
679
+ dataset:
680
+ name: tatoeba-test-v2021-08-07
681
+ type: tatoeba_mt
682
+ args: eng-heb
683
+ metrics:
684
+ - name: BLEU
685
+ type: bleu
686
+ value: 34.9
687
+ - name: chr-F
688
+ type: chrf
689
+ value: 0.57708
690
+ - task:
691
+ name: Translation eng-mlt
692
+ type: translation
693
+ args: eng-mlt
694
+ dataset:
695
+ name: tatoeba-test-v2021-08-07
696
+ type: tatoeba_mt
697
+ args: eng-mlt
698
+ metrics:
699
+ - name: BLEU
700
+ type: bleu
701
+ value: 29.5
702
+ - name: chr-F
703
+ type: chrf
704
+ value: 0.61044
705
+ - task:
706
+ name: Translation fra-heb
707
+ type: translation
708
+ args: fra-heb
709
+ dataset:
710
+ name: tatoeba-test-v2021-08-07
711
+ type: tatoeba_mt
712
+ args: fra-heb
713
+ metrics:
714
+ - name: BLEU
715
+ type: bleu
716
+ value: 37.5
717
+ - name: chr-F
718
+ type: chrf
719
+ value: 0.58681
720
+ - task:
721
+ name: Translation por-heb
722
+ type: translation
723
+ args: por-heb
724
+ dataset:
725
+ name: tatoeba-test-v2021-08-07
726
+ type: tatoeba_mt
727
+ args: por-heb
728
+ metrics:
729
+ - name: BLEU
730
+ type: bleu
731
+ value: 41.0
732
+ - name: chr-F
733
+ type: chrf
734
+ value: 0.61593
735
+ - task:
736
+ name: Translation spa-ara
737
+ type: translation
738
+ args: spa-ara
739
+ dataset:
740
+ name: tatoeba-test-v2021-08-07
741
+ type: tatoeba_mt
742
+ args: spa-ara
743
+ metrics:
744
+ - name: BLEU
745
+ type: bleu
746
+ value: 23.9
747
+ - name: chr-F
748
+ type: chrf
749
+ value: 0.53669
750
+ - task:
751
+ name: Translation spa-heb
752
+ type: translation
753
+ args: spa-heb
754
+ dataset:
755
+ name: tatoeba-test-v2021-08-07
756
+ type: tatoeba_mt
757
+ args: spa-heb
758
+ metrics:
759
+ - name: BLEU
760
+ type: bleu
761
+ value: 41.2
762
+ - name: chr-F
763
+ type: chrf
764
+ value: 0.61966
765
+ - task:
766
+ name: Translation eng-ara
767
+ type: translation
768
+ args: eng-ara
769
+ dataset:
770
+ name: tico19-test
771
+ type: tico19-test
772
+ args: eng-ara
773
+ metrics:
774
+ - name: BLEU
775
+ type: bleu
776
+ value: 25.4
777
+ - name: chr-F
778
+ type: chrf
779
+ value: 0.56288
780
+ - task:
781
+ name: Translation eng-hau
782
+ type: translation
783
+ args: eng-hau
784
+ dataset:
785
+ name: tico19-test
786
+ type: tico19-test
787
+ args: eng-hau
788
+ metrics:
789
+ - name: BLEU
790
+ type: bleu
791
+ value: 22.2
792
+ - name: chr-F
793
+ type: chrf
794
+ value: 0.50060
795
+ - task:
796
+ name: Translation fra-ara
797
+ type: translation
798
+ args: fra-ara
799
+ dataset:
800
+ name: tico19-test
801
+ type: tico19-test
802
+ args: fra-ara
803
+ metrics:
804
+ - name: BLEU
805
+ type: bleu
806
+ value: 13.8
807
+ - name: chr-F
808
+ type: chrf
809
+ value: 0.39785
810
+ - task:
811
+ name: Translation por-ara
812
+ type: translation
813
+ args: por-ara
814
+ dataset:
815
+ name: tico19-test
816
+ type: tico19-test
817
+ args: por-ara
818
+ metrics:
819
+ - name: BLEU
820
+ type: bleu
821
+ value: 16.0
822
+ - name: chr-F
823
+ type: chrf
824
+ value: 0.44442
825
+ - task:
826
+ name: Translation spa-ara
827
+ type: translation
828
+ args: spa-ara
829
+ dataset:
830
+ name: tico19-test
831
+ type: tico19-test
832
+ args: spa-ara
833
+ metrics:
834
+ - name: BLEU
835
+ type: bleu
836
+ value: 16.5
837
+ - name: chr-F
838
+ type: chrf
839
+ value: 0.45429
840
+ - task:
841
+ name: Translation eng-hau
842
+ type: translation
843
+ args: eng-hau
844
+ dataset:
845
+ name: newstest2021
846
+ type: wmt-2021-news
847
+ args: eng-hau
848
+ metrics:
849
+ - name: BLEU
850
+ type: bleu
851
+ value: 13.1
852
+ - name: chr-F
853
+ type: chrf
854
+ value: 0.43617
855
+ ---
856
+ # opus-mt-tc-bible-big-deu_eng_fra_por_spa-afa
857
+
858
+ ## Table of Contents
859
+ - [Model Details](#model-details)
860
+ - [Uses](#uses)
861
+ - [Risks, Limitations and Biases](#risks-limitations-and-biases)
862
+ - [How to Get Started With the Model](#how-to-get-started-with-the-model)
863
+ - [Training](#training)
864
+ - [Evaluation](#evaluation)
865
+ - [Citation Information](#citation-information)
866
+ - [Acknowledgements](#acknowledgements)
867
+
868
+ ## Model Details
869
+
870
+ Neural machine translation model for translating from German, English, French, Portuguese and Spanish (deu+eng+fra+por+spa) to Afro-Asiatic languages (afa).
871
+
872
+ This model is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus-MT), an effort to make neural machine translation models widely available and accessible for many of the world's languages. All models were originally trained with [Marian NMT](https://marian-nmt.github.io/), an efficient NMT implementation written in pure C++, and then converted to PyTorch using the Hugging Face transformers library. Training data is taken from [OPUS](https://opus.nlpl.eu/), and the training pipelines follow the procedures of [OPUS-MT-train](https://github.com/Helsinki-NLP/Opus-MT-train).
873
+ **Model Description:**
874
+ - **Developed by:** Language Technology Research Group at the University of Helsinki
875
+ - **Model Type:** Translation (transformer-big)
876
+ - **Release**: 2024-05-29
877
+ - **License:** Apache-2.0
878
+ - **Language(s):**
879
+ - Source Language(s): deu eng fra por spa
880
+ - Target Language(s): aar acm afb amh apc ara arc arq arz bcw byn cop daa dsh gde gnd hau hbo heb hig irk jpa kab ker kqp ktb kxc lln lme meq mfh mfi mfk mif mlt mpg mqb muy oar orm pbi phn rif sgw shi shy som sur syc syr taq thv tig tir tmc tmh tmr ttr tzm wal xed zgh
881
+ - Valid Target Language Labels: >>aal<< >>aar<< >>aas<< >>acm<< >>afb<< >>agj<< >>ahg<< >>aij<< >>aiw<< >>ajw<< >>akk<< >>alw<< >>amh<< >>amw<< >>anc<< >>ank<< >>apc<< >>ara<< >>arc<< >>arq<< >>arv<< >>arz<< >>auj<< >>auo<< >>awn<< >>bbt<< >>bcq<< >>bcw<< >>bcy<< >>bde<< >>bdm<< >>bdn<< >>bds<< >>bej<< >>bhm<< >>bhn<< >>bhs<< >>bid<< >>bjf<< >>bji<< >>bnl<< >>bob<< >>bol<< >>bsw<< >>bta<< >>btf<< >>bux<< >>bva<< >>bvf<< >>bvh<< >>bvw<< >>bwo<< >>bwr<< >>bxe<< >>bxq<< >>byn<< >>cie<< >>ckl<< >>ckq<< >>cky<< >>cla<< >>cnu<< >>cop<< >>cop_Copt<< >>cuv<< >>daa<< >>dal<< >>dbb<< >>dbp<< >>dbq<< >>dbr<< >>dgh<< >>dim<< >>dkx<< >>dlk<< >>dme<< >>dot<< >>dox<< >>doz<< >>drs<< >>dsh<< >>dwa<< >>egy<< >>elo<< >>fie<< >>fkk<< >>fli<< >>gab<< >>gde<< >>gdf<< >>gdk<< >>gdl<< >>gdq<< >>gdu<< >>gea<< >>gek<< >>gew<< >>gex<< >>gez<< >>gft<< >>gha<< >>gho<< >>gid<< >>gis<< >>giz<< >>gji<< >>glo<< >>glw<< >>gnc<< >>gnd<< >>gou<< >>gow<< >>gqa<< >>grd<< >>grr<< >>gru<< >>gwd<< >>gwn<< >>har<< >>hau<< >>hau_Latn<< >>hbb<< >>hbo<< >>hbo_Hebr<< >>hdy<< >>heb<< >>hed<< >>hia<< >>hig<< >>hna<< >>hod<< >>hoh<< >>hrt<< >>hss<< >>huy<< >>hwo<< >>hya<< >>inm<< >>ior<< >>irk<< >>jaf<< >>jbe<< >>jbn<< >>jeu<< >>jia<< >>jie<< >>jii<< >>jim<< >>jmb<< >>jmi<< >>jnj<< >>jpa<< >>jpa_Hebr<< >>jrb<< >>juu<< >>kab<< >>kai<< >>kbz<< >>kcn<< >>kcs<< >>ker<< >>kil<< >>kkr<< >>kks<< >>kna<< >>kof<< >>kot<< >>kpa<< >>kqd<< >>kqp<< >>kqx<< >>ksq<< >>ktb<< >>ktc<< >>kuh<< >>kul<< >>kvf<< >>kvi<< >>kvj<< >>kwl<< >>kxc<< >>ldd<< >>lhs<< >>liq<< >>lln<< >>lme<< >>lsd<< >>maf<< >>mcn<< >>mcw<< >>mdx<< >>meq<< >>mes<< >>mew<< >>mey<< >>mfh<< >>mfi<< >>mfj<< >>mfk<< >>mfl<< >>mfm<< >>mid<< >>mif<< >>mje<< >>mjs<< >>mkf<< >>mlj<< >>mlr<< >>mlt<< >>mlw<< >>mmf<< >>mmy<< >>mou<< >>moz<< >>mpg<< >>mpi<< >>mpk<< >>mqb<< >>mrt<< >>mse<< >>msv<< >>mtl<< >>mub<< >>mug<< >>muj<< >>muu<< >>muy<< >>mvh<< >>mvz<< >>mxf<< >>mxu<< >>mys<< >>myz<< >>mzb<< >>nbh<< >>ndm<< >>ngi<< >>ngs<< >>ngw<< >>ngx<< >>nja<< >>nmi<< >>nnc<< 
>>nnn<< >>noz<< >>nxm<< >>oar<< >>oar_Hebr<< >>oar_Syrc<< >>orm<< >>oua<< >>pbi<< >>pcw<< >>phn<< >>phn_Phnx<< >>pip<< >>piy<< >>plj<< >>pqa<< >>rel<< >>rif<< >>rif_Latn<< >>rzh<< >>saa<< >>sam<< >>say<< >>scw<< >>sds<< >>sgw<< >>she<< >>shi<< >>shi_Latn<< >>shv<< >>shy<< >>shy_Latn<< >>sid<< >>sir<< >>siz<< >>sjs<< >>smp<< >>sok<< >>som<< >>sor<< >>sqr<< >>sqt<< >>ssn<< >>ssy<< >>stv<< >>sur<< >>swn<< >>swq<< >>swy<< >>syc<< >>syk<< >>syn<< >>syr<< >>tak<< >>tal<< >>tan<< >>taq<< >>tax<< >>tdk<< >>tez<< >>tgd<< >>thv<< >>tia<< >>tig<< >>tir<< >>tjo<< >>tmc<< >>tmh<< >>tmr<< >>tmr_Hebr<< >>tng<< >>tqq<< >>trg<< >>trj<< >>tru<< >>tsb<< >>tsh<< >>ttr<< >>twc<< >>tzm<< >>tzm_Latn<< >>tzm_Tfng<< >>ubi<< >>udl<< >>uga<< >>vem<< >>wal<< >>wbj<< >>wji<< >>wka<< >>wle<< >>xaa<< >>xan<< >>xeb<< >>xed<< >>xhd<< >>xmd<< >>xmj<< >>xna<< >>xpu<< >>xqt<< >>xsa<< >>ymm<< >>zah<< >>zay<< >>zaz<< >>zen<< >>zgh<< >>zim<< >>ziz<< >>zns<< >>zrn<< >>zua<< >>zuy<< >>zwa<<
882
+ - **Original Model**: [opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-29.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/deu+eng+fra+por+spa-afa/opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-29.zip)
883
+ - **Resources for more information:**
884
+ - [OPUS-MT dashboard](https://opus.nlpl.eu/dashboard/index.php?pkg=opusmt&test=all&scoreslang=all&chart=standard&model=Tatoeba-MT-models/deu%2Beng%2Bfra%2Bpor%2Bspa-afa/opusTCv20230926max50%2Bbt%2Bjhubc_transformer-big_2024-05-29)
885
+ - [OPUS-MT-train GitHub Repo](https://github.com/Helsinki-NLP/OPUS-MT-train)
886
+ - [More information about MarianNMT models in the transformers library](https://huggingface.co/docs/transformers/model_doc/marian)
887
+ - [Tatoeba Translation Challenge](https://github.com/Helsinki-NLP/Tatoeba-Challenge/)
888
+ - [HPLT bilingual data v1 (as part of the Tatoeba Translation Challenge dataset)](https://hplt-project.org/datasets/v1)
889
+ - [A massively parallel Bible corpus](https://aclanthology.org/L14-1215/)
890
+
891
+ This is a multilingual translation model with multiple target languages. A sentence-initial language token is required in the form of `>>id<<` (id = valid target language ID), e.g. `>>aar<<`.
892
+
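The target-language token requirement can be sketched as a small helper that prepends the `>>id<<` label before the text is passed to the tokenizer. This is only an illustration: the function name `with_target_label` and the short `VALID_TARGET_LABELS` subset are assumptions made for the example, not part of the model or the transformers library. The full label list is the one given under "Valid Target Language Labels".

```python
import re

# Hypothetical subset of the valid target labels listed in this model card;
# extend it with the full list before real use.
VALID_TARGET_LABELS = {"aar", "ara", "arz", "hau", "heb", "kab", "mlt", "som", "tir"}

# Matches an existing sentence-initial label such as ">>heb<< ".
_LABEL_PREFIX = re.compile(r"^\s*>>[^<>\s]+<<\s*")


def with_target_label(text: str, target: str) -> str:
    """Return `text` prefixed with the `>>target<<` token.

    Any label already present at the start of `text` is replaced,
    so the helper can be applied repeatedly without stacking tokens.
    """
    if target not in VALID_TARGET_LABELS:
        raise ValueError(f"unknown target language label: {target!r}")
    stripped = _LABEL_PREFIX.sub("", text, count=1)
    return f">>{target}<< {stripped}"


print(with_target_label("Tu seras parmi nous demain.", "kab"))
```

The returned strings can be used directly as the `src_text` entries in the examples of the "How to Get Started With the Model" section.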
893
+ ## Uses
894
+
895
+ This model can be used for translation and text-to-text generation.
896
+
897
+ ## Risks, Limitations and Biases
898
+
899
+ **CONTENT WARNING: Readers should be aware that the model is trained on various public data sets that may contain content that is disturbing or offensive and that can propagate historical and current stereotypes.**
900
+
901
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
902
+
903
+ ## How to Get Started With the Model
904
+
905
+ A short code example:
906
+
907
+ ```python
908
+ from transformers import MarianMTModel, MarianTokenizer
909
+
910
+ src_text = [
911
+ ">>kab<< Tu seras parmi nous demain.",
912
+ ">>heb<< Let's get out of here while we can."
913
+ ]
914
+
915
+ model_name = "Helsinki-NLP/opus-mt-tc-bible-big-deu_eng_fra_por_spa-afa"
916
+ tokenizer = MarianTokenizer.from_pretrained(model_name)
917
+ model = MarianMTModel.from_pretrained(model_name)
918
+ translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
919
+
920
+ for t in translated:
921
+     print(tokenizer.decode(t, skip_special_tokens=True))
922
+
923
+ # expected output:
924
+ # Azekka ad tiliḍ yid-i
925
+ # בוא נצא מכאן כל עוד אנחנו יכולים.
926
+ ```
927
+
928
+ You can also use OPUS-MT models with the transformers pipelines, for example:
929
+
930
+ ```python
931
+ from transformers import pipeline
932
+ pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-bible-big-deu_eng_fra_por_spa-afa")
933
+ print(pipe(">>kab<< Tu seras parmi nous demain."))
934
+
935
+ # expected output: Azekka ad tiliḍ yid-i
936
+ ```
937
+
938
+ ## Training
939
+
940
+ - **Data**: opusTCv20230926max50+bt+jhubc ([source](https://github.com/Helsinki-NLP/Tatoeba-Challenge))
941
+ - **Pre-processing**: SentencePiece (spm32k,spm32k)
942
+ - **Model Type:** transformer-big
943
+ - **Original MarianNMT Model**: [opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-29.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/deu+eng+fra+por+spa-afa/opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-29.zip)
944
+ - **Training Scripts**: [GitHub Repo](https://github.com/Helsinki-NLP/OPUS-MT-train)
945
+
946
+ ## Evaluation
947
+
948
+ * [Model scores at the OPUS-MT dashboard](https://opus.nlpl.eu/dashboard/index.php?pkg=opusmt&test=all&scoreslang=all&chart=standard&model=Tatoeba-MT-models/deu%2Beng%2Bfra%2Bpor%2Bspa-afa/opusTCv20230926max50%2Bbt%2Bjhubc_transformer-big_2024-05-29)
949
+ * test set translations: [opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-29.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/deu+eng+fra+por+spa-afa/opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-29.test.txt)
950
+ * test set scores: [opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-29.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/deu+eng+fra+por+spa-afa/opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-29.eval.txt)
951
+ * benchmark results: [benchmark_results.txt](benchmark_results.txt)
952
+ * benchmark output: [benchmark_translations.zip](benchmark_translations.zip)
953
+
954
+ | langpair | testset | chr-F | BLEU | #sent | #words |
955
+ |----------|---------|-------|-------|-------|--------|
956
+ | deu-ara | tatoeba-test-v2021-08-07 | 0.49517 | 20.2 | 1209 | 6324 |
957
+ | deu-heb | tatoeba-test-v2021-08-07 | 0.56943 | 35.8 | 3090 | 20341 |
958
+ | eng-ara | tatoeba-test-v2021-08-07 | 0.46273 | 17.3 | 10305 | 61356 |
959
+ | eng-heb | tatoeba-test-v2021-08-07 | 0.57708 | 34.9 | 10519 | 63628 |
960
+ | eng-mlt | tatoeba-test-v2021-08-07 | 0.61044 | 29.5 | 203 | 899 |
961
+ | fra-ara | tatoeba-test-v2021-08-07 | 0.42223 | 10.4 | 1569 | 7956 |
962
+ | fra-heb | tatoeba-test-v2021-08-07 | 0.58681 | 37.5 | 3281 | 20655 |
963
+ | por-heb | tatoeba-test-v2021-08-07 | 0.61593 | 41.0 | 719 | 4423 |
964
+ | spa-ara | tatoeba-test-v2021-08-07 | 0.53669 | 23.9 | 1511 | 7547 |
965
+ | spa-heb | tatoeba-test-v2021-08-07 | 0.61966 | 41.2 | 1849 | 12112 |
966
+ | deu-ara | flores101-devtest | 0.47927 | 15.7 | 1012 | 21357 |
967
+ | eng-hau | flores101-devtest | 0.47807 | 19.0 | 1012 | 27730 |
968
+ | eng-mlt | flores101-devtest | 0.67196 | 32.9 | 1012 | 22169 |
969
+ | fra-mlt | flores101-devtest | 0.56271 | 19.9 | 1012 | 22169 |
970
+ | por-heb | flores101-devtest | 0.49378 | 19.6 | 1012 | 20749 |
971
+ | spa-ara | flores101-devtest | 0.44988 | 11.7 | 1012 | 21357 |
972
+ | deu-ara | flores200-devtest | 0.661 | 0.0 | 1012 | 5 |
973
+ | deu-hau | flores200-devtest | 0.40471 | 11.4 | 1012 | 27730 |
974
+ | deu-heb | flores200-devtest | 0.48645 | 18.1 | 1012 | 20238 |
975
+ | deu-mlt | flores200-devtest | 0.54079 | 17.5 | 1012 | 22169 |
976
+ | eng-ara | flores200-devtest | 0.627 | 0.0 | 1012 | 5 |
977
+ | eng-arz | flores200-devtest | 0.42804 | 11.1 | 1012 | 21034 |
978
+ | eng-hau | flores200-devtest | 0.49023 | 20.4 | 1012 | 27730 |
979
+ | eng-heb | flores200-devtest | 0.56635 | 27.1 | 1012 | 20238 |
980
+ | eng-mlt | flores200-devtest | 0.68334 | 34.9 | 1012 | 22169 |
981
+ | eng-som | flores200-devtest | 0.42814 | 9.9 | 1012 | 25991 |
982
+ | fra-ara | flores200-devtest | 0.631 | 0.0 | 1012 | 5 |
983
+ | fra-hau | flores200-devtest | 0.42731 | 13.2 | 1012 | 27730 |
984
+ | fra-heb | flores200-devtest | 0.49683 | 19.1 | 1012 | 20238 |
985
+ | fra-mlt | flores200-devtest | 0.56844 | 20.4 | 1012 | 22169 |
986
+ | por-ara | flores200-devtest | 0.622 | 0.0 | 1012 | 5 |
987
+ | por-hau | flores200-devtest | 0.42593 | 13.6 | 1012 | 27730 |
988
+ | por-heb | flores200-devtest | 0.50345 | 19.7 | 1012 | 20238 |
989
+ | por-mlt | flores200-devtest | 0.58913 | 21.5 | 1012 | 22169 |
990
+ | spa-ara | flores200-devtest | 0.587 | 0.0 | 1012 | 5 |
991
+ | spa-hau | flores200-devtest | 0.40309 | 9.4 | 1012 | 27730 |
992
+ | spa-heb | flores200-devtest | 0.45249 | 13.5 | 1012 | 20238 |
993
+ | spa-mlt | flores200-devtest | 0.51077 | 12.7 | 1012 | 22169 |
994
+ | eng-hau | newstest2021 | 0.43617 | 13.1 | 1000 | 32966 |
995
+ | deu-hau | ntrex128 | 0.41931 | 12.5 | 1997 | 54982 |
996
+ | deu-heb | ntrex128 | 0.43961 | 13.3 | 1997 | 39624 |
997
+ | deu-mlt | ntrex128 | 0.49871 | 15.1 | 1997 | 43308 |
998
+ | eng-hau | ntrex128 | 0.51601 | 23.2 | 1997 | 54982 |
999
+ | eng-heb | ntrex128 | 0.50625 | 20.3 | 1997 | 39624 |
1000
+ | eng-mlt | ntrex128 | 0.62552 | 29.0 | 1997 | 43308 |
1001
+ | eng-som | ntrex128 | 0.46845 | 13.5 | 1997 | 49351 |
1002
+ | fra-hau | ntrex128 | 0.43729 | 14.5 | 1997 | 54982 |
1003
+ | fra-heb | ntrex128 | 0.43855 | 13.9 | 1997 | 39624 |
1004
+ | fra-mlt | ntrex128 | 0.51640 | 17.3 | 1997 | 43308 |
1005
+ | fra-som | ntrex128 | 0.41813 | 9.6 | 1997 | 49351 |
1006
+ | por-hau | ntrex128 | 0.44408 | 15.1 | 1997 | 54982 |
1007
+ | por-heb | ntrex128 | 0.45739 | 15.0 | 1997 | 39624 |
1008
+ | por-mlt | ntrex128 | 0.53719 | 18.2 | 1997 | 43308 |
1009
+ | por-som | ntrex128 | 0.41367 | 9.3 | 1997 | 49351 |
1010
+ | spa-hau | ntrex128 | 0.44695 | 14.8 | 1997 | 54982 |
1011
+ | spa-heb | ntrex128 | 0.45509 | 14.5 | 1997 | 39624 |
1012
+ | spa-mlt | ntrex128 | 0.53631 | 17.7 | 1997 | 43308 |
1013
+ | spa-som | ntrex128 | 0.41755 | 9.1 | 1997 | 49351 |
1014
+ | eng-ara | tico19-test | 0.56288 | 25.4 | 2100 | 51339 |
1015
+ | eng-hau | tico19-test | 0.50060 | 22.2 | 2100 | 64509 |
1016
+ | fra-amh | tico19-test | 3.575 | 1.3 | 2100 | 44782 |
1017
+ | fra-hau | tico19-test | 5.071 | 1.8 | 2100 | 64509 |
1018
+ | fra-orm | tico19-test | 4.044 | 1.8 | 2100 | 50032 |
1019
+ | fra-som | tico19-test | 2.698 | 0.9 | 2100 | 63654 |
1020
+ | fra-tir | tico19-test | 4.151 | 1.4 | 2100 | 46685 |
1021
+ | por-amh | tico19-test | 3.799 | 1.4 | 2100 | 44782 |
1022
+ | por-ara | tico19-test | 0.44442 | 16.0 | 2100 | 51339 |
1023
+ | por-hau | tico19-test | 5.786 | 2.0 | 2100 | 64509 |
1024
+ | por-orm | tico19-test | 4.613 | 2.0 | 2100 | 50032 |
1025
+ | por-som | tico19-test | 3.413 | 1.2 | 2100 | 63654 |
1026
+ | por-tir | tico19-test | 5.092 | 1.6 | 2100 | 46685 |
1027
+ | spa-amh | tico19-test | 3.831 | 1.4 | 2100 | 44782 |
1028
+ | spa-ara | tico19-test | 0.45429 | 16.5 | 2100 | 51339 |
1029
+ | spa-hau | tico19-test | 5.790 | 1.9 | 2100 | 64509 |
1030
+ | spa-orm | tico19-test | 4.617 | 1.9 | 2100 | 50032 |
1031
+ | spa-som | tico19-test | 3.402 | 1.2 | 2100 | 63654 |
1032
+ | spa-tir | tico19-test | 5.033 | 1.6 | 2100 | 46685 |
1033
+
1034
+ ## Citation Information
1035
+
1036
+ * Publications: [Democratizing neural machine translation with OPUS-MT](https://doi.org/10.1007/s10579-023-09704-w), [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/), and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/). Please cite them if you use this model.
1037
+
1038
+ ```bibtex
1039
+ @article{tiedemann2023democratizing,
1040
+ title={Democratizing neural machine translation with {OPUS-MT}},
1041
+ author={Tiedemann, J{\"o}rg and Aulamo, Mikko and Bakshandaeva, Daria and Boggia, Michele and Gr{\"o}nroos, Stig-Arne and Nieminen, Tommi and Raganato, Alessandro and Scherrer, Yves and Vazquez, Raul and Virpioja, Sami},
1042
+ journal={Language Resources and Evaluation},
1043
+ number={58},
1044
+ pages={713--755},
1045
+ year={2023},
1046
+ publisher={Springer Nature},
1047
+ issn={1574-0218},
1048
+ doi={10.1007/s10579-023-09704-w}
1049
+ }
1050
+
1051
+ @inproceedings{tiedemann-thottingal-2020-opus,
1052
+ title = "{OPUS}-{MT} {--} Building open translation services for the World",
1053
+ author = {Tiedemann, J{\"o}rg and Thottingal, Santhosh},
1054
+ booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
1055
+ month = nov,
1056
+ year = "2020",
1057
+ address = "Lisboa, Portugal",
1058
+ publisher = "European Association for Machine Translation",
1059
+ url = "https://aclanthology.org/2020.eamt-1.61",
1060
+ pages = "479--480",
1061
+ }
1062
+
1063
+ @inproceedings{tiedemann-2020-tatoeba,
1064
+ title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
1065
+ author = {Tiedemann, J{\"o}rg},
1066
+ booktitle = "Proceedings of the Fifth Conference on Machine Translation",
1067
+ month = nov,
1068
+ year = "2020",
1069
+ address = "Online",
1070
+ publisher = "Association for Computational Linguistics",
1071
+ url = "https://aclanthology.org/2020.wmt-1.139",
1072
+ pages = "1174--1182",
1073
+ }
1074
+ ```
1075
+
1076
+ ## Acknowledgements
1077
+
1078
+ The work is supported by the [HPLT project](https://hplt-project.org/), funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350. We are also grateful for the generous computational resources and IT infrastructure provided by [CSC -- IT Center for Science](https://www.csc.fi/), Finland, and the [EuroHPC supercomputer LUMI](https://www.lumi-supercomputer.eu/).
1079
+
1080
+ ## Model conversion info
1081
+
1082
+ * transformers version: 4.45.1
1083
+ * OPUS-MT git hash: 0882077
1084
+ * port time: Tue Oct 8 00:25:35 EEST 2024
1085
+ * port machine: LM0-400-22516.local
benchmark_results.txt ADDED
@@ -0,0 +1,158 @@
1
+ multi-multi tatoeba-test-v2020-07-28-v2023-09-26 0.38446 15.7 10000 60769
2
+ deu-amh flores101-devtest 0.24177 3.2 1012 17752
3
+ deu-ara flores101-devtest 0.47927 15.7 1012 21357
4
+ deu-hau flores101-devtest 0.39583 10.6 1012 27730
5
+ deu-orm flores101-devtest 0.27616 1.6 1012 22305
6
+ deu-som flores101-devtest 0.36012 5.6 1012 25991
7
+ eng-amh flores101-devtest 0.30626 5.7 1012 17752
8
+ eng-hau flores101-devtest 0.47807 19.0 1012 27730
9
+ eng-mlt flores101-devtest 0.67196 32.9 1012 22169
10
+ eng-orm flores101-devtest 0.27540 1.9 1012 22305
11
+ fra-amh flores101-devtest 0.23655 2.7 1012 17752
12
+ fra-mlt flores101-devtest 0.56271 19.9 1012 22169
13
+ por-heb flores101-devtest 0.49378 19.6 1012 20749
14
+ spa-ara flores101-devtest 0.44988 11.7 1012 21357
15
+ deu-acm flores200-devtest 0.15386 1.8 1012 20497
16
+ deu-amh flores200-devtest 0.25718 3.7 1012 17752
17
+ deu-apc flores200-devtest 0.28133 3.1 1012 19476
18
+ deu-ara flores200-devtest 0.661 0.0 1012 5
19
+ deu-arz flores200-devtest 0.36318 7.3 1012 21034
20
+ deu-hau flores200-devtest 0.40471 11.4 1012 27730
21
+ deu-heb flores200-devtest 0.48645 18.1 1012 20238
22
+ deu-kab flores200-devtest 0.26058 4.0 1012 24833
23
+ deu-mlt flores200-devtest 0.54079 17.5 1012 22169
24
+ deu-som flores200-devtest 0.36953 6.0 1012 25991
25
+ deu-tir flores200-devtest 0.12053 0.5 1012 19825
26
+ eng-acm flores200-devtest 0.25847 5.7 1012 20497
27
+ eng-amh flores200-devtest 0.32177 6.1 1012 17752
28
+ eng-apc flores200-devtest 0.29364 4.6 1012 19476
29
+ eng-ara flores200-devtest 0.627 0.0 1012 5
30
+ eng-arz flores200-devtest 0.42804 11.1 1012 21034
31
+ eng-hau flores200-devtest 0.49023 20.4 1012 27730
32
+ eng-heb flores200-devtest 0.56635 27.1 1012 20238
33
+ eng-kab flores200-devtest 0.24787 4.6 1012 24833
34
+ eng-mlt flores200-devtest 0.68334 34.9 1012 22169
35
+ eng-som flores200-devtest 0.42814 9.9 1012 25991
36
+ eng-tir flores200-devtest 0.15638 1.0 1012 19825
37
+ fra-acm flores200-devtest 0.18465 2.8 1012 20497
38
+ fra-amh flores200-devtest 0.25459 3.2 1012 17752
39
+ fra-apc flores200-devtest 0.26330 2.3 1012 19476
40
+ fra-ara flores200-devtest 0.631 0.0 1012 5
41
+ fra-arz flores200-devtest 0.37400 7.7 1012 21034
42
+ fra-hau flores200-devtest 0.42731 13.2 1012 27730
43
+ fra-heb flores200-devtest 0.49683 19.1 1012 20238
44
+ fra-kab flores200-devtest 0.25304 3.9 1012 24833
45
+ fra-mlt flores200-devtest 0.56844 20.4 1012 22169
46
+ fra-som flores200-devtest 0.39543 7.3 1012 25991
47
+ fra-tir flores200-devtest 0.12494 0.4 1012 19825
48
+ por-acm flores200-devtest 0.16154 2.0 1012 20497
49
+ por-amh flores200-devtest 0.25501 3.4 1012 17752
50
+ por-apc flores200-devtest 0.29378 3.2 1012 19476
51
+ por-ara flores200-devtest 0.622 0.0 1012 5
52
+ por-arz flores200-devtest 0.36797 7.0 1012 21034
53
+ por-hau flores200-devtest 0.42593 13.6 1012 27730
54
+ por-heb flores200-devtest 0.50345 19.7 1012 20238
55
+ por-kab flores200-devtest 0.27366 4.7 1012 24833
56
+ por-mlt flores200-devtest 0.58913 21.5 1012 22169
57
+ por-som flores200-devtest 0.38536 7.1 1012 25991
58
+ por-tir flores200-devtest 0.11874 0.5 1012 19825
59
+ spa-acm flores200-devtest 0.17764 2.3 1012 20497
60
+ spa-amh flores200-devtest 0.23018 2.5 1012 17752
61
+ spa-apc flores200-devtest 0.24763 1.7 1012 19476
62
+ spa-ara flores200-devtest 0.587 0.0 1012 5
63
+ spa-arz flores200-devtest 0.36220 6.3 1012 21034
64
+ spa-hau flores200-devtest 0.40309 9.4 1012 27730
65
+ spa-heb flores200-devtest 0.45249 13.5 1012 20238
66
+ spa-kab flores200-devtest 0.26532 3.9 1012 24833
67
+ spa-mlt flores200-devtest 0.51077 12.7 1012 22169
68
+ spa-som flores200-devtest 0.37323 5.3 1012 25991
69
+ spa-tir flores200-devtest 0.11476 0.4 1012 19825
70
+ eng-hau newstest2021 0.43617 13.1 1000 32966
71
+ deu-amh ntrex128 0.18069 1.2 1997 33546
72
+ deu-hau ntrex128 0.41931 12.5 1997 54982
73
+ deu-heb ntrex128 0.43961 13.3 1997 39624
74
+ deu-mlt ntrex128 0.49871 15.1 1997 43308
75
+ deu-orm ntrex128 0.29153 1.4 1997 35048
76
+ deu-shi ntrex128 0.281 0.1 1997 42236
77
+ deu-som ntrex128 0.39641 8.3 1997 49351
78
+ deu-tir ntrex128 0.11954 0.5 1997 36935
79
+ eng-amh ntrex128 0.21006 1.9 1997 33546
80
+ eng-hau ntrex128 0.51601 23.2 1997 54982
81
+ eng-heb ntrex128 0.50625 20.3 1997 39624
82
+ eng-mlt ntrex128 0.62552 29.0 1997 43308
83
+ eng-orm ntrex128 0.28469 1.8 1997 35048
84
+ eng-shi ntrex128 0.371 0.1 1997 42236
85
+ eng-som ntrex128 0.46845 13.5 1997 49351
86
+ eng-tir ntrex128 0.14108 0.9 1997 36935
87
+ fra-amh ntrex128 0.17681 1.0 1997 33546
88
+ fra-hau ntrex128 0.43729 14.5 1997 54982
89
+ fra-heb ntrex128 0.43855 13.9 1997 39624
90
+ fra-mlt ntrex128 0.51640 17.3 1997 43308
91
+ fra-orm ntrex128 0.29123 1.4 1997 35048
92
+ fra-shi ntrex128 0.259 0.1 1997 42236
93
+ fra-som ntrex128 0.41813 9.6 1997 49351
94
+ fra-tir ntrex128 0.11951 0.4 1997 36935
95
+ por-amh ntrex128 0.17823 1.1 1997 33546
96
+ por-hau ntrex128 0.44408 15.1 1997 54982
97
+ por-heb ntrex128 0.45739 15.0 1997 39624
98
+ por-mlt ntrex128 0.53719 18.2 1997 43308
99
+ por-orm ntrex128 0.28921 1.6 1997 35048
100
+ por-shi ntrex128 0.268 0.1 1997 42236
101
+ por-som ntrex128 0.41367 9.3 1997 49351
102
+ por-tir ntrex128 0.11696 0.4 1997 36935
103
+ spa-amh ntrex128 0.17987 1.1 1997 33546
104
+ spa-hau ntrex128 0.44695 14.8 1997 54982
105
+ spa-heb ntrex128 0.45509 14.5 1997 39624
106
+ spa-mlt ntrex128 0.53631 17.7 1997 43308
107
+ spa-orm ntrex128 0.29343 1.6 1997 35048
108
+ spa-shi ntrex128 0.270 0.0 1997 42236
109
+ spa-som ntrex128 0.41755 9.1 1997 49351
110
+ spa-tir ntrex128 0.11852 0.5 1997 36935
111
+ eng-ara tatoeba-test-v2020-07-28 0.45711 16.7 10000 58935
112
+ eng-arq tatoeba-test-v2020-07-28 0.11733 0.7 403 2272
113
+ eng-kab tatoeba-test-v2020-07-28 0.31979 9.3 10000 54472
114
+ fra-kab tatoeba-test-v2020-07-28 0.28102 6.1 10000 64305
115
+ eng-amh tatoeba-test-v2021-03-30 0.49054 14.0 202 615
116
+ eng-ara tatoeba-test-v2021-03-30 0.45593 16.5 10267 61124
117
+ eng-arq tatoeba-test-v2021-03-30 0.11749 0.7 405 2285
118
+ eng-heb tatoeba-test-v2021-03-30 0.56881 34.0 10366 62601
119
+ eng-mlt tatoeba-test-v2021-03-30 0.59605 27.7 206 911
120
+ deu-ara tatoeba-test-v2021-08-07 0.49517 20.2 1209 6324
121
+ deu-heb tatoeba-test-v2021-08-07 0.56943 35.8 3090 20341
122
+ deu-kab tatoeba-test-v2021-08-07 0.28165 7.5 373 2077
123
+ eng-ara tatoeba-test-v2021-08-07 0.46273 17.3 10305 61356
124
+ eng-arq tatoeba-test-v2021-08-07 0.11662 0.9 405 2285
125
+ eng-heb tatoeba-test-v2021-08-07 0.57708 34.9 10519 63628
126
+ eng-kab tatoeba-test-v2021-08-07 0.31813 8.6 12142 69666
127
+ eng-mlt tatoeba-test-v2021-08-07 0.61044 29.5 203 899
128
+ fra-ara tatoeba-test-v2021-08-07 0.42223 10.4 1569 7956
129
+ fra-heb tatoeba-test-v2021-08-07 0.58681 37.5 3281 20655
130
+ fra-kab tatoeba-test-v2021-08-07 0.28352 6.2 12491 81508
131
+ por-heb tatoeba-test-v2021-08-07 0.61593 41.0 719 4423
132
+ spa-ara tatoeba-test-v2021-08-07 0.53669 23.9 1511 7547
133
+ spa-heb tatoeba-test-v2021-08-07 0.61966 41.2 1849 12112
134
+ spa-kab tatoeba-test-v2021-08-07 0.30333 8.6 883 5828
135
+ eng-amh tico19-test 0.28647 5.9 2100 44782
136
+ eng-ara tico19-test 0.56288 25.4 2100 51339
137
+ eng-hau tico19-test 0.50060 22.2 2100 64509
138
+ eng-orm tico19-test 0.33025 5.2 2100 50032
139
+ eng-som tico19-test 0.32726 7.8 2100 63654
140
+ eng-tir tico19-test 0.16570 1.7 2100 46685
141
+ fra-amh tico19-test 3.575 1.3 2100 44782
142
+ fra-ara tico19-test 0.39785 13.8 2100 51339
143
+ fra-hau tico19-test 5.071 1.8 2100 64509
144
+ fra-orm tico19-test 4.044 1.8 2100 50032
145
+ fra-som tico19-test 2.698 0.9 2100 63654
146
+ fra-tir tico19-test 4.151 1.4 2100 46685
147
+ por-amh tico19-test 3.799 1.4 2100 44782
148
+ por-ara tico19-test 0.44442 16.0 2100 51339
149
+ por-hau tico19-test 5.786 2.0 2100 64509
150
+ por-orm tico19-test 4.613 2.0 2100 50032
151
+ por-som tico19-test 3.413 1.2 2100 63654
152
+ por-tir tico19-test 5.092 1.6 2100 46685
153
+ spa-amh tico19-test 3.831 1.4 2100 44782
154
+ spa-ara tico19-test 0.45429 16.5 2100 51339
155
+ spa-hau tico19-test 5.790 1.9 2100 64509
156
+ spa-orm tico19-test 4.617 1.9 2100 50032
157
+ spa-som tico19-test 3.402 1.2 2100 63654
158
+ spa-tir tico19-test 5.033 1.6 2100 46685
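Each line of `benchmark_results.txt` holds six whitespace-separated fields. Judging by the matching README table, they appear to be language pair, test set, chrF, BLEU, sentence count, and word count. The sketch below parses such lines and flags rows that look inconsistent with the rest, for example a chrF value above 1.0 or a word count smaller than the sentence count. The names `BenchmarkRow`, `parse_row`, and `is_suspicious`, and both heuristics, are assumptions made for illustration and are not part of the OPUS-MT tooling.

```python
from typing import NamedTuple


class BenchmarkRow(NamedTuple):
    langpair: str
    testset: str
    chrf: float
    bleu: float
    sentences: int
    words: int


def parse_row(line: str) -> BenchmarkRow:
    """Parse one result line; a leading diff '+' marker is ignored."""
    fields = line.strip().lstrip("+").split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    langpair, testset, chrf, bleu, sentences, words = fields
    return BenchmarkRow(
        langpair, testset, float(chrf), float(bleu), int(sentences), int(words)
    )


def is_suspicious(row: BenchmarkRow) -> bool:
    # Assumed heuristics, not an official check:
    # - most rows report chrF in the 0-1 range, so a value above 1.0 stands out;
    # - fewer reference words than sentences hints at a broken reference file.
    return row.chrf > 1.0 or row.words < row.sentences


# Three lines copied from the results above.
SAMPLE = """\
+ eng-mlt flores200-devtest 0.68334 34.9 1012 22169
+ deu-ara flores200-devtest 0.661 0.0 1012 5
+ fra-amh tico19-test 3.575 1.3 2100 44782
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
flagged = [f"{r.langpair}/{r.testset}" for r in rows if is_suspicious(r)]
print(flagged)
```

Flagged rows are worth rechecking against the raw evaluation output before they are quoted. The heuristics only point at candidates and do not explain the cause.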
benchmark_translations.zip ADDED
File without changes
config.json ADDED
@@ -0,0 +1,41 @@
1
+ {
2
+ "_name_or_path": "pytorch-models/opus-mt-tc-bible-big-deu_eng_fra_por_spa-afa",
3
+ "activation_dropout": 0.0,
4
+ "activation_function": "relu",
5
+ "architectures": [
6
+ "MarianMTModel"
7
+ ],
8
+ "attention_dropout": 0.0,
9
+ "bos_token_id": 0,
10
+ "classifier_dropout": 0.0,
11
+ "d_model": 1024,
12
+ "decoder_attention_heads": 16,
13
+ "decoder_ffn_dim": 4096,
14
+ "decoder_layerdrop": 0.0,
15
+ "decoder_layers": 6,
16
+ "decoder_start_token_id": 61814,
17
+ "decoder_vocab_size": 61815,
18
+ "dropout": 0.1,
19
+ "encoder_attention_heads": 16,
20
+ "encoder_ffn_dim": 4096,
21
+ "encoder_layerdrop": 0.0,
22
+ "encoder_layers": 6,
23
+ "eos_token_id": 407,
24
+ "forced_eos_token_id": null,
25
+ "init_std": 0.02,
26
+ "is_encoder_decoder": true,
27
+ "max_length": null,
28
+ "max_position_embeddings": 1024,
29
+ "model_type": "marian",
30
+ "normalize_embedding": false,
31
+ "num_beams": null,
32
+ "num_hidden_layers": 6,
33
+ "pad_token_id": 61814,
34
+ "scale_embedding": true,
35
+ "share_encoder_decoder_embeddings": true,
36
+ "static_position_embeddings": true,
37
+ "torch_dtype": "float32",
38
+ "transformers_version": "4.45.1",
39
+ "use_cache": true,
40
+ "vocab_size": 61815
41
+ }
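The dimensions in `config.json` are enough for a rough parameter-count estimate. The sketch below assumes a standard transformer layout: attention and feed-forward projections with biases, two layer norms per encoder layer and three per decoder layer, one embedding matrix shared between encoder, decoder, and output layer (`share_encoder_decoder_embeddings` is true), and static position embeddings that are not stored as weights. These layout details are assumptions, and `estimate_parameters` is a hypothetical helper. Small buffers and file headers are ignored.

```python
def estimate_parameters(cfg: dict) -> int:
    """Back-of-the-envelope parameter count for an encoder-decoder transformer."""
    d = cfg["d_model"]

    def attention() -> int:
        # Query, key, value, and output projections: each d x d plus a bias.
        return 4 * (d * d + d)

    def feed_forward(ffn_dim: int) -> int:
        # Two linear layers (d -> ffn_dim -> d), each with a bias.
        return d * ffn_dim + ffn_dim + ffn_dim * d + d

    layer_norm = 2 * d  # scale and shift vectors

    encoder_layer = attention() + feed_forward(cfg["encoder_ffn_dim"]) + 2 * layer_norm
    # A decoder layer adds cross-attention and a third layer norm.
    decoder_layer = 2 * attention() + feed_forward(cfg["decoder_ffn_dim"]) + 3 * layer_norm

    shared_embeddings = cfg["vocab_size"] * d  # counted once because it is shared

    return (
        shared_embeddings
        + cfg["encoder_layers"] * encoder_layer
        + cfg["decoder_layers"] * decoder_layer
    )


# Values copied from the config.json in this commit.
CONFIG = {
    "d_model": 1024,
    "vocab_size": 61815,
    "encoder_layers": 6,
    "decoder_layers": 6,
    "encoder_ffn_dim": 4096,
    "decoder_ffn_dim": 4096,
}

params = estimate_parameters(CONFIG)
size_bytes = params * 4  # float32, per "torch_dtype"
print(params, size_bytes)
```

Under these assumptions the estimate is roughly 240 million parameters. At four bytes each, that is within about 0.03% of the 958900620-byte size recorded for `model.safetensors` in this commit, which suggests the assumed layout is close to the actual one.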
generation_config.json ADDED
@@ -0,0 +1,16 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bad_words_ids": [
4
+ [
5
+ 61814
6
+ ]
7
+ ],
8
+ "bos_token_id": 0,
9
+ "decoder_start_token_id": 61814,
10
+ "eos_token_id": 407,
11
+ "forced_eos_token_id": 407,
12
+ "max_length": 512,
13
+ "num_beams": 4,
14
+ "pad_token_id": 61814,
15
+ "transformers_version": "4.45.1"
16
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fbd260e786117efac73c8c45244cfe9893c4218ab741a2c62a6e0a0efb7eb7f3
3
+ size 958900620
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0100e1967fb7ddc6919d7e5ca5ddcd0a560ec6fc0670d4ac4b49c88956a66cac
3
+ size 958951877
source.spm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ec2750364c0b128a438565ce4aef7d04484a1df1faaeb819e82e721fbd5a0a86
3
+ size 812686
special_tokens_map.json ADDED
@@ -0,0 +1 @@
1
+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>"}
target.spm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c0a452908855cc42f55f19733296fea1245e75ab4d5f1c05cf166261282b4e36
3
+ size 806252
tokenizer_config.json ADDED
@@ -0,0 +1 @@
1
+ {"source_lang": "deu+eng+fra+por+spa", "target_lang": "afa", "unk_token": "<unk>", "eos_token": "</s>", "pad_token": "<pad>", "model_max_length": 512, "sp_model_kwargs": {}, "separate_vocabs": false, "special_tokens_map_file": null, "name_or_path": "marian-models/opusTCv20230926max50+bt+jhubc_transformer-big_2024-05-29/deu+eng+fra+por+spa-afa", "tokenizer_class": "MarianTokenizer"}
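The tokenizer config names a single multi-language target (`afa`). OPUS-MT models with several target languages expect a sentence-initial token of the form `>>id<<` that selects the output language. The sketch below shows one way to add that prefix before tokenization. The helper `with_target_token` is hypothetical, and `EXAMPLE_TARGETS` is an illustrative subset of language IDs taken from the benchmark rows in this commit, not the model's authoritative list of supported tokens.

```python
import re

# Illustrative subset only: IDs that appear as targets in the benchmark rows.
# The model's actual list of valid target tokens may differ.
EXAMPLE_TARGETS = frozenset(
    {"amh", "ara", "arz", "hau", "heb", "kab", "mlt", "som", "tir"}
)

# Matches an existing ">>xxx<<" prefix so it is not duplicated.
_TOKEN_PREFIX = re.compile(r"^>>[^<>\s]+<<\s*")


def with_target_token(
    text: str, target: str, allowed: frozenset = EXAMPLE_TARGETS
) -> str:
    """Return `text` prefixed with the `>>target<<` language token."""
    if target not in allowed:
        raise ValueError(f"unknown target language ID: {target!r}")
    stripped = _TOKEN_PREFIX.sub("", text.strip())
    return f">>{target}<< {stripped}"


print(with_target_token("The weather is nice today.", "heb"))
# prints ">>heb<< The weather is nice today."
```

The prefixed strings would then go to the tokenizer and model classes named in the configs (`MarianTokenizer`, `MarianMTModel`). Loading those requires the `transformers` package and the model files, which this sketch leaves out.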
vocab.json ADDED
The diff for this file is too large to render. See raw diff