---
license: mit
language:
- en
- de
- fr
- nl
- es
- ru
- pt
- ro
- it
metrics:
- bleu
pipeline_tag: translation
---
# nllb-200-distilled-600M_mustc_en-to-8

This is a multilingual fine-tuned version of [NLLB](https://arxiv.org/abs/2207.04672), based on [nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) and trained on the text data of [MuST-C v1.0](https://aclanthology.org/N19-1202/) for translation from English into its 8 target languages (De, Es, Fr, It, Nl, Pt, Ro, Ru).

It is part of the paper [Pushing the Limits of Zero-shot End-to-end Speech Translation](https://arxiv.org/abs/2402.10422). Details of the fine-tuning process are available in Appendix D of the paper.

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint; the source language for MuST-C is English
tokenizer = AutoTokenizer.from_pretrained(
    "johntsi/nllb-200-distilled-600M_mustc_en-to-8", src_lang="eng_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained("johntsi/nllb-200-distilled-600M_mustc_en-to-8")

model.eval()
model.to("cuda")

# English sentence to translate (the model translates the text itself;
# the target language is selected via forced_bos_token_id below)
text = "Translate this text to German."
inputs = tokenizer(text, return_tensors="pt").to("cuda")

# Beam search, forcing the decoder to start with the German language token
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        num_beams=5,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
    )

translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text)
```
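The same checkpoint serves all eight target directions; only the token forced at the start of decoding changes. Below is a minimal sketch that reuses the `tokenizer` and `model` loaded above and loops over a few targets; the FLORES-200 language codes and the example sentence are illustrative.

```python
# FLORES-200 codes for a few of the supported targets (illustrative subset)
target_codes = {"German": "deu_Latn", "French": "fra_Latn", "Russian": "rus_Cyrl"}

text = "Machine translation has improved rapidly in recent years."
inputs = tokenizer(text, return_tensors="pt").to("cuda")

for name, code in target_codes.items():
    with torch.inference_mode():
        out = model.generate(
            **inputs,
            num_beams=5,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(code),
        )
    print(f"{name}: {tokenizer.decode(out[0], skip_special_tokens=True)}")
```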

## Results

#### BLEU scores on MuST-C v1.0 tst-COMMON

| Model                     | De   | Es   | Fr   | It   | Nl   | Pt   | Ro   | Ru   | Average |
|:-------------------------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:-------:|
| [nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) (original) | 32.7 | 36.9 | 45.2 | 32.2 | 36.0 | 37.4 | 30.3 | 21.0 | 34.0 |
| [nllb-200-distilled-600M_mustc_en-to-8](https://huggingface.co/johntsi/nllb-200-distilled-600M_mustc_en-to-8)      | 34.4 | 38.8 | 44.6 | 34.7 | 39.0 | 41.6 | 32.1 | 22.4 | 35.9 |
| [nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B) (original)   | 34.6 | 38.6 | 46.8 | 33.7 | 38.2 | 39.6 | 31.8 | 23.2 | 35.8 |
| [nllb-200-distilled-1.3B_mustc_en-to-8](https://huggingface.co/johntsi/nllb-200-distilled-1.3B_mustc_en-to-8)        | 35.3 | 39.9 | 45.8 | 36.0 | 40.6 | 43.1 | 32.6 | 23.9 | 37.2 |
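The scores above are corpus-level BLEU on the tst-COMMON split of MuST-C v1.0 (see Appendix D of the paper for the exact evaluation setup). As a rough sketch of how scores of this kind are typically computed with the sacreBLEU library, assuming aligned lists of system outputs and references (the sentences below are placeholders):

```python
import sacrebleu

# One model translation per test segment (illustrative placeholders)
hypotheses = ["Das ist ein Test.", "Noch ein Satz."]
# The aligned reference translations from tst-COMMON
references = ["Das ist ein Test.", "Ein weiterer Satz."]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```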

## Citation

If you find these models useful for your research, please cite our paper :)

```bibtex
@inproceedings{tsiamas-etal-2024-pushing,
    title = {{Pushing the Limits of Zero-shot End-to-End Speech Translation}},
    author = "Tsiamas, Ioannis  and
      G{\'a}llego, Gerard  and
      Fonollosa, Jos{\'e}  and
      Costa-juss{\`a}, Marta",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.847",
    pages = "14245--14267",
}
```