---
license: mit
language:
- en
- de
- fr
- nl
- es
- ru
- pt
- ro
- it
metrics:
- bleu
pipeline_tag: translation
---

# nllb-200-distilled-600M_mustc_en-to-8

This is a multilingually fine-tuned version of [NLLB](https://arxiv.org/abs/2207.04672), based on [nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) and fine-tuned on the text data of [MuST-C v1.0](https://aclanthology.org/N19-1202/) for translation from English into 8 languages: German, Spanish, French, Italian, Dutch, Portuguese, Romanian, and Russian.

It is part of the paper [Pushing the Limits of Zero-shot End-to-End Speech Translation](https://arxiv.org/abs/2402.10422). Details of the fine-tuning process are given in Appendix D of the paper.
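
Each MuST-C training example pairs an English source sentence with its translation in one of the 8 target languages. The sketch below shows one way such a pair can be tokenized with the NLLB tokenizer, assuming the standard Hugging Face seq2seq recipe (setting `src_lang`/`tgt_lang` and passing `text_target`); it is not necessarily the procedure described in Appendix D, and the helper name `encode_pair` and its `max_length=256` default are hypothetical.

```python
def encode_pair(tokenizer, src_text, tgt_text, tgt_code, max_length=256):
    """Tokenize one English->X MuST-C pair for seq2seq fine-tuning (hypothetical helper).

    In its default (non-legacy) mode, the NLLB tokenizer prefixes the source
    with the `eng_Latn` token and the labels with the `tgt_code` token
    (e.g. "deu_Latn" for German).
    """
    tokenizer.src_lang = "eng_Latn"  # the source side is always English
    tokenizer.tgt_lang = tgt_code    # target language of this particular pair
    return tokenizer(
        src_text,
        text_target=tgt_text,   # tokenized into the "labels" field
        truncation=True,
        max_length=max_length,  # hypothetical limit, not taken from the paper
    )
```

With the tokenizer loaded as in the Usage section, `encode_pair(tokenizer, "Thank you so much.", "Vielen Dank.", "deu_Latn")` returns `input_ids` and `labels`; passing them to `tokenizer.convert_ids_to_tokens` shows the language token that was added to each side.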

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "johntsi/nllb-200-distilled-600M_mustc_en-to-8"
device = "cuda" if torch.cuda.is_available() else "cpu"

# The source language is English (NLLB code "eng_Latn")
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model.eval()
model.to(device)

text = "Translate this text to German."
inputs = tokenizer(text, return_tensors="pt").to(device)

# Force the decoder to start with the target-language token (German: "deu_Latn").
# convert_tokens_to_ids replaces tokenizer.lang_code_to_id, which is
# deprecated in recent transformers releases.
outputs = model.generate(
    **inputs,
    num_beams=5,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text)
```
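
To translate the same English input into all eight target languages, you can loop over their NLLB-200 language codes. This is a hypothetical helper, not part of the released code: `MUSTC_TO_NLLB` maps the MuST-C v1.0 language codes to the FLORES-200 codes that NLLB uses, and `model`/`tokenizer` are assumed to be loaded as in the example above.

```python
# Hypothetical mapping: MuST-C v1.0 target language -> NLLB-200 (FLORES-200) code
MUSTC_TO_NLLB = {
    "de": "deu_Latn",  # German
    "es": "spa_Latn",  # Spanish
    "fr": "fra_Latn",  # French
    "it": "ita_Latn",  # Italian
    "nl": "nld_Latn",  # Dutch
    "pt": "por_Latn",  # Portuguese
    "ro": "ron_Latn",  # Romanian
    "ru": "rus_Cyrl",  # Russian
}


def translate_to_all(text, model, tokenizer, num_beams=5):
    """Translate one English sentence into every MuST-C target language.

    Encodes the source once, then runs one beam search per target language,
    each forced to start with that language's token.
    """
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    translations = {}
    for lang, code in MUSTC_TO_NLLB.items():
        outputs = model.generate(
            **inputs,
            num_beams=num_beams,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(code),
        )
        translations[lang] = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return translations
```

`forced_bos_token_id` takes a single token id for the whole batch, so each target language gets its own `generate` call here.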

## Results

#### BLEU scores on MuST-C v1.0 tst-COMMON

| Model | De | Es | Fr | It | Nl | Pt | Ro | Ru | Average |
|:-------------------------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:-------:|
| [nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) (original) | 32.7 | 36.9 | 45.2 | 32.2 | 36.0 | 37.4 | 30.3 | 21.0 | 34.0 |
| [nllb-200-distilled-600M_mustc_en-to-8](https://huggingface.co/johntsi/nllb-200-distilled-600M_mustc_en-to-8) | 34.4 | 38.8 | 44.6 | 34.7 | 39.0 | 41.6 | 32.1 | 22.4 | 35.9 |
| [nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B) (original) | 34.6 | 38.6 | 46.8 | 33.7 | 38.2 | 39.6 | 31.8 | 23.2 | 35.8 |
| [nllb-200-distilled-1.3B_mustc_en-to-8](https://huggingface.co/johntsi/nllb-200-distilled-1.3B_mustc_en-to-8) | 35.3 | 39.9 | 45.8 | 36.0 | 40.6 | 43.1 | 32.6 | 23.9 | 37.2 |
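
The Average column agrees with the unweighted arithmetic mean of the eight per-language scores, up to rounding. The sketch below re-derives it and the per-language effect of fine-tuning, using only the numbers copied from the table; the helper names are illustrative, and the 0.1 tolerance is an assumption reflecting that both the per-language scores and the average are rounded to one decimal.

```python
from statistics import mean

# Column order of the table: MuST-C v1.0 tst-COMMON target languages
LANGS = ["De", "Es", "Fr", "It", "Nl", "Pt", "Ro", "Ru"]

# (per-language BLEU, reported Average), copied from the table above
RESULTS = {
    "600M original": ([32.7, 36.9, 45.2, 32.2, 36.0, 37.4, 30.3, 21.0], 34.0),
    "600M fine-tuned": ([34.4, 38.8, 44.6, 34.7, 39.0, 41.6, 32.1, 22.4], 35.9),
    "1.3B original": ([34.6, 38.6, 46.8, 33.7, 38.2, 39.6, 31.8, 23.2], 35.8),
    "1.3B fine-tuned": ([35.3, 39.9, 45.8, 36.0, 40.6, 43.1, 32.6, 23.9], 37.2),
}


def matches_average(scores, reported, tol=0.1):
    """True if the unweighted mean of `scores` equals `reported` up to `tol`."""
    return abs(mean(scores) - reported) <= tol


def bleu_gains(base, finetuned):
    """Per-language BLEU change (fine-tuned minus original), rounded to 0.1."""
    return {lang: round(f - b, 1) for lang, b, f in zip(LANGS, base, finetuned)}


for name, (scores, reported) in RESULTS.items():
    print(f"{name}: mean={mean(scores):.4f} reported={reported} ok={matches_average(scores, reported)}")

for size in ("600M", "1.3B"):
    base = RESULTS[f"{size} original"][0]
    tuned = RESULTS[f"{size} fine-tuned"][0]
    print(size, bleu_gains(base, tuned))
```

For both model sizes, French is the only language where fine-tuning on MuST-C lowers BLEU (from 45.2 to 44.6 and from 46.8 to 45.8), while Portuguese shows the largest gain (+4.2 and +3.5).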

## Citation

If you find these models useful for your research, please cite our paper :)

```bibtex
@inproceedings{tsiamas-etal-2024-pushing,
    title = {{Pushing the Limits of Zero-shot End-to-End Speech Translation}},
    author = "Tsiamas, Ioannis and
      G{\'a}llego, Gerard and
      Fonollosa, Jos{\'e} and
      Costa-juss{\`a}, Marta",
    editor = "Ku, Lun-Wei and
      Martins, Andre and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.847",
    pages = "14245--14267",
}
```