---
base_model: facebook/nllb-200-1.3B
model-index:
- name: translate-nllb-1.3b-salt
  results: []
datasets:
- Sunbird/salt
---

# Model details

This machine translation model translates single sentences between any pair of the following languages:

| ISO 639-3 | Language name |
| --- | --- |
| eng | English |
| ach | Acholi |
| lgg | Lugbara |
| lug | Luganda |
| nyn | Runyankole |
| teo | Ateso |

It was trained on the [SALT](http://huggingface.co/datasets/Sunbird/salt) dataset and a variety of
additional external data resources, including back-translated news articles, FLORES-200, MT560 and LAFAND-MT.
The base model was [facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B),
with tokens adapted to add support for languages not originally included.

# Usage example

```python
import torch
import transformers

tokenizer = transformers.NllbTokenizer.from_pretrained(
    'Sunbird/translate-nllb-1.3b-salt')
model = transformers.M2M100ForConditionalGeneration.from_pretrained(
    'Sunbird/translate-nllb-1.3b-salt')

text = 'Where is the hospital?'
source_language = 'eng'
target_language = 'lug'

# Token IDs of the language tags used by this model.
language_tokens = {
    'eng': 256047,
    'ach': 256111,
    'lgg': 256008,
    'lug': 256110,
    'nyn': 256002,
    'teo': 256006,
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inputs = tokenizer(text, return_tensors="pt").to(device)
# Set the first input token to the source-language token.
inputs['input_ids'][0][0] = language_tokens[source_language]
translated_tokens = model.to(device).generate(
    **inputs,
    # Start decoding with the target-language token.
    forced_bos_token_id=language_tokens[target_language],
    max_length=100,
    num_beams=5,
)

result = tokenizer.batch_decode(
    translated_tokens, skip_special_tokens=True)[0]
print(result)
# Eddwaliro liri ludda wa?
```
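
For repeated calls, the steps in the example above can be wrapped in a small helper. This is a minimal sketch and not part of the published model's API: the names `LANGUAGE_TOKENS`, `language_token_id` and `translate` are hypothetical, the token IDs are copied from the example, and the helper assumes `model` and `tokenizer` were loaded as shown there, with the model already moved to its device.

```python
# Token IDs copied from the usage example above.
LANGUAGE_TOKENS = {
    "eng": 256047,
    "ach": 256111,
    "lgg": 256008,
    "lug": 256110,
    "nyn": 256002,
    "teo": 256006,
}


def language_token_id(code):
    """Return the token ID for a supported language code, else raise ValueError."""
    try:
        return LANGUAGE_TOKENS[code]
    except KeyError:
        supported = ", ".join(sorted(LANGUAGE_TOKENS))
        raise ValueError(
            f"Unsupported language {code!r}; expected one of: {supported}"
        ) from None


def translate(text, source_language, target_language, model, tokenizer):
    """Translate one sentence (hypothetical helper mirroring the example)."""
    # Validate both codes before running the model.
    source_id = language_token_id(source_language)
    target_id = language_token_id(target_language)
    # Assumption: `model` is a loaded transformers model already on its device.
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    # Set the first input token to the source-language token.
    inputs["input_ids"][0][0] = source_id
    translated_tokens = model.generate(
        **inputs,
        forced_bos_token_id=target_id,
        max_length=100,
        num_beams=5,
    )
    return tokenizer.batch_decode(
        translated_tokens, skip_special_tokens=True)[0]
```

With the model and tokenizer loaded, `translate('Where is the hospital?', 'eng', 'lug', model, tokenizer)` would run the same steps as the example. An unsupported code such as `'fra'` raises a `ValueError` before the model is called.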

# Evaluation metrics

Results on salt-dev:

| Source language | Target language | BLEU |
| --- | --- | --- |
| ach | eng | 28.371 |
| lgg | eng | 30.45 |
| lug | eng | 41.978 |
| nyn | eng | 32.296 |
| teo | eng | 30.422 |
| eng | ach | 20.972 |
| eng | lgg | 22.362 |
| eng | lug | 30.359 |
| eng | nyn | 15.305 |
| eng | teo | 21.391 |
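
In the table, every language scores higher when translating into English than from English. As a quick way to summarize that gap, the sketch below averages the reported scores by direction. The scores are copied from the table, the variable names are illustrative, and the unweighted mean over language pairs is an assumption of this sketch, not a metric reported for the model.

```python
from statistics import mean

# BLEU scores copied from the salt-dev table: (source, target) -> BLEU.
bleu = {
    ("ach", "eng"): 28.371,
    ("lgg", "eng"): 30.45,
    ("lug", "eng"): 41.978,
    ("nyn", "eng"): 32.296,
    ("teo", "eng"): 30.422,
    ("eng", "ach"): 20.972,
    ("eng", "lgg"): 22.362,
    ("eng", "lug"): 30.359,
    ("eng", "nyn"): 15.305,
    ("eng", "teo"): 21.391,
}

# Split the scores by translation direction.
into_english = [score for (src, tgt), score in bleu.items() if tgt == "eng"]
from_english = [score for (src, tgt), score in bleu.items() if src == "eng"]

# Unweighted means over the five language pairs in each direction.
mean_into_english = mean(into_english)
mean_from_english = mean(from_english)

print(f"xx -> eng: {mean_into_english:.2f}")
print(f"eng -> xx: {mean_from_english:.2f}")
```

With the figures above, the means come to about 32.70 BLEU into English and 22.08 BLEU from English.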