|
#### Languages: |
|
|
|
- Source language: English |
|
|
|
- Target language: isiZulu
|
|
|
#### Model Details: |
|
|
|
- Model: Transformer
|
|
|
- Architecture: MarianMT |
|
|
|
- Pre-processing: normalization + SentencePiece
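The card lists "normalization + SentencePiece" as the pre-processing pipeline but does not spell out the normalization step. The sketch below shows one plausible reading of it (Unicode NFC plus whitespace cleanup); the function name and the exact rules are illustrative assumptions, not the repository's actual code.

```python
import unicodedata

def normalize(text: str) -> str:
    """Illustrative text normalization (assumption): Unicode NFC
    plus collapsing all whitespace (incl. non-breaking spaces)
    into single ASCII spaces, before SentencePiece tokenization."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())

print(normalize("Sawubona\u00a0 umhlaba"))  # non-breaking space collapsed
```

After this step, the cleaned text would be segmented into subword units with a trained SentencePiece model.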
|
|
|
#### Pre-trained Model: |
|
|
|
- https://huggingface.co/Helsinki-NLP/opus-mt-en-xh |
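As a minimal sketch, the checkpoint above can be loaded with the Hugging Face `transformers` MarianMT classes. The `translate` helper is illustrative only (it is not part of the linked repository), and calling it requires network access to download the weights.

```python
from transformers import MarianMTModel, MarianTokenizer

# Pre-trained checkpoint named in this card
MODEL_NAME = "Helsinki-NLP/opus-mt-en-xh"

def translate(sentences):
    """Translate a batch of English sentences with the MarianMT
    checkpoint (downloads the model weights on first call)."""
    tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
    model = MarianMTModel.from_pretrained(MODEL_NAME)
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in generated]

# Example usage (requires network access):
# translate(["Hello, how are you?"])
```

Note the checkpoint is English-isiXhosa (`en-xh`); per the linked paper, it was selected as the starting point for fine-tuning on the English-isiZulu corpus below.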
|
|
|
#### Corpus: |
|
|
|
- Umsuka English-isiZulu Parallel Corpus (https://zenodo.org/record/5035171#.Yh5NIOhBy3A) |
|
|
|
#### Benchmark: |
|
|
|
| Benchmark | Train (BLEU) | Test (BLEU) |
|
|-----------|-------|-------| |
|
| Umsuka | 17.61 | 13.73 | |
|
|
|
#### GitHub: |
|
|
|
- https://github.com/umair-nasir14/Geographical-Distance-Is-The-New-Hyperparameter |
|
|
|
#### Citation: |
|
|
|
``` |
|
@article{umair2022geographical, |
|
title={Geographical Distance Is The New Hyperparameter: A Case Study Of Finding The Optimal Pre-trained Language For English-isiZulu Machine Translation}, |
|
author={Umair Nasir, Muhammad and Amos Mchechesi, Innocent}, |
|
journal={arXiv e-prints}, |
|
pages={arXiv--2205}, |
|
year={2022} |
|
} |
|
``` |
|
|
|
|