Languages:
Source language: English
Target language: isiZulu
Model Details:
Model: Transformer
Architecture: MarianMT
Pre-processing: normalization + SentencePiece
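Since the model follows the MarianMT architecture, it can be loaded with the Hugging Face transformers library; the MarianTokenizer applies the SentencePiece tokenization described above. A minimal inference sketch, assuming a hypothetical checkpoint ID (replace `your-org/marianmt-en-zu-umsuka` with the actual published model):

```python
from transformers import MarianMTModel, MarianTokenizer

# Hypothetical checkpoint ID -- substitute the actual published model.
checkpoint = "your-org/marianmt-en-zu-umsuka"

tokenizer = MarianTokenizer.from_pretrained(checkpoint)
model = MarianMTModel.from_pretrained(checkpoint)

# Translate a single English sentence into isiZulu.
inputs = tokenizer(["The weather is beautiful today."],
                   return_tensors="pt", padding=True)
translated = model.generate(**inputs)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```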
Pre-trained Model:
Corpus:
- Umsuka English-isiZulu Parallel Corpus (https://zenodo.org/record/5035171#.Yh5NIOhBy3A)
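A minimal sketch of loading the parallel corpus, assuming a CSV export with hypothetical file and column names (check the files in the Zenodo record for the actual layout):

```python
import pandas as pd

# Hypothetical file name and column names -- verify against the Zenodo record.
df = pd.read_csv("umsuka_en_zu_train.csv")
pairs = list(zip(df["English"], df["isiZulu"]))
print(f"{len(pairs)} parallel sentence pairs")
```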
Benchmark:
Benchmark | Train BLEU | Test BLEU |
---|---|---|
Umsuka | 17.61 | 13.73 |
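A minimal sketch of how BLEU scores like those above are typically computed with sacrebleu (the exact evaluation setup behind the table is an assumption; the example data below is hypothetical):

```python
import sacrebleu

# Hypothetical example data -- in practice these are the model's
# translations of the Umsuka test split and the reference translations.
hypotheses = ["Izulu lihle namuhla."]
references = [["Izulu lihle namhlanje."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```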
GitHub:
Citation:
@article{umair2022geographical,
  title={Geographical Distance Is The New Hyperparameter: A Case Study Of Finding The Optimal Pre-trained Language For English-isiZulu Machine Translation},
  author={Umair Nasir, Muhammad and Amos Mchechesi, Innocent},
  journal={arXiv e-prints},
  pages={arXiv--2205},
  year={2022}
}