---
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language: rw
datasets:
- DigitalUmuganda/kinyarwanda-english-machine-translation-dataset
widget:
- text: "Muraho neza, murakaza neza mu Rwanda."
  example_title: "Muraho neza, murakaza neza mu Rwanda."
---
# Kinyarwanda-to-English Machine Translation
This is a Kinyarwanda-to-English machine translation model built and trained with the JoeyNMT framework. It uses a Transformer encoder-decoder architecture and was trained on an English-Kinyarwanda bitext of 47,211 sentence pairs prepared by Digital Umuganda.
## Model architecture
**Encoder and Decoder**

* Type: Transformer
* Num_layers: 6
* Num_heads: 8
* Embedding_dim: 256
* ff_size: 1024
* Dropout: 0.1
* Layer_norm: post
* Initializer: xavier
* Total params: 12,563,968
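
To make the hyperparameters above concrete, the sketch below counts the parameters in the encoder and decoder layer stacks. It assumes a standard Transformer layout (biased query/key/value/output projections, a two-layer feed-forward block, LayerNorms with a gain and a bias) rather than reading JoeyNMT's source, and it leaves out the embeddings and output projection because their size depends on the BPE vocabularies, which this card does not list.

```python
# Per-layer parameter count for the Transformer hyperparameters listed above.
# Assumptions (not stated in this card): Q/K/V/output projections and both
# feed-forward layers carry biases; each LayerNorm has a gain and a bias.
# Embeddings and the output projection are left out on purpose.

D_MODEL = 256   # Embedding_dim
FF_SIZE = 1024  # ff_size
NUM_HEADS = 8   # Num_heads
NUM_LAYERS = 6  # layers in the encoder, and again in the decoder


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of a dense layer."""
    return n_in * n_out + n_out


def attention(d: int) -> int:
    """Query, key, value and output projections; heads only split d."""
    return 4 * linear(d, d)


def feed_forward(d: int, ff: int) -> int:
    """Two dense layers: d -> ff -> d."""
    return linear(d, ff) + linear(ff, d)


def layer_norm(d: int) -> int:
    return 2 * d  # gain and bias


def encoder_layer(d: int, ff: int) -> int:
    # self-attention + feed-forward, one LayerNorm each
    return attention(d) + feed_forward(d, ff) + 2 * layer_norm(d)


def decoder_layer(d: int, ff: int) -> int:
    # self-attention + cross-attention + feed-forward, one LayerNorm each
    return 2 * attention(d) + feed_forward(d, ff) + 3 * layer_norm(d)


if __name__ == "__main__":
    enc = NUM_LAYERS * encoder_layer(D_MODEL, FF_SIZE)
    dec = NUM_LAYERS * decoder_layer(D_MODEL, FF_SIZE)
    print(f"head size: {D_MODEL // NUM_HEADS}")
    print(f"encoder stack: {enc:,}")
    print(f"decoder stack: {dec:,}")
    print(f"both stacks: {enc + dec:,}")
```

Under those assumptions the twelve layers hold 11,059,200 parameters, so roughly 1.5M of the reported 12,563,968 would belong to the embeddings, the output layer and any extra LayerNorms. Num_heads does not change the count: the eight heads split each 256-dimensional projection into 32-dimensional slices.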
## Pre-processing
* Tokenizer_type: subword-nmt (BPE)
* num_merges: 4000
* BPE learned on the bitext, with a separate vocabulary for each language
* Pretokenizer: None
* Lowercasing: not applied
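
`num_merges` is the number of merge operations BPE learns: starting from single characters, it repeatedly fuses the most frequent adjacent symbol pair into a new symbol. The toy learner below illustrates that loop. It is a simplified sketch, not subword-nmt's implementation (no minimum-frequency cutoff, ties broken by first occurrence), and the two-line sample corpus is made up for illustration.

```python
from collections import Counter


def learn_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to num_merges BPE merge operations from whitespace-split text."""
    # Each word starts as a tuple of characters; "</w>" marks the word end,
    # so word-final subwords stay distinct from word-internal ones.
    vocab = Counter()
    for line in corpus:
        for word in line.split():
            vocab[tuple(word[:-1]) + (word[-1] + "</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # first pair with the highest count
        merges.append(best)

        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    sample = ["muraho muraho neza", "neza cyane"]  # illustrative only
    for left, right in learn_bpe(sample, num_merges=5):
        print(left, "+", right)
```

With 4000 merges on a real bitext, frequent words end up as single symbols while rare words stay split into subword units, which keeps the vocabulary small without producing unknown tokens.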
## Training
* Optimizer: Adam
* Loss: cross-entropy
* Epochs: 30
* Batch_size: 256
* Number of GPUs: 1
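
The cross-entropy loss scores each reference token by the negative log of the probability the decoder assigned to it. The sketch below averages that over non-padding tokens; the padding index `PAD_ID` is hypothetical, and averaging per token is only one possible normalization (a trainer may instead normalize per sentence or per batch).

```python
import math

PAD_ID = 1  # hypothetical padding index; padded positions are ignored


def cross_entropy(probs: list[list[float]], targets: list[int]) -> float:
    """Mean negative log-probability of the reference tokens, skipping padding.

    probs[t] is the decoder's distribution over the vocabulary at step t,
    targets[t] is the index of the reference token at step t.
    """
    total, count = 0.0, 0
    for dist, tok in zip(probs, targets):
        if tok == PAD_ID:
            continue  # padding contributes nothing to the loss
        total -= math.log(dist[tok])
        count += 1
    if count == 0:
        raise ValueError("no non-padding targets")
    return total / count


if __name__ == "__main__":
    # Two decoding steps over a 4-token vocabulary; the second is padding.
    step_probs = [[0.1, 0.0, 0.7, 0.2], [0.25, 0.25, 0.25, 0.25]]
    loss = cross_entropy(step_probs, [2, PAD_ID])
    print(f"loss={loss:.4f} perplexity={math.exp(loss):.4f}")
```

Perplexity, which translation toolkits commonly report alongside the loss, is simply the exponential of this per-token average.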
## Evaluation
* Evaluation_metrics: BLEU, chrF
* Tokenization: None
* Beam_width: 15
* Beam_alpha: 1.0
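
Beam_width: 15 keeps the 15 best partial hypotheses at each decoding step, and Beam_alpha controls how their scores are normalized for length. The sketch below assumes a GNMT-style penalty, lp = ((5 + length) / 6) ** alpha, with each finished hypothesis scored as its summed token log-probability divided by lp. It only re-ranks finished hypotheses and does not implement the search itself; the hypothesis names and probabilities are invented.

```python
import math


def length_penalty(length: int, alpha: float) -> float:
    """GNMT-style penalty (assumed form); alpha = 0 disables normalization."""
    return ((5.0 + length) / 6.0) ** alpha


def normalized_score(token_log_probs: list[float], alpha: float) -> float:
    """Summed log-probability of a hypothesis divided by its length penalty."""
    return sum(token_log_probs) / length_penalty(len(token_log_probs), alpha)


def best_hypothesis(hyps: dict[str, list[float]], alpha: float) -> str:
    """Pick the finished hypothesis with the highest normalized score."""
    return max(hyps, key=lambda h: normalized_score(hyps[h], alpha))


if __name__ == "__main__":
    hyps = {
        "short": [math.log(0.5)] * 3,  # 3 tokens, p = 0.5 each
        "long": [math.log(0.7)] * 6,   # 6 tokens, p = 0.7 each
    }
    for alpha in (0.0, 1.0):
        print(alpha, best_hypothesis(hyps, alpha))
```

With alpha = 0 the shorter hypothesis wins on raw log-probability, because every extra token adds a negative term. With alpha = 1.0 the six-token hypothesis is divided by a larger penalty and overtakes it, which is how a nonzero alpha counteracts beam search's bias toward short outputs.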
## Tools
* JoeyNMT 2.0.0
* datasets
* pandas
* numpy
* transformers
* sentencepiece
* PyTorch (with CUDA)
* sacrebleu
* protobuf>=3.20.1
## How to train
See the [JoeyNMT repository](https://github.com/joeynmt/joeynmt) for training instructions.
## Translation
To install JoeyNMT, run:

> $ git clone https://github.com/joeynmt/joeynmt.git
>
> $ cd joeynmt
>
> $ pip install -e .


Interactive translation (stdin):

> $ python -m joeynmt translate configs/args.yaml

File translation:

> $ python -m joeynmt translate configs/args.yaml < src_lang.txt > hypothesis_trg_lang.txt
## Accuracy measurement
Install sacrebleu:

> $ pip install sacrebleu

Compute BLEU and chrF:

> $ sacrebleu reference.tsv -i hypothesis.tsv -m bleu chrf
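
sacrebleu produces the reported scores. To show what the BLEU number summarizes, here is a minimal corpus-level BLEU: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It splits on whitespace, takes one reference per sentence and applies no smoothing, so its scores will generally not match sacrebleu's, which handles tokenization and smoothing itself. Treat it as an illustration only.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: list[str], refs: list[str], max_n: int = 4) -> float:
    """Corpus BLEU on a 0-100 scale; one reference per hypothesis, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or 0 in matches:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    print(corpus_bleu(["the cat sat on the"], ["the cat sat on the mat"]))
```

In the usage example every n-gram of the hypothesis appears in the reference, so the score is reduced only by the brevity penalty for the missing final word.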
## To-do
>* Test the model on other datasets, including JW300.
>* Try the Digital Umuganda dataset with available state-of-the-art (SOTA) models.
>* Expand the dataset.
## Results
The following results were obtained using sacrebleu.

Kinyarwanda-to-English:

>* BLEU: 79.87
>* chrF: 84.40