---
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language: rw
datasets:
- DigitalUmuganda/kinyarwanda-english-machine-translation-dataset
widget:
- text: "Muraho neza, murakaza neza mu Rwanda."
  example_title: "Muraho neza, murakaza neza mu Rwanda."
---

# Kinyarwanda-to-English Machine Translation

This is a Kinyarwanda-to-English machine translation model, built and trained with the JoeyNMT framework. It uses a Transformer encoder-decoder architecture and was trained on an English-Kinyarwanda bitext dataset of 47,211 parallel entries prepared by Digital Umuganda.

## Model architecture

**Encoder & Decoder**

>* Type: Transformer
>* Num_layer: 6
>* Num_heads: 8
>* Embedding_dim: 256
>* ff_size: 1024
>* Dropout: 0.1
>* Layer_norm: post
>* Initializer: xavier
>* Total params: 12,563,968

## Pre-processing

* Tokenizer_type: subword-nmt
* num_merges: 4000
* BPE encoding learned on the bitext, with separate vocabularies for each language
* Pretokenizer: None
* No lowercasing applied

## Training

* Optimizer: Adam
* Loss: crossentropy
* Epochs: 30
* Batch_size: 256
* Number of GPUs: 1

## Evaluation

* Evaluation_metrics: BLEU, chrF
* Tokenization: None
* Beam_width: 15
* Beam_alpha: 1.0

## Tools

* joeyNMT 2.0.0
* datasets
* pandas
* numpy
* transformers
* sentencepiece
* pytorch (with CUDA)
* sacrebleu
* protobuf>=3.20.1

## How to train

For training instructions, see the [JoeyNMT repository](https://github.com/joeynmt/joeynmt).

## Translation

To install joeyNMT, run:

```
$ git clone https://github.com/joeynmt/joeynmt.git
$ cd joeynmt
$ pip install -e .
```

Interactive translation (stdin):

```
$ python -m joeynmt translate configs/args.yaml
```

File translation:

```
$ python -m joeynmt translate configs/args.yaml < src_lang.txt > hypothesis_trg_lang.txt
```

## Accuracy measurement

Sacrebleu installation:

```
$ pip install sacrebleu
```

Measurement (BLEU, chrF):

```
$ sacrebleu reference.tsv -i hypothesis.tsv -m bleu chrf
```

## To-do

>* Test the model on other datasets, including JW300.
>* Train some available state-of-the-art (SOTA) models on the Digital Umuganda dataset.
>* Expand the dataset.

## Result

The following result was obtained using sacrebleu.

Kinyarwanda-to-English:

```
BLEU: 79.87
chrF: 84.40
```
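Corpus-level scoring compares the reference and hypothesis files line by line, so the two files need the same number of lines. The sketch below is a small standard-library check that could be run before the sacrebleu command. The helper name `check_parallel` is hypothetical and is not part of JoeyNMT or sacrebleu; the file names in the usage comment follow the commands above.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def check_parallel(reference_path, hypothesis_path):
    """Hypothetical helper: verify that two files are line-aligned.

    Returns (number_of_pairs, empty_hypothesis_line_numbers).
    Raises ValueError when the line counts differ.
    """
    references = read_lines(reference_path)
    hypotheses = read_lines(hypothesis_path)
    if len(references) != len(hypotheses):
        raise ValueError(
            f"line count mismatch: {len(references)} reference lines "
            f"vs {len(hypotheses)} hypothesis lines"
        )
    # Empty hypotheses are still scored, but they usually signal a
    # translation or redirection problem worth inspecting by hand.
    empty = [
        number
        for number, line in enumerate(hypotheses, start=1)
        if not line.strip()
    ]
    return len(references), empty


# Example usage (file names taken from the commands above):
# pairs, empty = check_parallel("reference.tsv", "hypothesis.tsv")
# print(pairs, "aligned lines;", len(empty), "empty hypotheses")
```

Raising on a mismatch, rather than truncating to the shorter file, is a deliberate choice here: a silent truncation would score misaligned pairs and produce a misleading number.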
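The pre-processing settings mention BPE with `num_merges: 4000`. To illustrate what a merge count means, here is a minimal pure-Python sketch of the byte-pair-encoding learning loop on a toy word-frequency table. It is not subword-nmt's implementation, and the function name `learn_bpe`, the `</w>` end-of-word marker, and the toy English words are assumptions for illustration only.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Toy BPE learner: repeatedly merge the most frequent adjacent symbol pair.

    word_freqs maps a word to its corpus frequency. Returns the ordered
    list of merges and the final segmented vocabulary.
    """
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break  # every word is a single symbol; nothing left to merge
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite each word, fusing occurrences of the best pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Example usage on a toy frequency table:
toy = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, segmented = learn_bpe(toy, num_merges=3)
print(merges)
```

With 4000 merges learned separately per language, as described in the pre-processing settings, frequent words end up as single tokens while rare words are split into smaller subword units.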