---
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language: rw
datasets:
- DigitalUmuganda/kinyarwanda-english-machine-translation-dataset
widget:
- text: "Muraho neza, murakaza neza mu Rwanda."
example_title: "Muraho neza, murakaza neza mu Rwanda."
---
# Kinyarwanda-to-English Machine Translation
This is a Kinyarwanda-to-English machine translation model built and trained with the JoeyNMT framework. It uses a Transformer encoder-decoder architecture and was trained on a Kinyarwanda-English bitext of 47,211 sentence pairs prepared by Digital Umuganda.
## Model architecture
**Encoder & Decoder**
> Type: Transformer
> Num_layers: 6
> Num_heads: 8
> Embedding_dim: 256
> ff_size: 1024
> Dropout: 0.1
> Layer_norm: post
> Initializer: xavier
> Total params: 12,563,968
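For reference, these hyperparameters correspond roughly to the following `model` section of a JoeyNMT YAML config. This is a sketch, not the exact config used: key names follow JoeyNMT 2.0's schema, and `hidden_size` is assumed to equal the embedding dimension.
```
model:
    initializer: "xavier"
    encoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256        # assumed equal to embedding_dim
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
    decoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256        # assumed equal to embedding_dim
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
```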
## Pre-processing
> Tokenizer_type: subword-nmt
> num_merges: 4000
> Pretokenizer: none
> Lowercase: not applied

The BPE encoding was learned on the bitext, with a separate vocabulary for each language.
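In JoeyNMT 2.0 this tokenization is configured per language side in the `data` section of the config. A sketch under those assumptions; the `codes` paths are hypothetical placeholders for the BPE codes learned on the bitext:
```
data:
    src:
        lang: "rw"
        level: "bpe"
        lowercase: False
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "bpe.codes.rw"    # hypothetical path
            pretokenizer: "none"
    trg:
        lang: "en"
        level: "bpe"
        lowercase: False
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "bpe.codes.en"    # hypothetical path; separate vocabulary per language
            pretokenizer: "none"
```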
## Training
> Optimizer: Adam
> Loss: crossentropy
> Epochs: 30
> Batch_size: 256
> Number of GPUs: 1
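In a JoeyNMT config these settings live in the `training` section. A minimal sketch assuming JoeyNMT 2.0 key names; values not listed above, such as the learning rate, are omitted:
```
training:
    optimizer: "adam"
    loss: "crossentropy"
    epochs: 30
    batch_size: 256
    use_cuda: True    # single GPU
```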
## Evaluation
> Evaluation_metrics: BLEU, chrF
> Tokenization: none
> Beam_width: 15
> Beam_alpha: 1.0
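Decoding behaviour is set in the `testing` section of the config; again a sketch, assuming JoeyNMT 2.0 key names:
```
testing:
    beam_size: 15
    beam_alpha: 1.0
    eval_metrics: ["bleu", "chrf"]
```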
## Tools
* JoeyNMT 2.0.0
* datasets
* pandas
* numpy
* transformers
* sentencepiece
* PyTorch (with CUDA)
* sacrebleu
* protobuf>=3.20.1
## How to train
[See the JoeyNMT repository for full training instructions](https://github.com/joeynmt/joeynmt)
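With a config like the sketches above saved as `args.yaml`, training is started with JoeyNMT's `train` mode:
```
$ python -m joeynmt train args.yaml
```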
## Translation
To install JoeyNMT, run:
```
$ git clone https://github.com/joeynmt/joeynmt.git
$ cd joeynmt
$ pip install -e .
```
Interactive translation (stdin):
```
$ python -m joeynmt translate args.yaml
```
File translation (the input file holds one source sentence per line):
```
$ python -m joeynmt translate args.yaml < src_lang.txt > hypothesis_trg_lang.txt
```
## Accuracy measurement
Sacrebleu installation:
```
$ pip install sacrebleu
```
Measurement (BLEU and chrF):
```
$ sacrebleu reference.tsv -i hypothesis.tsv -m bleu chrf
```
## To-do
>* Test the model on other datasets, including JW300.
>* Try the Digital Umuganda dataset on available state-of-the-art (SOTA) models.
>* Expand the dataset.
## Results
The following results were obtained with sacrebleu.
Kinyarwanda-to-English:
```
BLEU: 79.87
chrF: 84.40
```