# Kinyarwanda-to-English Machine Translation

This is a Kinyarwanda-to-English machine translation model. It was built and trained with the JoeyNMT framework, uses a Transformer encoder-decoder architecture, and was trained on a 47,211-sentence-pair English-Kinyarwanda bitext dataset prepared by Digital Umuganda.

## Model architecture

**Encoder & Decoder**

* Type: Transformer
* Num_layers: 6
* Num_heads: 8
* Embedding_dim: 256
* ff_size: 1024
* Dropout: 0.1
* Layer_norm: post
* Initializer: xavier
* Total params: 12,563,968
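
These hyperparameters correspond to the `model` section of a JoeyNMT YAML config. A minimal sketch is shown below; the key names follow JoeyNMT's config format, but the exact schema should be checked against the installed version:

```yaml
# Sketch of a JoeyNMT `model` config section matching the hyperparameters
# above; verify key names against your JoeyNMT version.
model:
    initializer: "xavier"
    encoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
    decoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
```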

## Pre-processing

* Tokenizer_type: subword-nmt
* num_merges: 4000
* BPE encoding learned on the bitext, with separate vocabularies for each language
* Pretokenizer: None
* No lowercasing applied
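
In a JoeyNMT config these choices live under the `data` section. A minimal sketch, with hypothetical file paths, might look like this:

```yaml
# Sketch of the per-language tokenizer settings in a JoeyNMT `data`
# section; vocabulary and BPE-codes paths are placeholders.
data:
    src:
        lang: "rw"
        level: "bpe"
        lowercase: False
        voc_file: "vocab_rw.txt"          # hypothetical path
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "bpe_codes_rw.txt"     # hypothetical path
            pretokenizer: "none"
    trg:
        lang: "en"
        level: "bpe"
        lowercase: False
        voc_file: "vocab_en.txt"          # hypothetical path
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "bpe_codes_en.txt"     # hypothetical path
            pretokenizer: "none"
```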

## Training

* Optimizer: Adam
* Loss: crossentropy
* Epochs: 30
* Batch_size: 256
* Number of GPUs: 1
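
The corresponding `training` section of a JoeyNMT config would look roughly like this (a sketch; only the settings listed above are shown):

```yaml
# Sketch of a JoeyNMT `training` section matching the settings above.
training:
    optimizer: "adam"
    loss: "crossentropy"
    epochs: 30
    batch_size: 256
    use_cuda: True    # one GPU
```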

## Evaluation

* Evaluation_metrics: BLEU, chrF
* Tokenization: None
* Beam_width: 15
* Beam_alpha: 1.0
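
These decoding and metric settings map onto the `testing` section of a JoeyNMT config. A minimal sketch, assuming JoeyNMT 2.0 key names:

```yaml
# Sketch of a JoeyNMT `testing` section matching the settings above;
# verify key names against your JoeyNMT version.
testing:
    beam_size: 15
    beam_alpha: 1.0
    eval_metrics: ["bleu", "chrf"]
    sacrebleu_cfg:
        tokenize: "none"
```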

## Tools

* JoeyNMT 2.0.0
* datasets
* pandas
* numpy
* transformers
* sentencepiece
* PyTorch (with CUDA)
* sacrebleu
* protobuf>=3.20.1

## How to train

[See the JoeyNMT repository for more information](https://github.com/joeynmt/joeynmt)
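
With JoeyNMT installed (see the Translation section below), training is launched from the JoeyNMT CLI with a YAML config; the config path here is illustrative:

> $ python -m joeynmt train configs/args.yaml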

## Translation

To install JoeyNMT run:

> $ git clone https://github.com/joeynmt/joeynmt.git
> $ cd joeynmt
> $ pip install -e .

Interactive translation (stdin):

> $ python -m joeynmt translate configs/args.yaml

File translation:

> $ python -m joeynmt translate configs/args.yaml < src_lang.txt > hypothesis_trg_lang.txt

## Accuracy measurement

Sacrebleu installation:

> $ pip install sacrebleu

Measurement (BLEU, chrF):

> $ sacrebleu reference.tsv -i hypothesis.tsv -m bleu chrf
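
Since evaluation above uses no tokenization, the same behavior can be requested explicitly through sacrebleu's tokenizer flag (a sketch; sacrebleu's default BLEU tokenizer is `13a`):

> $ sacrebleu reference.tsv -i hypothesis.tsv -m bleu chrf -tok none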

## To-do

* Test the model on different datasets, including JW300
* Use the Digital Umuganda dataset with some of the available state-of-the-art (SOTA) models
* Expand the dataset

## Results

The following results were obtained using sacrebleu.

Kinyarwanda-to-English:

> BLEU: 79.87
> chrF: 84.40