	---
	library_name: JoeyNMT
	task: Machine-translation
	tags:
	- JoeyNMT
	- Machine-translation
	language: rw
	datasets:
	- DigitalUmuganda/kinyarwanda-english-machine-translation-dataset
	widget:
	- text: "Muraho neza, murakaza neza mu Rwanda."
	  example_title: "Muraho neza, murakaza neza mu Rwanda."
	---
	# Kinyarwanda-to-English Machine Translation

	This is a Kinyarwanda-to-English machine translation model, built and trained with the JoeyNMT framework. It uses a Transformer encoder-decoder architecture and was trained on an English-Kinyarwanda bitext dataset of 47,211 sentence pairs prepared by Digital Umuganda.


	## Model architecture
	Encoder and decoder:
	>* Type: Transformer
	>* Num_layers: 6
	>* Num_heads: 8
	>* Embedding_dim: 256
	>* ff_size: 1024
	>* Dropout: 0.1
	>* Layer_norm: post
	>* Initializer: xavier
	>* Total params: 12,563,968
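
As a rough sanity check on the figures above, the sketch below counts the parameters in the encoder and decoder layer stacks from the listed hyperparameters. It assumes a standard Transformer layer layout (attention projections and feed-forward layers with bias terms, two layer norms per encoder layer, three per decoder layer); this layout is an assumption, not something the card states. The vocabulary sizes are not given, so embeddings and the output layer are left as a remainder rather than computed.

```python
def attention_params(d_model):
    # Query, key, value and output projections: each is a
    # d_model x d_model weight matrix plus a bias vector (assumed).
    return 4 * (d_model * d_model + d_model)


def feed_forward_params(d_model, ff_size):
    # Two linear layers with biases: d_model -> ff_size -> d_model.
    return (d_model * ff_size + ff_size) + (ff_size * d_model + d_model)


def layer_norm_params(d_model):
    # One scale vector and one shift vector.
    return 2 * d_model


def encoder_layer_params(d_model, ff_size):
    # Self-attention, feed-forward, two layer norms.
    return (attention_params(d_model)
            + feed_forward_params(d_model, ff_size)
            + 2 * layer_norm_params(d_model))


def decoder_layer_params(d_model, ff_size):
    # Self-attention, cross-attention, feed-forward, three layer norms.
    return (2 * attention_params(d_model)
            + feed_forward_params(d_model, ff_size)
            + 3 * layer_norm_params(d_model))


# Values taken from the list above.
NUM_LAYERS, D_MODEL, FF_SIZE = 6, 256, 1024
TOTAL_REPORTED = 12_563_968

layer_stack = NUM_LAYERS * (encoder_layer_params(D_MODEL, FF_SIZE)
                            + decoder_layer_params(D_MODEL, FF_SIZE))
remainder = TOTAL_REPORTED - layer_stack

print("encoder + decoder layers:", layer_stack)
print("left for embeddings, output layer, other:", remainder)
```

Under these assumptions the layer stacks account for 11,059,200 of the reported parameters, leaving 1,504,768 for the embeddings, the output projection and anything else the framework adds.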

	## Pre-processing

	* Tokenizer_type: subword-nmt
	* num_merges: 4000
	* BPE codes learned on the bitext, with a separate vocabulary for each language
	* Pretokenizer: None
	* No lowercasing applied
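
To illustrate what `num_merges` controls, here is a minimal pure-Python sketch of how BPE merge operations are learned. It is a simplified teaching version of the general algorithm, not the subword-nmt implementation; the function name `learn_bpe` and the toy word list are hypothetical.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges BPE merge operations from a list of words."""
    # Each word becomes a tuple of symbols ending in an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # Every word is already a single symbol.
        # Most frequent pair; ties are broken by the pair itself so the
        # result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


# Toy corpus; the real model learned 4000 merges on the full bitext.
merges = learn_bpe(["muraho", "muraho", "murakaza"], num_merges=3)
print(merges)
```

A larger `num_merges` yields longer subword units and a larger vocabulary; a smaller value keeps the units closer to single characters.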

	## Training
	* Optimizer: Adam
	* Loss: cross-entropy
	* Epochs: 30
	* Batch_size: 256
	* Number of GPUs: 1



	## Evaluation

	* Evaluation_metrics: BLEU, chrF
	* Tokenization: None
	* Beam_width: 15
	* Beam_alpha: 1.0
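
The beam alpha value is commonly used as the exponent of a GNMT-style length penalty that keeps beam search from favouring short outputs. The sketch below shows that idea on hand-made scores. The formula and the helper names `length_penalty` and `rank_hypotheses` are assumptions for illustration; this card does not confirm how the framework applies the value.

```python
def length_penalty(length, alpha):
    """GNMT-style length penalty: ((5 + length) / 6) ** alpha (assumed form)."""
    return ((5.0 + length) / 6.0) ** alpha


def rank_hypotheses(hyps, alpha=1.0):
    """Sort (tokens, log_prob) pairs by length-normalized score, best first."""
    return sorted(
        hyps,
        key=lambda h: h[1] / length_penalty(len(h[0]), alpha),
        reverse=True,
    )


# Hypothetical candidates with made-up log-probabilities.
candidates = [
    (["Hello"], -1.0),
    (["Hello", "and", "welcome", "to", "Rwanda"], -1.5),
]

# With alpha = 0 the penalty is 1, so the raw log-probability decides.
print(rank_hypotheses(candidates, alpha=0.0)[0][0])
# With alpha = 1 the longer hypothesis is normalized by a larger penalty.
print(rank_hypotheses(candidates, alpha=1.0)[0][0])
```

With `alpha=0.0` the shorter candidate ranks first; with `alpha=1.0` the longer one overtakes it, because its log-probability is divided by a larger penalty.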

	## Tools
	* JoeyNMT 2.0.0
	* datasets
	* pandas
	* numpy
	* transformers
	* sentencepiece
	* PyTorch (with CUDA)
	* sacrebleu
	* protobuf>=3.20.1

	## How to train

	See the [JoeyNMT repository](https://github.com/joeynmt/joeynmt) for training instructions.

	## Translation
	To install JoeyNMT, run:
	>$ git clone https://github.com/joeynmt/joeynmt.git
	>$ cd joeynmt
	>$ pip install -e .

	Interactive translation (stdin):
	>$ python -m joeynmt translate configs/args.yaml

	File translation:
	>$ python -m joeynmt translate configs/args.yaml < src_lang.txt > hypothesis_trg_lang.txt

	## Accuracy measurement
	Sacrebleu installation:
	> $ pip install sacrebleu

	Measurement (BLEU, chrF):
	> $ sacrebleu reference.tsv -i hypothesis.tsv -m bleu chrf


	## To-do

	>* Test the model on other datasets, including JW300.
	>* Use the Digital Umuganda dataset with some of the available state-of-the-art (SOTA) models.
	>* Expand the dataset.

	## Result
	The following results were obtained using sacrebleu.

	Kinyarwanda-to-English:
	>* BLEU: 79.87
	>* chrF: 84.40