# Kinyarwanda-to-English Machine Translation

This is a Kinyarwanda-to-English machine translation model built and trained with the JoeyNMT framework. It uses a Transformer encoder-decoder architecture and was trained on an English-Kinyarwanda bitext of 47,211 sentence pairs prepared by Digital Umuganda.

## Model architecture

**Encoder & Decoder**

* Type: Transformer
* Number of layers: 6
* Number of attention heads: 8
* Embedding dimension: 256
* Feed-forward size: 1024
* Dropout: 0.1
* Layer norm: post
* Initializer: xavier
* Total parameters: 12,563,968

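The hyperparameters above correspond to the `model` section of a JoeyNMT YAML config. The sketch below shows roughly how they would be laid out; it is illustrative only (the actual config file is not part of this card), key names follow the JoeyNMT 2.x format, and `hidden_size` is assumed to equal the embedding dimension since the card does not state it separately.

```yaml
# Illustrative JoeyNMT 2.x "model" section -- not the exact config used for this model
model:
    initializer: "xavier"
    encoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256    # assumed equal to embedding_dim; not stated in the card
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
    decoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256    # assumed equal to embedding_dim; not stated in the card
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
```
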
## Pre-processing

* Tokenizer type: subword-nmt
* Number of BPE merges: 4,000
* BPE encoding learned on the bitext, with a separate vocabulary for each language
* Pretokenizer: none
* No lowercasing applied

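In JoeyNMT 2.x these choices belong in the `data` section of the config. A minimal sketch is shown below; the corpus paths, vocabulary files, and language codes (`rw`/`en`) are placeholders, not values taken from the actual training setup.

```yaml
# Illustrative "data" section -- paths, vocabulary files and language codes are placeholders
data:
    train: "data/train"
    dev: "data/dev"
    test: "data/test"
    src:
        lang: "rw"
        level: "bpe"
        lowercase: False
        voc_file: "data/vocab.rw"        # separate vocabulary per language
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "data/bpe.codes.rw"
            pretokenizer: "none"
    trg:
        lang: "en"
        level: "bpe"
        lowercase: False
        voc_file: "data/vocab.en"        # separate vocabulary per language
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "data/bpe.codes.en"
            pretokenizer: "none"
```
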
## Training

* Optimizer: Adam
* Loss: cross-entropy
* Epochs: 30
* Batch size: 256
* Number of GPUs: 1

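These settings map onto the `training` section of the config. A minimal sketch, assuming JoeyNMT 2.x key names; values the card does not state (batch type, learning rate, output directory) are either omitted or marked as assumptions.

```yaml
# Illustrative "training" section -- values not stated in the card are placeholders
training:
    optimizer: "adam"
    loss: "crossentropy"
    epochs: 30
    batch_size: 256
    batch_type: "sentence"        # assumption; the card only gives the batch size
    use_cuda: True                # trained on a single GPU
    model_dir: "models/kin_en"    # placeholder output directory
```
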
## Evaluation

* Evaluation metrics: BLEU, chrF
* Tokenization: none
* Beam width: 15
* Beam alpha: 1.0

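Decoding and evaluation settings correspond to the `testing` section of the config. Another hedged sketch using JoeyNMT 2.x key names (JoeyNMT calls the beam width `beam_size`); `tokenize: "none"` mirrors the "Tokenization: none" entry above.

```yaml
# Illustrative "testing" section matching the evaluation settings above
testing:
    beam_size: 15
    beam_alpha: 1.0
    eval_metrics: ["bleu", "chrf"]
    sacrebleu_cfg:
        tokenize: "none"
```
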
## Tools

* JoeyNMT 2.0.0
* datasets
* pandas
* numpy
* transformers
* sentencepiece
* PyTorch (with CUDA)
* sacrebleu
* protobuf>=3.20.1

## How to train

[Use the following link for more information](https://github.com/joeynmt/joeynmt)

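For orientation, a JoeyNMT run is driven by a single YAML file (the `configs/args.yaml` referenced in the Translation section below) that combines the sections sketched above, and training is launched with `python -m joeynmt train configs/args.yaml`. The skeleton below is illustrative only, and the experiment name is hypothetical.

```yaml
# configs/args.yaml -- illustrative skeleton; the actual file is not part of this card
name: "kin_en_transformer"    # hypothetical experiment name
data: {}        # corpus and tokenizer settings, see "Pre-processing"
model: {}       # encoder/decoder hyperparameters, see "Model architecture"
training: {}    # optimizer, epochs, batch size, see "Training"
testing: {}     # beam search and metrics, see "Evaluation"
```
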
## Translation

To install JoeyNMT, run:

> $ git clone https://github.com/joeynmt/joeynmt.git
>
> $ cd joeynmt
>
> $ pip install -e .

Interactive translation (stdin):

> $ python -m joeynmt translate configs/args.yaml

File translation:

> $ python -m joeynmt translate configs/args.yaml < src_lang.txt > hypothesis_trg_lang.txt

## Accuracy measurement

Sacrebleu installation:

> $ pip install sacrebleu

Measurement (BLEU, chrF):

> $ sacrebleu reference.tsv -i hypothesis.tsv -m bleu chrf

## To-do

* Test the model on different datasets, including JW300
* Train some of the available state-of-the-art (SOTA) models on the Digital Umuganda dataset
* Expand the dataset

## Results

The following results were obtained with sacrebleu.

Kinyarwanda-to-English:

> BLEU: 79.87
>
> chrF: 84.40