sakrah committed on
Commit 51534d7 · 1 Parent(s): 911e2e4

Update README.md

Files changed (1)
1. README.md +22 -4
README.md CHANGED
@@ -1,11 +1,29 @@
  ---
- license: openrail
  ---
  TwiBERT is a language model pretrained on the Twi language, the most spoken language in Ghana, West Africa.
- The model has 61 million parameters, 6 attention heads, 768 hidden units and 3072 feed forward size.

- Limitations:
- The model was trained on a very small data (about 5MB), which is very limiting.

  ---
+ language: twi
+ license: mit
  ---
+ ## TwiBERT
+ ## Model Description
  TwiBERT is a language model pretrained on the Twi language, the most spoken language in Ghana, West Africa.
+ The model has 61 million parameters, 6 attention heads, 768 hidden units, and a feed-forward size of 3072.
+ The model was trained on the Asanti Twi Bible together with a crowdsourced dataset.
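+
+ If you want to verify these hyperparameters yourself, you can read them off the model config. This is a
+ minimal sketch, assuming the `sakrah/twibert` checkpoint (used in the snippet further below) exposes a
+ standard BERT-style config:
+
+ ```python
+ >>> from transformers import AutoConfig
+ >>> config = AutoConfig.from_pretrained("sakrah/twibert")
+ >>> config.num_attention_heads  # expected: 6
+ >>> config.hidden_size          # expected: 768
+ >>> config.intermediate_size    # expected: 3072 (feed-forward size)
+ ```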

+ ## Limitations
+
+ The model was trained on a very small dataset (about 5MB), which makes it difficult for the model
+ to learn complex contextual embeddings that would enable it to generalize. In addition, the narrow scope
+ of the dataset (the Bible) might give it a strong religious bias.
+
+ ## How to use it
+
+ You can use TwiBERT by finetuning it on a downstream task.
+ The example code below illustrates how you can load the TwiBERT model and tokenizer for a downstream task:
+
+ ```python
+ >>> from transformers import AutoTokenizer, AutoModelForTokenClassification
+ >>> # load the pretrained encoder with a fresh token-classification head for finetuning
+ >>> model = AutoModelForTokenClassification.from_pretrained("sakrah/twibert")
+ >>> tokenizer = AutoTokenizer.from_pretrained("sakrah/twibert")
+ ```
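+
+ Because TwiBERT is a BERT-style pretrained model, you may also be able to query it directly for
+ masked-word predictions without any finetuning. This is a minimal sketch, assuming the checkpoint
+ ships its masked-language-modeling head and a BERT-style mask token:
+
+ ```python
+ >>> from transformers import pipeline
+ >>> fill = pipeline("fill-mask", model="sakrah/twibert")
+ >>> # replace the ellipses with a real Twi sentence around the mask token
+ >>> fill(f"... {fill.tokenizer.mask_token} ...")
+ ```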