---
language: twi
license: mit
---
# TwiBERT
## Model Description
TwiBERT is a language model pretrained on Twi, the most widely spoken language in Ghana, West Africa. The model has 61 million parameters, 6 attention heads, a hidden size of 768, and a feed-forward size of 3072. It was trained on the Asanti Twi Bible together with a crowdsourced dataset.
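
You can confirm these hyperparameters from the published configuration. A minimal sketch, assuming the checkpoint is hosted under the `sakrah/twibert` id used in the usage example below and exposes a standard BERT-style config:

```python
>>> from transformers import AutoConfig
>>> config = AutoConfig.from_pretrained("sakrah/twibert")
>>> config.num_attention_heads  # expected: 6
>>> config.hidden_size          # expected: 768
>>> config.intermediate_size    # expected: 3072
```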



## Limitations

The model was trained on a very small dataset (about 5 MB), which limits its ability to learn the complex contextual embeddings needed to generalize. In addition, because much of the training data comes from the Bible, the model may carry a strong religious bias.


## How to use it

You can adapt TwiBERT to a downstream task by fine-tuning it. The example below shows how to load the pretrained model and tokenizer for token classification:

```python
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification
>>> # The token-classification head is newly initialized and must be fine-tuned
>>> model = AutoModelForTokenClassification.from_pretrained("sakrah/twibert")
>>> tokenizer = AutoTokenizer.from_pretrained("sakrah/twibert")
```
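
Because the pretraining objective was masked language modeling, you can also query the language-model head directly. A hedged sketch, assuming the checkpoint ships BERT-style masked-LM weights (not confirmed by this card) and using an illustrative Twi prompt:

```python
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("sakrah/twibert")
>>> model = AutoModelForMaskedLM.from_pretrained("sakrah/twibert")
>>> fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
>>> # "Onyame yɛ ..." ("God is ...") is an illustrative Twi phrase
>>> fill_mask(f"Onyame yɛ {tokenizer.mask_token}")
```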