Update README.md
---
language: twi
license: mit
---

## TwiBERT

## Model Description
TwiBERT is a language model pretrained on the Twi language, the most widely spoken language in Ghana, West Africa.
The model has 61 million parameters, 6 attention heads, 768 hidden units, and a feed-forward size of 3072. It
was trained on the Asanti Twi Bible together with a crowdsourced dataset.
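
If you want to confirm these dimensions programmatically, one option is to inspect the published configuration with the `transformers` library. This is a small sketch, assuming the checkpoint is available as `sakrah/twibert` (the repository id used in the usage example below); the attributes shown are the standard BERT-style config fields.

```python
>>> from transformers import AutoConfig

>>> config = AutoConfig.from_pretrained("sakrah/twibert")
>>> config.num_attention_heads   # expected: 6
>>> config.hidden_size           # expected: 768
>>> config.intermediate_size     # expected: 3072 (feed-forward size)
```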

## Limitations

The model was trained on a very small dataset (about 5 MB), which makes it difficult for the model
to learn the rich contextual embeddings it needs to generalize. In addition, the narrow scope of the dataset (the Bible)
may give the model a strong religious bias.

## How to use it

You can use TwiBERT by fine-tuning it on a downstream task.
The example below shows how to load the model and tokenizer for a downstream task such as token classification:

```python
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification
>>> model = AutoModelForTokenClassification.from_pretrained("sakrah/twibert")
>>> tokenizer = AutoTokenizer.from_pretrained("sakrah/twibert")
```
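
Once the model and tokenizer are loaded, a quick sanity check is to run a forward pass on a short Twi sentence. The sketch below is illustrative only: the example sentence and `num_labels=5` are placeholder assumptions, not values from the original model card, and the token-classification head stays randomly initialized until you fine-tune it on labelled data.

```python
>>> import torch
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification

>>> tokenizer = AutoTokenizer.from_pretrained("sakrah/twibert")
>>> # num_labels is a placeholder; pick the label count of your own task.
>>> model = AutoModelForTokenClassification.from_pretrained("sakrah/twibert", num_labels=5)

>>> # "Me din de Kofi" is roughly "My name is Kofi" in Twi (example sentence, not from the card).
>>> inputs = tokenizer("Me din de Kofi", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> logits.shape  # (batch_size, sequence_length, num_labels)
```

For real downstream use you would fine-tune the model on a labelled Twi dataset (for example with the `transformers` `Trainer`) before relying on its predictions.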