Update README.md
---
language: twi
license: mit
---

## TwiBERT

## Model Description
TwiBERT is a language model pretrained on the Twi language, the most widely spoken language in Ghana, West Africa.
The model has 61 million parameters, 6 attention heads, 768 hidden units, and a feed-forward size of 3072. It
was trained on the Asanti Twi Bible together with a crowdsourced dataset.
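
If you want to confirm these dimensions programmatically, one option is to inspect the published configuration with the `transformers` library. This is a small sketch, assuming the checkpoint is available as `sakrah/twibert` (the repository id used in the usage example below); the attributes shown are the standard BERT-style config fields.

```python
>>> from transformers import AutoConfig

>>> config = AutoConfig.from_pretrained("sakrah/twibert")
>>> config.num_attention_heads   # expected: 6
>>> config.hidden_size           # expected: 768
>>> config.intermediate_size     # expected: 3072 (feed-forward size)
```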

## Limitations

The model was trained on a very small dataset (about 5 MB), which makes it difficult for the model
to learn the rich contextual embeddings it needs to generalize. In addition, the narrow scope of the dataset (the Bible)
may give the model a strong religious bias.

## How to use it

You can use TwiBERT by fine-tuning it on a downstream task.
The example below shows how to load the model and tokenizer for a downstream task such as token classification:

```python
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification
>>> model = AutoModelForTokenClassification.from_pretrained("sakrah/twibert")
>>> tokenizer = AutoTokenizer.from_pretrained("sakrah/twibert")
```
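
Once the model and tokenizer are loaded, a quick sanity check is to run a forward pass on a short Twi sentence. The sketch below is illustrative only: the example sentence and `num_labels=5` are placeholder assumptions, not values from the original model card, and the token-classification head stays randomly initialized until you fine-tune it on labelled data.

```python
>>> import torch
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification

>>> tokenizer = AutoTokenizer.from_pretrained("sakrah/twibert")
>>> # num_labels is a placeholder; pick the label count of your own task.
>>> model = AutoModelForTokenClassification.from_pretrained("sakrah/twibert", num_labels=5)

>>> # "Me din de Kofi" is roughly "My name is Kofi" in Twi (example sentence, not from the card).
>>> inputs = tokenizer("Me din de Kofi", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> logits.shape  # (batch_size, sequence_length, num_labels)
```

For real downstream use you would fine-tune the model on a labelled Twi dataset (for example with the `transformers` `Trainer`) before relying on its predictions.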