---
language: twi
license: mit
---
# TwiBERT

## Model Description
TwiBERT is a language model pretrained on Twi, the most widely spoken language in Ghana, West Africa. The model has 61 million parameters, 6 attention heads, a hidden size of 768, and a feed-forward size of 3072. It was trained on the Asanti Twi Bible together with a crowdsourced dataset.
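If you want to verify these architecture details programmatically, you can inspect the model's configuration. This is a minimal sketch assuming the checkpoint uses the standard Hugging Face BERT configuration field names:

```python
from transformers import AutoConfig

# Load the configuration from the Hub and check the fields that
# correspond to the numbers quoted above.
config = AutoConfig.from_pretrained("sakrah/twibert")
print(config.num_attention_heads)  # expected: 6
print(config.hidden_size)          # expected: 768
print(config.intermediate_size)    # expected: 3072
```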
## Limitations
The model was trained on a very small dataset (about 5 MB), which makes it difficult for it to learn the rich contextual embeddings needed to generalize. In addition, because much of the training data comes from the Bible, the model may exhibit a strong religious bias.
## How to use it
You can adapt TwiBERT by fine-tuning it on a downstream task. The example code below shows how to load the model and tokenizer:
```python
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification
>>> model = AutoModelForTokenClassification.from_pretrained("sakrah/twibert")
>>> tokenizer = AutoTokenizer.from_pretrained("sakrah/twibert")
```
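As a concrete illustration, the sketch below loads the encoder with a freshly initialized token-classification head and runs a single forward pass. The label set and the sample sentence are hypothetical placeholders; the head's predictions are meaningless until the model has been fine-tuned on labeled Twi data:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical label set for an NER-style downstream task.
labels = ["O", "B-PER", "I-PER"]

# The pretrained checkpoint contains only the encoder weights, so this
# classification head is randomly initialized and must be fine-tuned.
model = AutoModelForTokenClassification.from_pretrained(
    "sakrah/twibert", num_labels=len(labels)
)
tokenizer = AutoTokenizer.from_pretrained("sakrah/twibert")

# Tokenize a sample Twi sentence and run a forward pass.
inputs = tokenizer("Me din de Kofi", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

print(logits.argmax(dim=-1))
```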