---
license: apache-2.0
language:
- en
- yo
- ha
- ig
- pcm
---

# naija-bert-base

NaijaBERT was created by pre-training a [BERT model with token dropping](https://aclanthology.org/2022.acl-long.262/) on texts in five Nigerian languages (English, Hausa, Igbo, Naija, and Yoruba) for about 100K steps.

It uses the BERT-base architecture and was trained with the [TensorFlow Model Garden](https://github.com/tensorflow/models/tree/master/official/projects).
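
Token dropping saves pre-training compute by letting some middle layers process only the tokens judged most important, then restoring the full sequence before the last layer. The sketch below is a simplified, framework-free illustration of that control flow only, under stated assumptions: the per-position `importance` scores are simply given as inputs (how the real training recipe scores tokens is not modeled), layers are plain Python callables, and `forward_with_token_dropping` is a hypothetical helper, not the TensorFlow Model Garden implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical types for this sketch: a hidden state is a list of floats,
# and a "layer" maps a list of hidden states to a new list of the same length.
Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def forward_with_token_dropping(
    hidden: List[Vector],
    importance: Sequence[float],
    keep: int,
    full_layers: Sequence[Layer],
    dropped_layers: Sequence[Layer],
    final_layer: Layer,
) -> List[Vector]:
    """Simplified token-dropping forward pass (illustrative, not the Model Garden code)."""
    # Bottom layers see every position.
    for layer in full_layers:
        hidden = layer(hidden)

    # Rank positions by the given importance score (ties broken by position)
    # and keep the top `keep`, restored to their original order.
    ranked = sorted(range(len(hidden)), key=lambda i: (-importance[i], i))
    kept = sorted(ranked[:keep])

    # Middle layers only process the kept positions; this is where compute is saved.
    subset = [hidden[i] for i in kept]
    for layer in dropped_layers:
        subset = layer(subset)

    # Scatter the updated states back; dropped positions keep their bottom-layer states.
    merged = list(hidden)
    for pos, vec in zip(kept, subset):
        merged[pos] = vec

    # The final layer sees the full sequence again.
    return final_layer(merged)
```

With toy element-wise layers, only the kept positions receive the middle-layer update. In a real transformer the middle layers are attention blocks, so while tokens are dropped the kept tokens also cannot attend to them.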

### Pre-training corpus

The model was pre-trained on a mix of WURA, Wikipedia, and MT560 data.

### How to use

You can use this model with the Transformers `pipeline` for masked token prediction:

```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='Davlan/naija-bert-base')
>>> unmasker("Ọjọ kẹsan-an, [MASK] Kẹjọ ni wọn ri oku Baba")
```

```
[{'score': 0.9981744289398193, 'token': 3785, 'token_str': 'osu', 'sequence': 'ojo kesan - an, osu kejo ni won ri oku baba'}, {'score': 0.0015279919607564807, 'token': 3355, 'token_str': 'ojo', 'sequence': 'ojo kesan - an, ojo kejo ni won ri oku baba'}, {'score': 0.0001734074903652072, 'token': 11780, 'token_str': 'osun', 'sequence': 'ojo kesan - an, osun kejo ni won ri oku baba'}, {'score': 9.066923666978255e-05, 'token': 21579, 'token_str': 'oṣu', 'sequence': 'ojo kesan - an, oṣu kejo ni won ri oku baba'}, {'score': 1.816015355871059e-05, 'token': 3387, 'token_str': 'odun', 'sequence': 'ojo kesan - an, odun kejo ni won ri oku baba'}]
```
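
The returned list can contain candidates that differ only in diacritics, such as `osu` and `oṣu` in the output above. If you want to treat such candidates as one answer, a minimal post-processing sketch could sum their scores. It assumes only the `score` and `token_str` keys shown in the output; `strip_diacritics` and `merge_by_base_form` are hypothetical helper names, not part of Transformers.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: lowercase and drop combining marks."""
    # NFD splits letters such as "ṣ" or "ọ" into a base letter plus combining
    # marks; dropping the marks (Unicode category "Mn") leaves the bare letters.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn").lower()


def merge_by_base_form(predictions: List[dict]) -> Dict[str, float]:
    """Hypothetical helper: sum fill-mask scores for candidates sharing a base form."""
    totals: Dict[str, float] = defaultdict(float)
    for pred in predictions:
        totals[strip_diacritics(pred["token_str"])] += pred["score"]
    # Highest combined score first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

You would call it as `merge_by_base_form(unmasker(...))`. Merging discards tone marks and under-dots, which distinguish words in Yoruba, so it only makes sense when those distinctions do not matter for your task.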

### Acknowledgment

We thank [@stefan-it](https://github.com/stefan-it) for providing the pre-processing and pre-training scripts. We would also like to thank Google Cloud for providing us access to TPU v3-8 through free cloud credits. The model was trained using Flax and then converted to PyTorch.

### BibTeX entry and citation info

```
@misc{david_adelani_2025,
  author = { David Adelani },
  title = { naija-bert-base (Revision 22c83d8) },
  year = 2025,
  url = { https://huggingface.co/Davlan/naija-bert-base },
  doi = { 10.57967/hf/5864 },
  publisher = { Hugging Face }
}
```