BERT medium (cased) model trained on a 125M-token subset of cc100-Swahili for our paper "Scaling Laws for BERT in Low-Resource Settings" (Findings of ACL 2023).

The model has 51M parameters (8 layers) and a vocabulary size of 50K. It was trained for 500K steps with a sequence length of 512 tokens and a batch size of 256.
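As a back-of-the-envelope check on these figures, the sketch below computes how many tokens were processed during pre-training (steps × batch size × sequence length). It then derives how many passes over the 125M-token corpus that implies. It assumes every sequence is filled to the full 512 tokens. It also estimates the parameter count of a standard BERT encoder under two further assumptions that the card does not state: the common BERT-medium shape (hidden size 512, feed-forward size 2048) and a vocabulary of exactly 50,000 entries. The helper `bert_param_count` is hypothetical and written only for this illustration.

```python
# Back-of-the-envelope figures for the pre-training setup described above.
# Assumption: every training sequence is packed to the full 512 tokens.
STEPS = 500_000
BATCH_SIZE = 256
SEQ_LEN = 512
CORPUS_TOKENS = 125_000_000

tokens_seen = STEPS * BATCH_SIZE * SEQ_LEN
epochs = tokens_seen / CORPUS_TOKENS


def bert_param_count(vocab: int, hidden: int, ffn: int, layers: int,
                     max_pos: int = 512, type_vocab: int = 2,
                     pooler: bool = True) -> int:
    """Hypothetical helper: count weights + biases of a standard BERT encoder."""
    # Word, position and token-type embeddings plus the embedding LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Self-attention: Q, K, V and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward block: up-projection and down-projection with biases.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per layer (after attention and after feed-forward).
    layer_norms = 2 * 2 * hidden
    per_layer = attention + feed_forward + layer_norms
    total = embeddings + layers * per_layer
    if pooler:
        # Optional pooler: one dense layer applied to the [CLS] token.
        total += hidden * hidden + hidden
    return total


# Assumed shape: hidden size 512 and FFN size 2048 are NOT stated in the card.
params = bert_param_count(vocab=50_000, hidden=512, ffn=2048, layers=8)

print(f"tokens seen: {tokens_seen:,}")            # 65,536,000,000
print(f"passes over corpus: {epochs:.1f}")        # 524.3
print(f"estimated parameters: {params / 1e6:.1f}M")  # 51.3M
```

Under these assumptions the model processes roughly 65.5B tokens, or about 524 passes over the 125M-token subset. The estimate landing close to the reported 51M is consistent with the assumed hidden size but does not confirm it.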

Results

Task        bert-base-sw  bert-medium-sw  Flair  mBERT  SwahBERT
NERC        92.09         91.63           92.04  91.17  88.60
Topic       93.07         92.88           91.83  91.52  90.90
Sentiment   79.04         77.07           73.60  69.17  71.12
QNLI        63.34         63.87           52.82  63.48  64.72

Authors

Gorka Urbizu [1], Iñaki San Vicente [1], Xabier Saralegi [1], Rodrigo Agerri [2] and Aitor Soroa [2]

Affiliation of the authors:

[1] Orai NLP Technologies

[2] HiTZ Center - Ixa, University of the Basque Country UPV/EHU

Licensing

The model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Acknowledgements

If you use this model, please cite the following paper:

  • G. Urbizu, I. San Vicente, X. Saralegi, R. Agerri, A. Soroa. Scaling Laws for BERT in Low-Resource Settings. Findings of the Association for Computational Linguistics: ACL 2023. July 2023. Toronto, Canada.

Contact information

Gorka Urbizu, Iñaki San Vicente: {g.urbizu,i.sanvicente}@orai.eus
