---
language:
- sw
license: apache-2.0
datasets:
- masakhaner
pipeline_tag: token-classification
examples: null
widget:
- text: Joe Bidden ni rais wa marekani.
  example_title: Sentence 1
- text: Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi.
  example_title: Sentence 2
- text: Mtoto anaweza kupoteza muda kabisa.
  example_title: Sentence 3
metrics:
- accuracy
---

# Swahili Named Entity Recognition

- **TUS-NER-sw** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** and achieves **state-of-the-art performance 😀**
- Fine-tuned from model: [eolang/SW-v1](https://huggingface.co/eolang/SW-v1)

## Intended uses & limitations

#### How to use

You can use this model with Transformers *pipeline* for NER.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("eolang/SW-NER-v1")
model = AutoModelForTokenClassification.from_pretrained("eolang/SW-NER-v1")

# Build an NER pipeline and run it on a Swahili sentence.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi"

# Each result is a dict with the predicted label, score, and character offsets.
ner_results = nlp(example)
print(ner_results)
```
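
The pipeline returns one prediction per token (or word piece) as a list of dictionaries. The following is a minimal sketch of merging such predictions into whole entities. It assumes the model emits BIO-style labels such as `B-PER` and `I-PER`; the actual label names are an assumption and can be checked in `model.config.id2label`. `group_entities` is a hypothetical helper, and the `sample` list is hand-written for illustration, not real model output.

```python
def group_entities(tokens, text):
    """Merge token-level BIO predictions into entity spans."""
    entities = []
    current = None
    for tok in tokens:
        label = tok["entity"]
        if label == "O":
            current = None
            continue
        prefix, _, kind = label.partition("-")
        # Extend the open span only for an I- tag of the same entity type;
        # anything else (B- tag, type change, stray I-) starts a new span.
        if prefix == "I" and current is not None and current["type"] == kind:
            current["end"] = tok["end"]
        else:
            current = {"type": kind, "start": tok["start"], "end": tok["end"]}
            entities.append(current)
    # Recover the surface text of each entity from its character offsets.
    for ent in entities:
        ent["text"] = text[ent["start"]:ent["end"]]
    return entities


# Hand-written sample in the shape of pipeline output (illustrative only).
text = "Joe Biden ni rais wa Marekani."
sample = [
    {"entity": "B-PER", "word": "Joe", "start": 0, "end": 3},
    {"entity": "I-PER", "word": "Biden", "start": 4, "end": 9},
    {"entity": "B-LOC", "word": "Marekani", "start": 21, "end": 29},
]

for ent in group_entities(sample, text):
    print(ent["type"], ent["text"])
```

For most uses the built-in option is simpler: passing `aggregation_strategy="simple"` to `pipeline("ner", ...)` asks Transformers to group tokens into entities itself.
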

## Training data

This model was fine-tuned on the Swahili portion of the [MasakhaNER dataset](https://github.com/masakhane-io/masakhane-ner/tree/main/MasakhaNER2.0/data/swa) from the [MasakhaNER project](https://github.com/masakhane-io/masakhane-ner).
MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 different African languages.
The languages in this dataset are: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.
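
NER corpora of this kind are commonly distributed in a CoNLL-style layout: one token and its tag per line, separated by whitespace, with a blank line between sentences. The sketch below reads that layout, on the assumption that it applies to the Swahili files. `parse_conll` is a hypothetical helper, and the two-sentence sample is hand-written rather than taken from the dataset.

```python
def parse_conll(raw):
    """Split CoNLL-style text into (tokens, tags) pairs, one per sentence."""
    sentences = []
    tokens, tags = [], []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        # The tag is the last whitespace-separated field on the line.
        token, tag = line.rsplit(maxsplit=1)
        tokens.append(token)
        tags.append(tag)
    # Flush the final sentence when the text does not end with a blank line.
    if tokens:
        sentences.append((tokens, tags))
    return sentences


# Hand-written sample (illustrative only, not copied from the dataset).
conll_sample = """\
Joe B-PER
Biden I-PER
ni O
rais O
wa O
Marekani B-LOC
. O

Mtoto O
anaweza O
kupoteza O
muda O
kabisa O
. O
"""

for sent_tokens, sent_tags in parse_conll(conll_sample):
    print(list(zip(sent_tokens, sent_tags)))
```
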

## Training procedure

This model was trained on a single NVIDIA RTX 3090 GPU using the hyperparameters recommended in the [original BERT paper](https://arxiv.org/pdf/1810.04805).
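
For fine-tuning, the BERT paper recommends searching a small grid: batch size 16 or 32, Adam learning rate 5e-5, 3e-5, or 2e-5, and 2 to 4 epochs. This card does not state which combination was used for this model, so the sketch below only enumerates the candidate grid. The variable and key names are illustrative.

```python
from itertools import product

# Fine-tuning search space suggested in the BERT paper.
BATCH_SIZES = (16, 32)
LEARNING_RATES = (5e-5, 3e-5, 2e-5)
EPOCHS = (2, 3, 4)

# One candidate configuration per combination of the three settings.
grid = [
    {"batch_size": b, "learning_rate": lr, "num_epochs": e}
    for b, lr, e in product(BATCH_SIZES, LEARNING_RATES, EPOCHS)
]

print(len(grid))  # 2 * 3 * 3 = 18 candidate configurations
for cfg in grid[:3]:
    print(cfg)
```
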