---
language:
- sw
license: apache-2.0
datasets:
- wikiann
pipeline_tag: token-classification
widget:
- text: >-
    Serikali imetangaza hali ya janga katika wilaya 10 za kusini ambazo
    zimeathiriwa zaidi na dhoruba.
  example_title: Sentence_1
- text: >-
    Shirima anasema, ‘Bob Junior’ pia alikuwa na sifa za kipekee kama vile
    kuwa na manyoya mengi zaidi kuliko simba wengine lakini mwenye kuvutia kwa
    muonekano na hutulia pale anapopigwa picha.
  example_title: Sentence_2
- text: Tahadhari yatolewa kuhusu uwezekano wa mlipuko wa Volkano DR Congo.
  example_title: Sentence_3
metrics:
- accuracy
- f1
- precision
- recall
library_name: transformers
---

## Intended uses & limitations

### How to use

You can use this model with the Transformers pipeline for NER:
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("eolang/Swahili-NER-BertBase-Cased")
model = AutoModelForTokenClassification.from_pretrained("eolang/Swahili-NER-BertBase-Cased")

# Build a token-classification (NER) pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "Kwa nini Kenya inageukia mazao ya GMO kukabiliana na ukame"
ner_results = nlp(example)
print(ner_results)
```
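
By default the pipeline returns one prediction per sub-word token. If you want whole entity spans instead, you can pass an `aggregation_strategy` to the pipeline; a minimal sketch:

```python
from transformers import pipeline

# aggregation_strategy="simple" merges sub-word pieces into whole entity spans
nlp = pipeline(
    "ner",
    model="eolang/Swahili-NER-BertBase-Cased",
    aggregation_strategy="simple",
)
print(nlp("Kwa nini Kenya inageukia mazao ya GMO kukabiliana na ukame"))
```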

## Training data

This model was fine-tuned on the Swahili portion of the [WikiAnn](https://huggingface.co/datasets/wikiann) dataset, a dataset for cross-lingual name tagging and linking based on Wikipedia articles in 295 languages.
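
For reference, the Swahili split can be inspected with the `datasets` library; a minimal sketch, assuming the `wikiann` dataset id and its `sw` configuration:

```python
from datasets import load_dataset

# Load the Swahili configuration of WikiAnn; ner_tags follow the IOB2 scheme
# (O, B-PER/I-PER, B-ORG/I-ORG, B-LOC/I-LOC)
dataset = load_dataset("wikiann", "sw")
print(dataset["train"][0])  # {'tokens': [...], 'ner_tags': [...], ...}
```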

## Training procedure

This model was trained on a single NVIDIA RTX A5000 GPU with the recommended hyperparameters from the original BERT paper, which fine-tuned and evaluated BERT on the CoNLL-2003 NER task.
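
The exact hyperparameter values are not recorded here. For fine-tuning, the BERT paper recommends batch sizes of 16 or 32, learning rates of 5e-5, 3e-5, or 2e-5, and 2 to 4 epochs; a hypothetical `TrainingArguments` setup within those ranges might look like:

```python
from transformers import TrainingArguments

# Illustrative values only, chosen from the ranges the BERT paper recommends
# for fine-tuning; the values actually used for this model may differ.
training_args = TrainingArguments(
    output_dir="swahili-ner-bert",  # hypothetical output directory
    per_device_train_batch_size=32,
    learning_rate=5e-5,
    num_train_epochs=3,
)
```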