---
language:
- sw
license: apache-2.0
datasets:
- wikiann
pipeline_tag: token-classification
examples: null
widget:
- text: Serikali imetangaza hali ya janga katika wilaya 10 za kusini ambazo zimeathiriwa zaidi na dhoruba.
  example_title: Sentence_1
- text: Faida tano za kula samaki wenye mafuta.
  example_title: Sentence_2
- text: Tahadhari yatolewa kuhusu uwezekano wa mlipuko wa Volkano DR Congo.
  example_title: Sentence_3
metrics:
- accuracy
- f1
- precision
- recall
library_name: transformers
---

## Intended uses & limitations

#### How to use

You can use this model with the Transformers *pipeline* for named entity recognition (NER).

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("eolang/Swahili-NER-BertBase-Cased")
model = AutoModelForTokenClassification.from_pretrained("eolang/Swahili-NER-BertBase-Cased")

# Build an NER pipeline around the model and tokenizer.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Kwa nini Kenya inageukia mazao ya GMO kukabiliana na ukame"

# Each result is a dict describing one tagged token (label, score, offsets).
ner_results = nlp(example)
print(ner_results)
```
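
Without further options, the pipeline returns one dictionary per tagged token rather than one per entity. The sketch below shows one way to merge such token-level results into entity spans. It assumes the model emits BIO-style labels such as `B-LOC` and `I-LOC` (the scheme WikiAnn uses). The helper `group_entities`, the sample sentence, and the sample predictions are hypothetical and written by hand, not actual model output.

```python
from typing import Dict, List


def group_entities(tokens: List[Dict], text: str) -> List[Dict]:
    """Merge consecutive B-/I- tagged tokens into whole entity spans."""
    entities: List[Dict] = []
    current = None
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        if prefix == "I" and current is not None and label == current["label"]:
            # An I- tag with the same label extends the open span.
            current["end"] = tok["end"]
            continue
        # Any other tag closes the open span, if there is one.
        if current is not None:
            entities.append(current)
            current = None
        # B- tags (and stray I- tags) start a new span; "O" starts nothing.
        if prefix in ("B", "I"):
            current = {"label": label, "start": tok["start"], "end": tok["end"]}
    if current is not None:
        entities.append(current)
    # Recover each entity's surface form from its character offsets.
    for ent in entities:
        ent["text"] = text[ent["start"]:ent["end"]]
    return entities


# Hand-written predictions in the shape the pipeline returns (hypothetical).
sample_text = "Dar es Salaam ni jiji"
sample_tokens = [
    {"entity": "B-LOC", "word": "Dar", "start": 0, "end": 3},
    {"entity": "I-LOC", "word": "es", "start": 4, "end": 6},
    {"entity": "I-LOC", "word": "Salaam", "start": 7, "end": 13},
]

grouped = group_entities(sample_tokens, sample_text)
print(grouped)
```

In practice, passing `aggregation_strategy="simple"` to `pipeline(...)` asks Transformers to do this grouping itself, including the merging of subword pieces, which this sketch does not attempt.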

## Training data

This model was fine-tuned on the Swahili subset of the WikiAnn dataset, a resource for cross-lingual name tagging and linking built from Wikipedia articles in 295 languages.
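
To make the annotation scheme concrete, the sketch below decodes integer tag ids into label names. The label order follows the Hugging Face `wikiann` dataset card; whether this model's own `id2label` mapping uses the same order is an assumption. The record shown is a hypothetical hand-written example, not a row copied from the dataset.

```python
# Label order as listed on the Hugging Face `wikiann` dataset card.
WIKIANN_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def decode_tags(tag_ids):
    """Map integer tag ids to their BIO label names."""
    return [WIKIANN_LABELS[i] for i in tag_ids]


# Hypothetical record in the dataset's token/tag layout.
# A real split could be loaded with: datasets.load_dataset("wikiann", "sw")
record = {
    "tokens": ["Nairobi", "ni", "mji", "mkuu", "wa", "Kenya"],
    "ner_tags": [5, 0, 0, 0, 0, 5],
}

for token, label in zip(record["tokens"], decode_tags(record["ner_tags"])):
    print(f"{token}\t{label}")
```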

## Training procedure

This model was trained on a single NVIDIA A5000 GPU using the hyperparameters recommended in the [original BERT paper](https://arxiv.org/pdf/1810.04805), which trained and evaluated the model on the CoNLL-2003 NER task.
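
For reference, the BERT paper recommends a small search space for fine-tuning: batch sizes of 16 or 32, Adam learning rates of 5e-5, 3e-5, or 2e-5, and 2 to 4 epochs. The sketch below simply enumerates that grid. This card does not state which combination was used for this model, so the function name and dictionary keys are illustrative assumptions rather than the actual training configuration.

```python
from itertools import product

# Fine-tuning ranges recommended in the BERT paper.
BATCH_SIZES = [16, 32]
LEARNING_RATES = [5e-5, 3e-5, 2e-5]
NUM_EPOCHS = [2, 3, 4]


def candidate_configs():
    """Enumerate every combination in the recommended search space."""
    return [
        {"batch_size": b, "learning_rate": lr, "num_train_epochs": e}
        for b, lr, e in product(BATCH_SIZES, LEARNING_RATES, NUM_EPOCHS)
    ]


configs = candidate_configs()
print(f"{len(configs)} candidate configurations")
for cfg in configs[:3]:
    print(cfg)
```

Each dictionary could be passed to a training loop of your choice, with the best combination selected on the validation split.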