---
language:
- sw
license: apache-2.0
datasets:
- masakhaner
pipeline_tag: token-classification
examples: null
widget:
- text: Joe Biden ni rais wa Marekani.
  example_title: Sentence 1
- text: Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi.
  example_title: Sentence 2
- text: Mtoto anaweza kupoteza muda kabisa.
  example_title: Sentence 3
metrics:
- accuracy
---

# TUS Named Entity Recognition

- **TUS-NER-sw** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** in Swahili and achieves **state-of-the-art performance 😀**
- Fine-tuned from model: [eolang/SW-v1](https://huggingface.co/eolang/SW-v1)

## Intended uses & limitations

#### How to use

You can use this model with the Transformers *pipeline* for NER.

```python
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("eolang/SW-NER-v1")
model = AutoModelForTokenClassification.from_pretrained("eolang/SW-NER-v1")

# Build a token-classification (NER) pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi"

# Each prediction is a dict with the (sub)token, its entity tag, and a confidence score
ner_results = nlp(example)
print(ner_results)
```
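
The pipeline above emits one prediction per sub-token. A minimal sketch of merging sub-tokens into whole entity spans using the pipeline's built-in `aggregation_strategy` parameter (standard Transformers API; the exact label names depend on this model's config):

```python
from transformers import pipeline

# "simple" groups consecutive sub-tokens that share an entity tag
nlp = pipeline(
    "ner",
    model="eolang/SW-NER-v1",
    aggregation_strategy="simple",
)

for entity in nlp("Joe Biden ni rais wa Marekani."):
    # e.g. an entry like {'entity_group': 'PER', 'word': 'Joe Biden', 'score': ...}
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```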

## Training data

This model was fine-tuned on the Swahili version of the [Masakhane dataset](https://github.com/masakhane-io/masakhane-ner/tree/main/MasakhaNER2.0/data/swa) from the [MasakhaneNER project](https://github.com/masakhane-io/masakhane-ner).
MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 different African languages: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.
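
To inspect the training data yourself, the Swahili split is also available on the Hugging Face Hub as the `masakhaner` dataset. A short sketch (the `swa` config name matches the Hub's published configs; depending on your `datasets` version you may need to pass `trust_remote_code=True`):

```python
from datasets import load_dataset

# Swahili subset of MasakhaNER
masakhaner_sw = load_dataset("masakhaner", "swa")

# Each example pairs pre-split tokens with integer NER tags (class indices for B-PER, I-PER, ...)
print(masakhaner_sw["train"][0]["tokens"])
print(masakhaner_sw["train"][0]["ner_tags"])
```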

## Training procedure

This model was trained on a single NVIDIA RTX 3090 GPU with the recommended hyperparameters from the [original BERT paper](https://arxiv.org/pdf/1810.04805), which trained and evaluated BERT on the CoNLL-2003 NER task.
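
For reference, a minimal fine-tuning sketch in that spirit. The exact run configuration is not recorded in this card, so the hyperparameter values below are illustrative assumptions drawn from the BERT paper's recommended ranges (batch size 16-32, learning rate 2e-5 to 5e-5, 2-4 epochs), not the actual training config:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

# Swahili MasakhaNER split (see "Training data" above)
dataset = load_dataset("masakhaner", "swa")
labels = dataset["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("eolang/SW-v1")
model = AutoModelForTokenClassification.from_pretrained(
    "eolang/SW-v1", num_labels=len(labels)
)

def tokenize_and_align(batch):
    # Tokenize pre-split words; copy each word's tag to its first sub-token
    # and mask special tokens / continuation sub-tokens with -100
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, row = None, []
        for word_id in enc.word_ids(batch_index=i):
            row.append(-100 if word_id is None or word_id == prev else tags[word_id])
            prev = word_id
        enc["labels"].append(row)
    return enc

tokenized = dataset.map(tokenize_and_align, batched=True)

# Illustrative values within the BERT paper's recommended ranges
args = TrainingArguments(
    output_dir="tus-ner-sw",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```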