akdeniz27 committed on
Commit
d087963
·
1 Parent(s): 48f6181

Create README.md

Files changed (1)
  1. README.md +33 -0
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ language: sq
+ widget:
+ - text: "Varianti AY.4.2 është më i lehtë për t'u transmetuar, thotë Francois Balu, drejtor i Institutit të Gjenetikës në Londër."
+ ---
+ # Albanian Named Entity Recognition (NER) Model
+ This model is a fine-tuned version of "bert-base-multilingual-cased",
+ trained on the WikiANN dataset introduced
+ in the "Cross-lingual Name Tagging and Linking for 282 Languages" [paper](https://aclanthology.org/P17-1178.pdf).
+ # Fine-tuning parameters:
+ ```
+ task = "ner"
+ model_checkpoint = "bert-base-multilingual-cased"
+ batch_size = 8
+ label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
+ max_length = 512
+ learning_rate = 2e-5
+ num_train_epochs = 3
+ weight_decay = 0.01
+ ```
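The `label_list` above fixes the set of BIO tags the model predicts. As a minimal sketch, assuming the class IDs simply follow the order of that list (the README does not state this explicitly), the mappings between IDs and tags could be built as follows. The names `id2label`, `label2id`, and `decode` are illustrative choices, not taken from the original training script.

```python
# Label order copied from the fine-tuning parameters above.
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Assumption: class IDs follow the list order (0 -> 'O', 1 -> 'B-PER', ...).
id2label = dict(enumerate(label_list))
label2id = {label: idx for idx, label in id2label.items()}


def decode(class_ids):
    """Turn a sequence of predicted class IDs into BIO tag strings."""
    return [id2label[i] for i in class_ids]


# Hypothetical predictions for a four-token input.
tags = decode([1, 2, 0, 5])
print(tags)  # ['B-PER', 'I-PER', 'O', 'B-LOC']
```

Mappings of this shape are what token-classification models typically store in their configuration, so that predictions come back as readable tags rather than bare integers.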
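+ # How to use:
+ ```
+ from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
+ # Load the fine-tuned model and its tokenizer from the Hugging Face Hub
+ model = AutoModelForTokenClassification.from_pretrained("akdeniz27/mbert-base-albanian-cased-ner")
+ tokenizer = AutoTokenizer.from_pretrained("akdeniz27/mbert-base-albanian-cased-ner")
+ # With aggregation_strategy="first", each word takes the label of its first sub-word token
+ ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="first")
+ ner("<your text here>")
+ ```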
+ Please refer to "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for details on entity grouping with the aggregation_strategy parameter.
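To make the idea behind `aggregation_strategy="first"` concrete, here is a simplified, library-free sketch. It is not the `transformers` implementation: it only illustrates the rule that a word split into several WordPiece tokens takes the label of its first piece, after which adjacent `B-`/`I-` words of the same type are grouped into one entity. The helper names and the token/label pairs are hypothetical and do not come from running the model.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word piece, predicted BIO label)


def merge_first(tokens: List[Token]) -> List[Token]:
    """Merge "##" continuation pieces into words, keeping the first piece's label."""
    words: List[Token] = []
    for piece, label in tokens:
        if piece.startswith("##") and words:
            prev_text, prev_label = words[-1]
            # The continuation's own label is discarded: the first piece wins.
            words[-1] = (prev_text + piece[2:], prev_label)
        else:
            words.append((piece, label))
    return words


def group_entities(words: List[Token]) -> List[Tuple[str, str]]:
    """Group consecutive B-/I- words of one type into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    cur_type, cur_words = None, []
    for text, label in words:
        prefix, _, ent_type = label.partition("-")
        continues = prefix == "I" and ent_type == cur_type
        if not continues:
            # Close the open span (if any) before starting a new one.
            if cur_words:
                spans.append((cur_type, " ".join(cur_words)))
            cur_type, cur_words = None, []
        if label != "O":
            cur_type = ent_type
            cur_words.append(text)
    if cur_words:
        spans.append((cur_type, " ".join(cur_words)))
    return spans


# Hypothetical word pieces and labels, for illustration only.
pieces = [("Franc", "B-PER"), ("##ois", "O"), ("Balu", "I-PER"),
          (",", "O"), ("Londër", "B-LOC")]
entities = group_entities(merge_first(pieces))
print(entities)  # [('PER', 'Francois Balu'), ('LOC', 'Londër')]
```

Note how the stray `O` predicted for `##ois` does not break the person name, because only the label of the first piece is consulted.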
+ # Reference test results:
+ * accuracy: 0.9719268816143276
+ * f1: 0.9192366826444787
+ * precision: 0.9171629669734704
+ * recall: 0.9213197969543148
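As a quick sanity check on the figures above, F1 is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition (the README does not say which evaluation tool produced the numbers) and recomputes F1 from the reported precision and recall. The function name `f1_score` is an illustrative choice.

```python
# Values copied from the reference test results above.
precision = 0.9171629669734704
recall = 0.9213197969543148
reported_f1 = 0.9192366826444787


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


f1 = f1_score(precision, recall)
# The recomputed value should agree with the reported one up to rounding.
assert abs(f1 - reported_f1) < 1e-6
print(f"recomputed f1 = {f1:.6f}")
```

Under that assumption the reported F1 is consistent with the reported precision and recall.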