akdeniz27's picture
Update README.md
85f99ba
|
raw
history blame
1.6 kB
metadata
language: tr
widget:
  - text: >-
      Almanya’nın Mainz kentinde Türk profesör Uğur Şahin ile eşi Özlem
      Türeci’nin kurduğu ve yönettiği biyoteknoloji şirketi BioNTech ile aşı
      sürecini sürdüren Pfizer’ın corona virüsü aşısı üretmeye başladığı
      belirtildi.

Turkish Named Entity Recognition (NER) Model

This model is the fine-tuned model of dbmdz/convbert-base-turkish-cased (ConvBERTurk) using a reviewed version of well known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt).

The ConvBERT architecture is presented in the "ConvBERT: Improving BERT with Span-based Dynamic Convolution" paper.

Fine-tuning parameters:

task = "ner"
model_checkpoint = "dbmdz/convbert-base-turkish-cased"
batch_size = 8 
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512 
learning_rate = 2e-5 
num_train_epochs = 3 
weight_decay = 0.01 

How to use:

model = AutoModelForTokenClassification.from_pretrained("akdeniz27/convbert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/convbert-base-turkish-cased-ner")
ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="first")
NER("text")
# Pls refer "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for entity grouping with aggregation_strategy parameter.

Reference test results:

  • accuracy: 0.9937648915431506
  • f1: 0.9610945644080416
  • precision: 0.9619899385131359
  • recall: 0.9602008554956295