berturk-sunlp-ner-turkish

Introduction

berturk-sunlp-ner-turkish is a named entity recognition (NER) model fine-tuned from the BERTurk-cased model on the SUNLP-NER-Twitter dataset.

Training data

The model was trained on the SUNLP-NER-Twitter dataset (5000 tweets), available at https://github.com/SU-NLP/SUNLP-Twitter-NER-Dataset

The dataset is annotated with seven named entity types: Person, Location, Organization, Time, Money, Product, and TV-Show.

How to use berturk-sunlp-ner-turkish with HuggingFace

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
model = AutoModelForTokenClassification.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
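
Once the tokenizer and model are loaded, they can be wrapped in a token-classification pipeline to tag raw text. The following is a minimal sketch, not part of the original card: the helper names `load_ner_pipeline` and `extract_entities` are hypothetical, and the choice of `aggregation_strategy="simple"` (which merges word pieces into whole entity spans) is an assumption about how one would typically want the output grouped.

```python
def load_ner_pipeline(model_id="busecarik/berturk-sunlp-ner-turkish"):
    """Build a token-classification pipeline for the given model id.

    The import is done lazily so that the helper below can be used
    (and tested) without transformers being loaded.
    """
    from transformers import (
        AutoModelForTokenClassification,
        AutoTokenizer,
        pipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    # aggregation_strategy="simple" groups sub-word tokens into entity spans;
    # this setting is an assumption, not something the model card prescribes.
    return pipeline(
        "token-classification",
        model=model,
        tokenizer=tokenizer,
        aggregation_strategy="simple",
    )


def extract_entities(ner, text):
    """Run the pipeline on text and return (label, surface form, score) tuples."""
    return [
        (entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
        for entity in ner(text)
    ]
```

Calling `extract_entities(load_ner_pipeline(), text)` on a Turkish tweet should return one tuple per detected entity. The exact label strings come from the model's own configuration, so they are not listed here.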

Model performance on the SUNLP-NER-Twitter test set (metric: seqeval)

Precision (%)	Recall (%)	F1 (%)
82.96	82.42	82.69
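
As a quick sanity check, the overall F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The function name `f1_score` below is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported for the test set.
overall_f1 = f1_score(82.96, 82.42)
print(round(overall_f1, 2))  # 82.69
```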

Classification Report

Entity	Precision	Recall	F1
LOCATION	0.70	0.80	0.74
MONEY	0.80	0.71	0.75
ORGANIZATION	0.78	0.86	0.78
PERSON	0.90	0.91	0.91
PRODUCT	0.44	0.47	0.45
TIME	0.94	0.85	0.89
TVSHOW	0.61	0.35	0.45
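
The scores above are entity-level: a prediction counts as correct only if both the span boundaries and the entity type match the gold annotation. The sketch below illustrates that idea in plain Python. It is a simplified stand-in, not seqeval itself, and it assumes BIO-style tags such as `B-PERSON` and `I-PERSON`, which the card does not state explicitly. The function names `bio_spans` and `entity_prf` are hypothetical.

```python
def bio_spans(tags):
    """Return the set of (type, start, end) entity spans in one BIO sequence."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if etype is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        # A "B-" tag always opens a span; a stray "I-" tag opens one too.
        if tag.startswith("B-") or (tag.startswith("I-") and etype is None):
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall, and F1."""
    true_pos = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
        true_pos += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the person span matches, the second entity has the wrong type.
gold = [["B-PERSON", "I-PERSON", "O", "B-LOCATION"]]
pred = [["B-PERSON", "I-PERSON", "O", "B-ORGANIZATION"]]
print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

For real evaluations, the seqeval library itself is the better choice, since it handles several tagging schemes and produces the per-entity report directly.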

If you use this model, please cite the following paper:

@InProceedings{ark-yeniterzi:2022:LREC,
  author    = {\c{C}ar\i k, Buse and Yeniterzi, Reyyan},
  title     = {A Twitter Corpus for Named Entity Recognition in Turkish},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {4546--4551},
  url       = {https://aclanthology.org/2022.lrec-1.484}
}