berturk-sunlp-ner-turkish

Introduction

berturk-sunlp-ner-turkish is a named entity recognition (NER) model fine-tuned from the BERTurk-cased model on the SUNLP-NER-Twitter dataset.

Training data

The model was trained on the SUNLP-NER-Twitter dataset (5000 tweets), which is available at https://github.com/SU-NLP/SUNLP-Twitter-NER-Dataset. The named entity types are Person, Location, Organization, Time, Money, Product, and TV-Show.

How to use berturk-sunlp-ner-turkish with HuggingFace

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
model = AutoModelForTokenClassification.from_pretrained("busecarik/berturk-sunlp-ner-turkish")
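
Once the model is available, one way to run inference is through the Transformers token-classification pipeline. The following is a minimal sketch, not the authors' own inference code: `load_ner_pipeline` and `format_entities` are hypothetical helper names, `min_score` is an illustrative parameter, and the exact label strings the model emits (for example PERSON or LOCATION) are assumed to match the classification report.

```python
def load_ner_pipeline(model_id="busecarik/berturk-sunlp-ner-turkish"):
    """Build a token-classification pipeline (downloads the model on first call)."""
    # Imported lazily so format_entities can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def format_entities(predictions, min_score=0.0):
    """Convert aggregated pipeline output into (text, label, score) tuples.

    Spans whose score is below min_score (a hypothetical threshold) are dropped.
    """
    return [
        (p["word"], p["entity_group"], round(float(p["score"]), 2))
        for p in predictions
        if p["score"] >= min_score
    ]
```

With these helpers, `format_entities(load_ner_pipeline()("Ahmet yarın Ankara'ya gidiyor."))` would return the detected spans for that illustrative sentence; the actual labels and scores depend on the model.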

Model performance on the SUNLP-NER-Twitter test set (metric: seqeval)

Precision (%)  Recall (%)  F1 (%)
82.96          82.42       82.69
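
As a quick sanity check, the overall F1 can be recomputed from the reported precision and recall. The sketch below assumes the row holds a single precision/recall pair (for example micro-averaged), in which case F1 is their harmonic mean; `f1_from` is a hypothetical helper name, not part of seqeval.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported above, in percent.
overall_f1 = f1_from(82.96, 82.42)
print(f"{overall_f1:.2f}")  # prints 82.69
```

The same function works with values on a 0 to 1 scale, since the harmonic mean keeps the scale of its inputs.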

Classification Report

Entity        Precision  Recall  F1
LOCATION      0.70       0.80    0.74
MONEY         0.80       0.71    0.75
ORGANIZATION  0.78       0.86    0.78
PERSON        0.90       0.91    0.91
PRODUCT       0.44       0.47    0.45
TIME          0.94       0.85    0.89
TVSHOW        0.61       0.35    0.45

If you use this model, please cite the following paper:

@InProceedings{ark-yeniterzi:2022:LREC,
  author    = {\c{C}ar\i k, Buse and Yeniterzi, Reyyan},
  title     = {A Twitter Corpus for Named Entity Recognition in Turkish},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {4546--4551},
  url       = {https://aclanthology.org/2022.lrec-1.484}
}