---
language: ky
datasets:
- wikiann
examples:
widget:
- text: "Бириккен Улуттар Уюму"
  example_title: "Sentence_1"
- text: "Жусуп Мамай"
  example_title: "Sentence_2"
---

# Kyrgyz Named Entity Recognition

This model fine-tunes `bert-base-multilingual-cased` on the WikiANN dataset to perform named entity recognition (NER) on the Kyrgyz language.

WARNING: this model is not usable (see the metrics below) and was built only as a proof of concept. I'll update the model after cleaning up the `ky` part of the WikiANN dataset (which contains only 100 train/test/validation items) or after coming up with a completely new dataset.

## Label ID and its corresponding label name

| Label ID | Label Name |
| -------- | ---------- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
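As a minimal sketch, the label table above can be written as a pair of Python dictionaries for translating between integer IDs and tag names. The names `ID2LABEL`, `LABEL2ID`, and `decode_label_ids` are illustrative and not part of the model's code, and the sample IDs are hand-written rather than real predictions.

```python
# Label mapping copied from the table above.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}

# Reverse lookup: tag name -> integer ID.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode_label_ids(label_ids):
    """Translate a sequence of integer label IDs into tag names."""
    return [ID2LABEL[i] for i in label_ids]


# Hypothetical label IDs for a two-token input, for illustration only.
example_ids = [1, 2]
print(decode_label_ids(example_ids))  # ['B-PER', 'I-PER']
```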

## Results

| Name | Overall F1 | LOC F1 | ORG F1 | PER F1 |
| ---- | ---------- | ------ | ------ | ------ |
| Train set | 0.595683 | 0.570312 | 0.687179 | 0.549180 |
| Validation set | 0.461333 | 0.551181 | 0.401913 | 0.425087 |
| Test set | 0.442622 | 0.456852 | 0.469565 | 0.413114 |

## Example

```py
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("murat/kyrgyz_language_NER")
model = AutoModelForTokenClassification.from_pretrained("murat/kyrgyz_language_NER")

# Build a token-classification (NER) pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "Жусуп Мамай"
ner_results = nlp(example)
print(ner_results)
```
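The `ner` pipeline returns one dictionary per tagged token, with keys such as `entity`, `start`, and `end`. The sketch below shows one way to merge consecutive B-/I- tokens into whole entity spans using those character offsets. The helper `group_entities` is hypothetical, and the `sample_predictions` list is hand-written to illustrate the input shape; it is not actual output from this model.

```python
def group_entities(text, token_predictions):
    """Merge B-/I- tagged tokens into entity spans using character offsets."""
    spans = []
    current = None
    for tok in token_predictions:
        # "B-PER" -> ("B", "PER"); "O" -> ("O", "")
        prefix, _, entity_type = tok["entity"].partition("-")
        continues = (
            current is not None
            and prefix == "I"
            and entity_type == current["type"]
        )
        if continues:
            # Extend the open span to cover this token.
            current["end"] = tok["end"]
        elif prefix in ("B", "I"):
            # Start a new span; a stray I- tag is treated as a beginning.
            current = {"type": entity_type, "start": tok["start"], "end": tok["end"]}
            spans.append(current)
        else:
            # An "O" tag closes any open span.
            current = None
    return [{"type": s["type"], "text": text[s["start"]:s["end"]]} for s in spans]


# Hand-written illustration of the pipeline's per-token format (not real model output).
text = "Жусуп Мамай"
sample_predictions = [
    {"entity": "B-PER", "start": 0, "end": 5},
    {"entity": "I-PER", "start": 6, "end": 11},
]
print(group_entities(text, sample_predictions))
# [{'type': 'PER', 'text': 'Жусуп Мамай'}]
```

The pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping internally; the manual version here only makes the logic explicit.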