metadata
tags:
  - spacy
  - token-classification
language: uk
datasets:
  - ner-uk.2.0
license: mit
model-index:
  - name: uk_ner_web_trf_13class
    results:
      - task:
          name: NER
          type: token-classification
        metrics:
          - name: NER Precision
            type: precision
            value: 0.8977982743
          - name: NER Recall
            type: recall
            value: 0.8860666569
          - name: NER F Score
            type: f_score
            value: 0.891893889
widget:
  - text: >-
      Президент Володимир Зеленський пояснив, що наразі діалог із режимом
      Володимира путіна неможливий, адже агресор обрав курс на знищення
      українського народу. За словами Зеленського цей режим РФ виявляє неповагу
      до суверенітету і територіальної цілісності України.

uk_ner_web_trf_13class

Model description

uk_ner_web_trf_13class is a fine-tuned RoBERTa Large Ukrainian model that is ready to use for Named Entity Recognition (NER). It achieves new state-of-the-art performance on the NER task for the Ukrainian language: precision 0.898, recall 0.886, and F-score 0.892.

It has been trained to recognize thirteen types of entities:

  • ORG — a name of a company, brand, agency, organization, institution (including religious, informal, non-profit), party, people's association, or specific project like a conference, a music band, a TV program, etc. Example: UNESCO.
  • PERS — a person name where person may refer to humans, book characters, or humanoid creatures like vampires, ghosts, mermaids, etc. Example: Marquis de Sade.
  • LOC — a geographical name, including names of districts, villages, cities, states, counties, countries, continents, rivers, lakes, seas, oceans, mountains, etc. Example: Ukraine.
  • MON — a sum of money including the currency. Examples: $40, 1 mln hryvnias.
  • PCT — a percent value including the percent sign or the word "percent". Example: 10%.
  • DATE — a full or incomplete calendar date that may include a century, a year, a month, a day. Examples: last week, 10.12.1999.
  • TIME — a textual or numerical timestamp. Examples: half past six, 18:30.
  • PERIOD — a time period, which may consist of two dates. Examples: a few months, 2014-2015.
  • JOB — a job title. Examples: member of parliament, ophthalmologist.
  • DOC — a unique name of a document, including names of contracts, orders, bills, purchases. Example: procurement contract CW2244226.
  • QUANT — a quantity with the unit of measurement, such as weight, distance, size. Examples: 3 kilograms, a hundred miles.
  • ART (artifact) — a name of a human-made product, like a book, a song, a car, or a sandwich. Examples: Mona Lisa, iPhone.
  • MISC — any other entity not covered in the list above, like names of holidays, websites, battles, wars, sports events, hurricanes, etc. Example: Black Friday.

The model was fine-tuned on the NER-UK 2.0 dataset, released by the lang-uk project.

Another transformer-based spaCy model, trained on 4 classes, is available here.
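The thirteen labels described above can be consumed from Python roughly as follows. This is a minimal sketch, not an official usage guide: it assumes that spaCy is installed and that the pipeline is installed locally as a package named `uk_ner_web_trf_13class` (the name is taken from the model metadata). The helpers `group_entities` and `extract_entities` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

# The thirteen entity labels listed in this model card.
LABELS = (
    "ORG", "PERS", "LOC", "MON", "PCT", "DATE", "TIME",
    "PERIOD", "JOB", "DOC", "QUANT", "ART", "MISC",
)


def group_entities(pairs):
    """Group (text, label) pairs by label, keeping the order of appearance.

    Hypothetical helper: raises ValueError for a label outside LABELS.
    """
    grouped = defaultdict(list)
    for text, label in pairs:
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        grouped[label].append(text)
    return dict(grouped)


def extract_entities(text, model_name="uk_ner_web_trf_13class"):
    """Run the NER pipeline on `text` and return entities grouped by label.

    Assumption: spaCy and the model package `model_name` are installed.
    """
    import spacy  # imported lazily so group_entities works without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    # doc.ents holds the recognized spans; ent.label_ is the label string.
    return group_entities((ent.text, ent.label_) for ent in doc.ents)
```

With the model installed, calling `extract_entities` on a Ukrainian sentence returns a dictionary that maps each detected label to the list of matching spans. Grouping by label is only one convenient view; `doc.ents` can also be iterated directly when character offsets are needed.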

Citation

TBA

Copyright: Dmytro Chaplynskyi, Mariana Romanyshyn, lang-uk project, 2024