Commit 978a264 by dchaplinsky (1 parent: 00c3adc)
Update README.md

README.md CHANGED
@@ -1,3 +1,54 @@
 ---
+tags:
+- spacy
+- token-classification
+language: uk
+datasets:
+- ner-uk.2.0
 license: mit
+model-index:
+- name: uk_ner_web_trf_13class
+  results:
+  - task:
+      name: NER
+      type: token-classification
+    metrics:
+    - name: NER Precision
+      type: precision
+      value: 0.8977982743
+    - name: NER Recall
+      type: recall
+      value: 0.8860666569
+    - name: NER F Score
+      type: f_score
+      value: 0.891893889
+widget:
+- text: "Президент Володимир Зеленський пояснив, що наразі діалог із режимом Володимира путіна неможливий, адже агресор обрав курс на знищення українського народу. За словами Зеленського цей режим РФ виявляє неповагу до суверенітету і територіальної цілісності України."
 ---
+# uk_ner_web_trf_13class
+
+## Model description
+
+**uk_ner_web_trf_13class** is a fine-tuned [RoBERTa Large Ukrainian model](https://huggingface.co/benjamin/roberta-large-wechsel-ukrainian) that is ready to use for **Named Entity Recognition** and achieves new **state-of-the-art** performance on the NER task for the Ukrainian language.
+
+It reaches an F-score of 0.892 on NER (precision 0.898, recall 0.886) and has been trained to recognize **thirteen** types of entities:
+- **ORG**: a name of a company, brand, agency, organization, institution (including religious, informal, non-profit), party, people's association, or specific project like a conference, a music band, a TV program, etc. Example: *UNESCO*.
+- **PERS**: a person name, where a person may be a human, a book character, or a humanoid creature like a vampire, ghost, mermaid, etc. Example: *Marquis de Sade*.
+- **LOC**: a geographical name, including names of districts, villages, cities, states, counties, countries, continents, rivers, lakes, seas, oceans, mountains, etc. Example: *Ukraine*.
+- **MON**: a sum of money including the currency. Examples: *\$40, 1 mln hryvnias*.
+- **PCT**: a percent value including the percent sign or the word "percent". Example: *10\%*.
+- **DATE**: a full or incomplete calendar date that may include a century, a year, a month, a day. Examples: *last week, 10.12.1999*.
+- **TIME**: a textual or numerical timestamp. Examples: *half past six, 18:30*.
+- **PERIOD**: a time period, which may consist of two dates. Examples: *a few months, 2014-2015*.
+- **JOB**: a job title. Examples: *member of parliament, ophthalmologist*.
+- **DOC**: a unique name of a document, including names of contracts, orders, bills, purchases. Example: *procurement contract CW2244226*.
+- **QUANT**: a quantity with the unit of measurement, such as weight, distance, size. Examples: *3 kilograms, a hundred miles*.
+- **ART** (artifact): a name of a human-made product, like a book, a song, a car, or a sandwich. Examples: *Mona Lisa, iPhone*.
+- **MISC**: any other entity not covered in the list above, like names of holidays, websites, battles, wars, sports events, hurricanes, etc. Example: *Black Friday*.
+
+The model was fine-tuned on the [NER-UK 2.0 dataset](https://github.com/lang-uk/ner-uk), released by the [lang-uk](https://lang.org.ua) project.
+
+Another transformer-based spaCy model, **trained on 4 classes**, is available [here](https://huggingface.co/dchaplinsky/uk_ner_web_trf_best).
+
+
+Copyright: [Dmytro Chaplynskyi](https://twitter.com/dchaplinsky), [Mariana Romanyshyn](https://scholar.google.com/citations?user=yji2ZvIAAAAJ&hl=uk&oi=ao), [lang-uk project](https://lang.org.ua), 2024
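
As a usage illustration for the model card in this commit, here is a minimal sketch of running the pipeline with spaCy and grouping the recognized entities by label. It assumes the model is already installed as a spaCy package named `uk_ner_web_trf_13class` (the name is taken from the card; the card itself does not describe an installation step). `load_pipeline` and `group_entities` are hypothetical helper names, not part of the model or of spaCy.

```python
from collections import defaultdict


def load_pipeline(name="uk_ner_web_trf_13class"):
    """Load the spaCy pipeline; assumes the package is already installed."""
    import spacy  # imported lazily so group_entities works without spaCy

    return spacy.load(name)


def group_entities(ents):
    """Group entity texts by label, preserving the order of appearance.

    `ents` is any iterable of objects exposing `.text` and `.label_`,
    such as `doc.ents` on a spaCy Doc.
    """
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)
```

With the model installed, `group_entities(load_pipeline()(text).ents)` should return a dictionary keyed by labels from the list in the card, such as `PERS`, `LOC`, or `JOB`. Which entities appear depends on the model's predictions for the given text.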
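
The three values in the `model-index` metadata are related. Assuming the reported F-score is the usual F1 (the harmonic mean of precision and recall), it can be recomputed from the other two numbers. The sketch below is only a consistency check on the published figures, not the evaluation code used for the model, and `f1_score` is a hypothetical helper name.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the model card metadata.
PRECISION = 0.8977982743
RECALL = 0.8860666569
REPORTED_F = 0.891893889

computed = f1_score(PRECISION, RECALL)
# The recomputed value should agree with the reported one up to rounding.
print(f"computed={computed:.6f} reported={REPORTED_F:.6f}")
```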