Update README.md
README.md
CHANGED
@@ -52,15 +52,50 @@ It achieves the following results on the evaluation set:
## Model description

This model is a coarse part-of-speech tagger for Polish, based on sdadas/polish-roberta-base-v2.
It supports 13 classes, each representing a coarse part of speech:
```
{
    0: 'A',
    1: 'Adv',
    2: 'Comp',
    3: 'Conj',
    4: 'Dig',
    5: 'Interj',
    6: 'N',
    7: 'Num',
    8: 'Part',
    9: 'Prep',
    10: 'Punct',
    11: 'V',
    12: 'X'
}
```
The tags have the same meaning as in the nkjp1m dataset:

| Tag    | Description in English       | Description in Polish                   | Example in Polish |
|--------|------------------------------|-----------------------------------------|-------------------|
| A      | Adjective                    | przymiotnik                             | szybki            |
| Adv    | Adverb                       | przysłówek                              | szybko            |
| Comp   | Comparative / Complementizer | stopień porównawczy / spójnik podrzędny | lepszy / że       |
| Conj   | Conjunction                  | spójnik                                 | i                 |
| Dig    | Digit                        | cyfra                                   | 5, 3              |
| Interj | Interjection                 | wykrzyknik                              | och!              |
| N      | Noun                         | rzeczownik                              | dom               |
| Num    | Numeral                      | liczebnik                               | jeden             |
| Part   | Particle                     | partykuła                               | by                |
| Prep   | Preposition                  | przyimek                                | w                 |
| Punct  | Punctuation                  | interpunkcja                            | ., !, ?           |
| V      | Verb                         | czasownik                               | biegać            |
| X      | Unknown / Other              | niesklasyfikowane                       | xxx               |
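
To illustrate how the class indices relate to the tags, here is a minimal sketch that maps per-token score vectors to tag names by taking the highest-scoring class. The `ID2LABEL` dictionary is copied from the mapping in this section. The `decode` helper and the shape of its input (one list of 13 scores per token) are assumptions made for illustration, not part of the model's own code.

```python
# Label mapping copied from the model description.
ID2LABEL = {
    0: "A", 1: "Adv", 2: "Comp", 3: "Conj", 4: "Dig", 5: "Interj", 6: "N",
    7: "Num", 8: "Part", 9: "Prep", 10: "Punct", 11: "V", 12: "X",
}


def decode(scores_per_token):
    """Return the tag with the highest score for each token.

    `scores_per_token` is assumed to be a list with one entry per token,
    each entry being a list of 13 numbers (one score per class).
    """
    tags = []
    for scores in scores_per_token:
        if len(scores) != len(ID2LABEL):
            raise ValueError(
                f"expected {len(ID2LABEL)} scores per token, got {len(scores)}"
            )
        # Index of the largest score (argmax) without external dependencies.
        best_id = max(range(len(scores)), key=scores.__getitem__)
        tags.append(ID2LABEL[best_id])
    return tags
```

Calling `decode` on the scores for a two-token input yields a list of two tag names, which can then be looked up in the table above.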

## Intended uses & limitations

Good tools for part-of-speech tagging in Polish already exist (for example http://morfeusz.sgjp.pl/), but I needed a Polish POS tagger that could be loaded easily in the browser. Hugging Face supports this, which is why I created this model.
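
Although the model was created with in-browser use in mind, a checkpoint like this can presumably also be tried from Python. The sketch below assumes the `transformers` library is installed and uses its `pipeline("token-classification", ...)` API. The repository id in `MODEL_ID` is a hypothetical placeholder, because this README does not state the model's name on the Hub, and `to_pairs` is an illustrative helper.

```python
# Hypothetical placeholder: replace with the actual Hub id of this model.
MODEL_ID = "your-username/your-polish-pos-tagger"


def build_tagger(model_id=MODEL_ID):
    """Create a token-classification pipeline for the given model id.

    The import is done lazily so this module can be loaded even when
    `transformers` is not installed. The model is downloaded on first use.
    """
    from transformers import pipeline

    return pipeline("token-classification", model=model_id)


def to_pairs(predictions):
    """Reduce pipeline output to (token, tag) pairs.

    With the default settings, each prediction is a dict that includes
    the keys 'word' (the token text) and 'entity' (the predicted label).
    """
    return [(p["word"], p["entity"]) for p in predictions]
```

With the default settings the pipeline returns one prediction per subword token rather than per word, so longer words may appear as several pieces. Usage would look like `to_pairs(build_tagger()("Ala ma kota."))` once `MODEL_ID` points at a real repository.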

## Training and evaluation data

The model was trained on half of the test data of the nkjp1m dataset (~0.5 million tokens).

## Training procedure
