wkaminski committed on
Commit
4779f2e
1 Parent(s): fe5d391

Update README.md

  1. README.md +38 -3
README.md CHANGED
@@ -52,15 +52,50 @@ It achieves the following results on the evaluation set:
52
 
53
  ## Model description
54
 
55
- More information needed
56
 
57
  ## Intended uses & limitations
58
 
59
- More information needed
60
 
61
  ## Training and evaluation data
62
 
63
- More information needed
64
 
65
  ## Training procedure
66
 
 
52
 
53
  ## Model description
54
 
55
+ This model is a coarse part-of-speech tagger for the Polish language, based on sdadas/polish-roberta-base-v2.
56
+ It supports 13 classes representing coarse parts of speech:
57
+ ```
58
+ {
59
+ 0: 'A',
60
+ 1: 'Adv',
61
+ 2: 'Comp',
62
+ 3: 'Conj',
63
+ 4: 'Dig',
64
+ 5: 'Interj',
65
+ 6: 'N',
66
+ 7: 'Num',
67
+ 8: 'Part',
68
+ 9: 'Prep',
69
+ 10: 'Punct',
70
+ 11: 'V',
71
+ 12: 'X'
72
+ }
73
+ ```
74
+ The tags have the same meaning as in the nkjp1m dataset:
75
+
76
+ | Tag | Description in English | Description in Polish | Example in Polish |
77
+ |-------|----------------------------------|-----------------------------|---------------------------|
78
+ | A | Adjective | przymiotnik | szybki |
79
+ | Adv | Adverb | przysłówek | szybko |
80
+ | Comp | Comparative / Complementizer | stopień porównawczy / spójnik podrzędny | lepszy / że |
81
+ | Conj | Conjunction | spójnik | i |
82
+ | Dig | Digit | cyfra | 5, 3 |
83
+ | Interj| Interjection | wykrzyknik | och! |
84
+ | N | Noun | rzeczownik | dom |
85
+ | Num | Numeral | liczebnik | jeden |
86
+ | Part | Particle | partykuła | by |
87
+ | Prep | Preposition | przyimek | w |
88
+ | Punct | Punctuation | interpunkcja | ., !, ? |
89
+ | V | Verb | czasownik | biegać |
90
+ | X | Unknown / Other | niesklasyfikowane | xxx |
91
 
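The class mapping added in this change can be applied to model outputs roughly as follows. This is a minimal sketch, assuming the model emits one score per class for each token; `decode_tags`, `one_hot`, and the sample scores are hypothetical names and data introduced for illustration and are not part of the README.

```python
# Label mapping copied from the model description in this diff.
ID2LABEL = {
    0: "A", 1: "Adv", 2: "Comp", 3: "Conj", 4: "Dig", 5: "Interj", 6: "N",
    7: "Num", 8: "Part", 9: "Prep", 10: "Punct", 11: "V", 12: "X",
}


def decode_tags(scores):
    """Map per-token score rows (one score per class) to tag strings via argmax."""
    tags = []
    for row in scores:
        if len(row) != len(ID2LABEL):
            raise ValueError(f"expected {len(ID2LABEL)} scores per token, got {len(row)}")
        # Index of the highest score in this row.
        best = max(range(len(row)), key=row.__getitem__)
        tags.append(ID2LABEL[best])
    return tags


def one_hot(index):
    """Hypothetical helper: a score row that puts all weight on one class."""
    return [1.0 if j == index else 0.0 for j in range(len(ID2LABEL))]


# Made-up scores for three tokens (assumption: not real model output).
scores = [one_hot(6), one_hot(11), one_hot(10)]
print(decode_tags(scores))  # ['N', 'V', 'Punct']
```

In practice the score rows would come from the token-classification head of the model; subword tokens would also need to be merged back into words, which this sketch does not attempt.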
92
  ## Intended uses & limitations
93
 
94
+ Although good tools for POS tagging in Polish already exist (http://morfeusz.sgjp.pl/), I needed a Polish POS tagger that could be loaded easily in the browser. Hugging Face supports this, which is why I created this model.
95
 
96
  ## Training and evaluation data
97
 
98
+ The model was trained on half of the test data of the nkjp1m dataset (~0.5 million tokens).
99
 
100
  ## Training procedure
101