oroszgy committed on
Commit 029c58f · 1 Parent(s): 4663cfa

Update spacy pipeline to 3.5.2

README.md CHANGED
@@ -14,72 +14,72 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.9069524307
18
  - name: NER Recall
19
  type: recall
20
- value: 0.9150843882
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.9110002625
24
  - task:
25
  name: TAG
26
  type: token-classification
27
  metrics:
28
  - name: TAG (XPOS) Accuracy
29
  type: accuracy
30
- value: 0.9746877841
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
  - name: POS (UPOS) Accuracy
36
  type: accuracy
37
- value: 0.974400689
38
  - task:
39
  name: MORPH
40
  type: token-classification
41
  metrics:
42
  - name: Morph (UFeats) Accuracy
43
  type: accuracy
44
- value: 0.9452579194
45
  - task:
46
  name: LEMMA
47
  type: token-classification
48
  metrics:
49
  - name: Lemma Accuracy
50
  type: accuracy
51
- value: 0.9874653143
52
  - task:
53
  name: UNLABELED_DEPENDENCIES
54
  type: token-classification
55
  metrics:
56
  - name: Unlabeled Attachment Score (UAS)
57
  type: f_score
58
- value: 0.9092736147
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
  - name: Labeled Attachment Score (LAS)
64
  type: f_score
65
- value: 0.8681339713
66
  - task:
67
  name: SENTS
68
  type: token-classification
69
  metrics:
70
  - name: Sentences F-Score
71
  type: f_score
72
- value: 0.976744186
73
  ---
74
  Hungarian transformer pipeline (huBERT) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner
75
 
76
  | Feature | Description |
77
  | --- | --- |
78
  | **Name** | `hu_core_news_trf` |
79
- | **Version** | `3.5.1` |
80
  | **spaCy** | `>=3.5.0,<3.6.0` |
81
- | **Default Pipeline** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `lemma_smoother`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
82
- | **Components** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `lemma_smoother`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
83
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
84
  | **Sources** | [UD Hungarian Szeged](https://universaldependencies.org/treebanks/hu_szeged/index.html) (Richárd Farkas, Katalin Simkó, Zsolt Szántó, Viktor Varga, Veronika Vincze (MTA-SZTE Research Group on Artificial Intelligence))<br />[NYTK-NerKor Corpus](https://github.com/nytud/NYTK-NerKor) (Eszter Simon, Noémi Vadász (Department of Language Technology and Applied Linguistics))<br />[hunNERwiki](http://hlt.sztaki.hu/resources/hunnerwiki.html) (Eszter Simon, Dávid Márk Nemeskey (HLT Group, Budapest University of Technology and Economics))<br />[Szeged NER Corpus](https://rgai.inf.u-szeged.hu/node/130) (György Szarvas, Richárd Farkas, László Felföldi, András Kocsor, János Csirik (MTA-SZTE Research Group on Artificial Intelligence))<br />[huBERT base model (cased)](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) (Dávid Márk Nemeskey (SZTAKI-HLT)) |
85
  | **License** | `cc-by-sa-4.0` |
@@ -108,20 +108,20 @@ Hungarian transformer pipeline (huBERT) for HuSpaCy. Components: transformer, se
108
  | `TOKEN_P` | 99.86 |
109
  | `TOKEN_R` | 99.93 |
110
  | `TOKEN_F` | 99.89 |
111
- | `SENTS_P` | 97.14 |
112
- | `SENTS_R` | 98.22 |
113
- | `SENTS_F` | 97.67 |
114
- | `TAG_ACC` | 97.47 |
115
- | `POS_ACC` | 97.44 |
116
- | `MORPH_ACC` | 94.53 |
117
- | `MORPH_MICRO_P` | 98.05 |
118
- | `MORPH_MICRO_R` | 97.22 |
119
- | `MORPH_MICRO_F` | 97.63 |
120
- | `LEMMA_ACC` | 98.75 |
121
- | `BOUND_DEP_LAS` | 86.86 |
122
- | `BOUND_DEP_UAS` | 90.98 |
123
- | `DEP_UAS` | 90.93 |
124
- | `DEP_LAS` | 86.81 |
125
- | `ENTS_P` | 90.70 |
126
- | `ENTS_R` | 91.51 |
127
- | `ENTS_F` | 91.10 |
 
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.909059294
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.9191279887
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.9140659149
24
  - task:
25
  name: TAG
26
  type: token-classification
27
  metrics:
28
  - name: TAG (XPOS) Accuracy
29
  type: accuracy
30
+ value: 0.9817207388
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
  - name: POS (UPOS) Accuracy
36
  type: accuracy
37
+ value: 0.979902383
38
  - task:
39
  name: MORPH
40
  type: token-classification
41
  metrics:
42
  - name: Morph (UFeats) Accuracy
43
  type: accuracy
44
+ value: 0.9645403646
45
  - task:
46
  name: LEMMA
47
  type: token-classification
48
  metrics:
49
  - name: Lemma Accuracy
50
  type: accuracy
51
+ value: 0.986030045
52
  - task:
53
  name: UNLABELED_DEPENDENCIES
54
  type: token-classification
55
  metrics:
56
  - name: Unlabeled Attachment Score (UAS)
57
  type: f_score
58
+ value: 0.9078903297
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
  - name: Labeled Attachment Score (LAS)
64
  type: f_score
65
+ value: 0.8674641148
66
  - task:
67
  name: SENTS
68
  type: token-classification
69
  metrics:
70
  - name: Sentences F-Score
71
  type: f_score
72
+ value: 0.9966555184
73
  ---
74
  Hungarian transformer pipeline (huBERT) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner
75
 
76
  | Feature | Description |
77
  | --- | --- |
78
  | **Name** | `hu_core_news_trf` |
79
+ | **Version** | `3.5.2` |
80
  | **spaCy** | `>=3.5.0,<3.6.0` |
81
+ | **Default Pipeline** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
82
+ | **Components** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
83
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
84
  | **Sources** | [UD Hungarian Szeged](https://universaldependencies.org/treebanks/hu_szeged/index.html) (Richárd Farkas, Katalin Simkó, Zsolt Szántó, Viktor Varga, Veronika Vincze (MTA-SZTE Research Group on Artificial Intelligence))<br />[NYTK-NerKor Corpus](https://github.com/nytud/NYTK-NerKor) (Eszter Simon, Noémi Vadász (Department of Language Technology and Applied Linguistics))<br />[hunNERwiki](http://hlt.sztaki.hu/resources/hunnerwiki.html) (Eszter Simon, Dávid Márk Nemeskey (HLT Group, Budapest University of Technology and Economics))<br />[Szeged NER Corpus](https://rgai.inf.u-szeged.hu/node/130) (György Szarvas, Richárd Farkas, László Felföldi, András Kocsor, János Csirik (MTA-SZTE Research Group on Artificial Intelligence))<br />[huBERT base model (cased)](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) (Dávid Márk Nemeskey (SZTAKI-HLT)) |
85
  | **License** | `cc-by-sa-4.0` |
 
108
  | `TOKEN_P` | 99.86 |
109
  | `TOKEN_R` | 99.93 |
110
  | `TOKEN_F` | 99.89 |
111
+ | `SENTS_P` | 99.78 |
112
+ | `SENTS_R` | 99.55 |
113
+ | `SENTS_F` | 99.67 |
114
+ | `TAG_ACC` | 98.17 |
115
+ | `POS_ACC` | 97.99 |
116
+ | `MORPH_ACC` | 96.45 |
117
+ | `MORPH_MICRO_P` | 98.67 |
118
+ | `MORPH_MICRO_R` | 98.29 |
119
+ | `MORPH_MICRO_F` | 98.48 |
120
+ | `LEMMA_ACC` | 98.60 |
121
+ | `BOUND_DEP_LAS` | 86.74 |
122
+ | `BOUND_DEP_UAS` | 90.77 |
123
+ | `DEP_UAS` | 90.79 |
124
+ | `DEP_LAS` | 86.75 |
125
+ | `ENTS_P` | 90.91 |
126
+ | `ENTS_R` | 91.91 |
127
+ | `ENTS_F` | 91.41 |
config.cfg CHANGED
@@ -1,8 +1,8 @@
1
  [paths]
2
- tagger_model = "models/hu_core_news_trf-tagger-3.5.0/model-best"
3
- parser_model = "models/hu_core_news_trf-parser-3.5.0/model-best"
4
- ner_model = "models/hu_core_news_trf-ner-3.5.0/model-best"
5
- lemmatizer_lookups = "models/hu_core_news_trf-lookup-lemmatizer-3.5.0"
6
  train = null
7
  dev = null
8
  vectors = null
@@ -14,7 +14,7 @@ gpu_allocator = null
14
 
15
  [nlp]
16
  lang = "hu"
17
- pipeline = ["transformer","senter","tagger","morphologizer","lookup_lemmatizer","trainable_lemmatizer","lemma_smoother","experimental_arc_predicter","experimental_arc_labeler","ner"]
18
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
19
  disabled = []
20
  before_creation = null
@@ -30,30 +30,17 @@ scorer = {"@scorers":"spacy-experimental.biaffine_parser_scorer.v1"}
30
 
31
  [components.experimental_arc_labeler.model]
32
  @architectures = "spacy-experimental.Bilinear.v1"
33
- hidden_width = 128
34
- mixed_precision = false
35
  nO = null
36
  dropout = 0.1
37
  grad_scaler = null
38
 
39
  [components.experimental_arc_labeler.model.tok2vec]
40
- @architectures = "spacy-transformers.Tok2VecTransformer.v3"
41
- name = "SZTAKI-HLT/hubert-base-cc"
42
- mixed_precision = false
43
- pooling = {"@layers":"reduce_mean.v1"}
44
  grad_factor = 1.0
45
-
46
- [components.experimental_arc_labeler.model.tok2vec.get_spans]
47
- @span_getters = "spacy-transformers.strided_spans.v1"
48
- window = 128
49
- stride = 96
50
-
51
- [components.experimental_arc_labeler.model.tok2vec.grad_scaler_config]
52
-
53
- [components.experimental_arc_labeler.model.tok2vec.tokenizer_config]
54
- use_fast = true
55
-
56
- [components.experimental_arc_labeler.model.tok2vec.transformer_config]
57
 
58
  [components.experimental_arc_predicter]
59
  factory = "experimental_arc_predicter"
@@ -61,33 +48,17 @@ scorer = {"@scorers":"spacy-experimental.biaffine_parser_scorer.v1"}
61
 
62
  [components.experimental_arc_predicter.model]
63
  @architectures = "spacy-experimental.PairwiseBilinear.v1"
64
- hidden_width = 256
65
  nO = 1
66
  mixed_precision = false
67
  dropout = 0.1
68
  grad_scaler = null
69
 
70
  [components.experimental_arc_predicter.model.tok2vec]
71
- @architectures = "spacy-transformers.Tok2VecTransformer.v3"
72
- name = "SZTAKI-HLT/hubert-base-cc"
73
- mixed_precision = false
74
- pooling = {"@layers":"reduce_mean.v1"}
75
  grad_factor = 1.0
76
-
77
- [components.experimental_arc_predicter.model.tok2vec.get_spans]
78
- @span_getters = "spacy-transformers.strided_spans.v1"
79
- window = 128
80
- stride = 96
81
-
82
- [components.experimental_arc_predicter.model.tok2vec.grad_scaler_config]
83
-
84
- [components.experimental_arc_predicter.model.tok2vec.tokenizer_config]
85
- use_fast = true
86
-
87
- [components.experimental_arc_predicter.model.tok2vec.transformer_config]
88
-
89
- [components.lemma_smoother]
90
- factory = "hu.lemma_smoother"
91
 
92
  [components.lookup_lemmatizer]
93
  factory = "hu.lookup_lemmatizer"
@@ -145,6 +116,7 @@ stride = 96
145
 
146
  [components.ner.model.tok2vec.tokenizer_config]
147
  use_fast = true
 
148
 
149
  [components.ner.model.tok2vec.transformer_config]
150
 
@@ -193,10 +165,24 @@ top_k = 3
193
  nO = null
194
 
195
  [components.trainable_lemmatizer.model.tok2vec]
196
- @architectures = "spacy-transformers.TransformerListener.v1"
197
- grad_factor = 1.0
198
- upstream = "transformer"
199
  pooling = {"@layers":"reduce_mean.v1"}
200
 
201
  [components.transformer]
202
  factory = "transformer"
@@ -217,6 +203,7 @@ stride = 96
217
 
218
  [components.transformer.model.tokenizer_config]
219
  use_fast = true
 
220
 
221
  [components.transformer.model.transformer_config]
222
 
 
1
  [paths]
2
+ tagger_model = "models/hu_core_news_trf-tagger-3.5.2/model-best"
3
+ parser_model = "models/hu_core_news_trf-parser-3.5.2/model-best"
4
+ ner_model = "models/hu_core_news_trf-ner-3.5.2/model-best"
5
+ lemmatizer_lookups = "models/hu_core_news_trf-lookup-lemmatizer-3.5.2"
6
  train = null
7
  dev = null
8
  vectors = null
 
14
 
15
  [nlp]
16
  lang = "hu"
17
+ pipeline = ["transformer","senter","tagger","morphologizer","lookup_lemmatizer","trainable_lemmatizer","experimental_arc_predicter","experimental_arc_labeler","ner"]
18
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
19
  disabled = []
20
  before_creation = null
 
30
 
31
  [components.experimental_arc_labeler.model]
32
  @architectures = "spacy-experimental.Bilinear.v1"
33
+ hidden_width = 256
34
+ mixed_precision = true
35
  nO = null
36
  dropout = 0.1
37
  grad_scaler = null
38
 
39
  [components.experimental_arc_labeler.model.tok2vec]
40
+ @architectures = "spacy-transformers.TransformerListener.v1"
41
  grad_factor = 1.0
42
+ upstream = "transformer"
43
+ pooling = {"@layers":"reduce_mean.v1"}
44
 
45
  [components.experimental_arc_predicter]
46
  factory = "experimental_arc_predicter"
 
48
 
49
  [components.experimental_arc_predicter.model]
50
  @architectures = "spacy-experimental.PairwiseBilinear.v1"
51
+ hidden_width = 64
52
  nO = 1
53
  mixed_precision = false
54
  dropout = 0.1
55
  grad_scaler = null
56
 
57
  [components.experimental_arc_predicter.model.tok2vec]
58
+ @architectures = "spacy-transformers.TransformerListener.v1"
59
  grad_factor = 1.0
60
+ upstream = "transformer"
61
+ pooling = {"@layers":"reduce_mean.v1"}
62
 
63
  [components.lookup_lemmatizer]
64
  factory = "hu.lookup_lemmatizer"
 
116
 
117
  [components.ner.model.tok2vec.tokenizer_config]
118
  use_fast = true
119
+ model_max_length = 512
120
 
121
  [components.ner.model.tok2vec.transformer_config]
122
 
 
165
  nO = null
166
 
167
  [components.trainable_lemmatizer.model.tok2vec]
168
+ @architectures = "spacy-transformers.Tok2VecTransformer.v3"
169
+ name = "SZTAKI-HLT/hubert-base-cc"
170
+ mixed_precision = false
171
  pooling = {"@layers":"reduce_mean.v1"}
172
+ grad_factor = 1.0
173
+
174
+ [components.trainable_lemmatizer.model.tok2vec.get_spans]
175
+ @span_getters = "spacy-transformers.strided_spans.v1"
176
+ window = 128
177
+ stride = 96
178
+
179
+ [components.trainable_lemmatizer.model.tok2vec.grad_scaler_config]
180
+
181
+ [components.trainable_lemmatizer.model.tok2vec.tokenizer_config]
182
+ use_fast = true
183
+ model_max_length = 512
184
+
185
+ [components.trainable_lemmatizer.model.tok2vec.transformer_config]
186
 
187
  [components.transformer]
188
  factory = "transformer"
 
203
 
204
  [components.transformer.model.tokenizer_config]
205
  use_fast = true
206
+ model_max_length = 512
207
 
208
  [components.transformer.model.transformer_config]
209
 
experimental_arc_labeler/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:013bece7bd7f81ed854dac90e4dd2808b225e99791e41e8539dc73ccf809dbc7
3
- size 447476740
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3f48181517669b0533e2304d4837173a99389377a33345adbffc4f87a2e11edd
3
+ size 14947179
experimental_arc_predicter/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1943454bb433518bc8782e8bb998e525031765e25445daf654a26d5b255576a6
3
- size 445185700
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e2681bfb35605ba7bdf88d320139036210ca9d9458cd7b7c2823b96f188931a4
3
+ size 413192
hu_core_news_trf-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:13c4169de5e3a3bb0f57d60a43108c43c870088e8e62f87f8a254b7c4907d6fe
3
- size 1668889769
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0544eaf5c73bdb41432c55407f92978c15485d995812bb17d55c9c03e1e56553
3
+ size 1266506371
meta.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "lang":"hu",
3
  "name":"core_news_trf",
4
- "version":"3.5.1",
5
  "description":"Hungarian transformer pipeline (huBERT) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner",
6
  "author":"SzegedAI, MILAB",
7
  "email":"[email protected]",
@@ -1187,9 +1187,6 @@
1187
  ],
1188
  "lookup_lemmatizer":[
1189
 
1190
- ],
1191
- "lemma_smoother":[
1192
-
1193
  ],
1194
  "experimental_arc_predicter":[
1195
 
@@ -1261,7 +1258,6 @@
1261
  "morphologizer",
1262
  "lookup_lemmatizer",
1263
  "trainable_lemmatizer",
1264
- "lemma_smoother",
1265
  "experimental_arc_predicter",
1266
  "experimental_arc_labeler",
1267
  "ner"
@@ -1273,7 +1269,6 @@
1273
  "morphologizer",
1274
  "lookup_lemmatizer",
1275
  "trainable_lemmatizer",
1276
- "lemma_smoother",
1277
  "experimental_arc_predicter",
1278
  "experimental_arc_labeler",
1279
  "ner"
@@ -1286,232 +1281,222 @@
1286
  "token_p":0.998565417,
1287
  "token_r":0.9993300153,
1288
  "token_f":0.9989475698,
1289
- "sents_p":0.9713656388,
1290
- "sents_r":0.9821826281,
1291
- "sents_f":0.976744186,
1292
- "tag_acc":0.9746877841,
1293
- "pos_acc":0.974400689,
1294
- "morph_acc":0.9452579194,
1295
- "morph_micro_p":0.9804550379,
1296
- "morph_micro_r":0.9722389343,
1297
- "morph_micro_f":0.9763297012,
1298
  "morph_per_feat":{
1299
  "Definite":{
1300
- "p":0.9952584163,
1301
- "r":0.9794680355,
1302
- "f":0.9873000941
1303
  },
1304
  "PronType":{
1305
- "p":0.9706696182,
1306
- "r":0.96799117,
1307
- "f":0.9693285438
1308
  },
1309
  "Case":{
1310
- "p":0.9902739182,
1311
- "r":0.9857735625,
1312
- "f":0.9880186157
1313
  },
1314
  "Degree":{
1315
- "p":0.8964968153,
1316
- "r":0.9367720466,
1317
- "f":0.916192026
1318
  },
1319
  "Number":{
1320
- "p":0.9931380753,
1321
- "r":0.9944695827,
1322
- "f":0.993803383
1323
  },
1324
  "Mood":{
1325
- "p":0.9446290144,
1326
- "r":0.9456762749,
1327
- "f":0.9451523546
1328
  },
1329
  "Person":{
1330
- "p":0.96875,
1331
- "r":0.9942434211,
1332
- "f":0.9813311688
1333
  },
1334
  "Tense":{
1335
- "p":0.9955703212,
1336
- "r":0.9933701657,
1337
- "f":0.9944690265
1338
  },
1339
  "VerbForm":{
1340
- "p":0.9959349593,
1341
- "r":0.7858861267,
1342
- "f":0.8785298073
1343
  },
1344
  "Voice":{
1345
- "p":0.9846782431,
1346
- "r":0.9856850716,
1347
- "f":0.9851814001
1348
  },
1349
  "Number[psor]":{
1350
- "p":0.9957386364,
1351
- "r":0.9985754986,
1352
- "f":0.9971550498
1353
  },
1354
  "Person[psor]":{
1355
- "p":0.9943181818,
1356
- "r":0.9985734665,
1357
- "f":0.9964412811
1358
  },
1359
  "NumType":{
1360
- "p":0.9575471698,
1361
- "r":0.9902439024,
1362
- "f":0.9736211031
1363
- },
1364
- "Poss":{
1365
- "p":0.5,
1366
- "r":1.0,
1367
- "f":0.6666666667
1368
  },
1369
  "Reflex":{
1370
- "p":0.0,
1371
- "r":0.0,
1372
- "f":0.0
1373
- },
1374
- "Reflexive":{
1375
- "p":0.0,
1376
- "r":0.0,
1377
- "f":0.0
1378
  },
1379
  "Aspect":{
1380
  "p":0.0,
1381
  "r":0.0,
1382
  "f":0.0
1383
  },
1384
- "NumType[sem]":{
1385
- "p":0.0,
1386
- "r":0.0,
1387
- "f":0.0
1388
- },
1389
  "Number[psed]":{
1390
  "p":1.0,
1391
- "r":0.7777777778,
1392
- "f":0.875
1393
  }
1394
  },
1395
- "lemma_acc":0.9874653143,
1396
- "bound_dep_las":0.8686201283,
1397
- "bound_dep_uas":0.9097960356,
1398
- "dep_uas":0.9092736147,
1399
- "dep_las":0.8681339713,
1400
  "dep_las_per_type":{
1401
  "415":{
1402
- "p":0.9512779553,
1403
- "r":0.9482484076,
1404
- "f":0.9497607656
1405
  },
1406
  "7411097074813287689":{
1407
- "p":0.9126290707,
1408
- "r":0.9394930499,
1409
- "f":0.9258662369
1410
  },
1411
  "429":{
1412
- "p":0.9369951535,
1413
- "r":0.90625,
1414
- "f":0.9213661636
1415
  },
1416
  "15861261214731031920":{
1417
- "p":0.7480719794,
1418
- "r":0.7132352941,
1419
- "f":0.730238394
1420
  },
1421
  "991268021520064439":{
1422
- "p":0.8733333333,
1423
- "r":0.8881355932,
1424
- "f":0.8806722689
1425
  },
1426
  "435":{
1427
- "p":0.8789473684,
1428
- "r":0.901890189,
1429
- "f":0.8902709907
1430
  },
1431
  "434":{
1432
- "p":0.9434782609,
1433
- "r":0.9752808989,
1434
- "f":0.9591160221
1435
  },
1436
  "8206900633647566924":{
1437
- "p":0.8568588469,
1438
- "r":0.9599109131,
1439
- "f":0.9054621849
1440
  },
1441
  "407":{
1442
- "p":0.8361344538,
1443
- "r":0.8378947368,
1444
- "f":0.8370136698
1445
  },
1446
  "410":{
1447
- "p":0.7408163265,
1448
- "r":0.75625,
1449
- "f":0.7484536082
1450
  },
1451
  "445":{
1452
- "p":0.8628649016,
1453
- "r":0.8593644354,
1454
- "f":0.8611111111
1455
  },
1456
  "400":{
1457
- "p":0.8383838384,
1458
- "r":0.8736842105,
1459
- "f":0.8556701031
1460
  },
1461
  "17772752594865228322":{
1462
- "p":0.9398148148,
1463
- "r":0.9485981308,
1464
- "f":0.9441860465
1465
  },
1466
  "403":{
1467
- "p":0.7323943662,
1468
- "r":0.5531914894,
1469
- "f":0.6303030303
1470
  },
1471
  "399":{
1472
- "p":0.6037735849,
1473
- "r":0.6530612245,
1474
- "f":0.6274509804
1475
  },
1476
  "3143985677199705895":{
1477
- "p":0.8073770492,
1478
- "r":0.8565217391,
1479
- "f":0.8312236287
1480
  },
1481
  "9241468201421778905":{
1482
- "p":0.4571428571,
1483
  "r":0.4848484848,
1484
- "f":0.4705882353
1485
  },
1486
  "423":{
1487
- "p":0.9371069182,
1488
- "r":0.9430379747,
1489
- "f":0.9400630915
1490
  },
1491
  "13543738850102096385":{
1492
- "p":0.9814814815,
1493
- "r":0.9724770642,
1494
- "f":0.9769585253
1495
  },
1496
  "10901028881100056900":{
1497
- "p":0.7741935484,
1498
  "r":0.75,
1499
- "f":0.7619047619
1500
  },
1501
  "411":{
1502
- "p":0.8611111111,
1503
- "r":0.756097561,
1504
- "f":0.8051948052
1505
  },
1506
  "12549387360942434255":{
1507
- "p":0.4285714286,
1508
  "r":0.45,
1509
- "f":0.4390243902
1510
  },
1511
  "303601073839818384":{
1512
  "p":0.5,
1513
- "r":0.375,
1514
- "f":0.4285714286
1515
  },
1516
  "8884235091647096537":{
1517
  "p":0.0,
@@ -1519,64 +1504,74 @@
1519
  "f":0.0
1520
  },
1521
  "2249809950233855422":{
1522
- "p":0.6363636364,
1523
- "r":0.65625,
1524
- "f":0.6461538462
1525
  },
1526
  "422":{
1527
- "p":0.3076923077,
1528
- "r":0.5333333333,
1529
- "f":0.3902439024
1530
  },
1531
  "8110129090154140942":{
1532
- "p":0.96875,
1533
- "r":0.9489795918,
1534
- "f":0.9587628866
1535
  },
1536
  "412":{
1537
- "p":0.85,
1538
  "r":0.4594594595,
1539
- "f":0.5964912281
1540
  },
1541
  "436":{
1542
- "p":0.3953488372,
1543
- "r":0.2328767123,
1544
- "f":0.2931034483
1545
  },
1546
  "450":{
1547
- "p":0.9594594595,
1548
  "r":0.9594594595,
1549
- "f":0.9594594595
1550
  },
1551
  "12837356684637874264":{
1552
- "p":0.7777777778,
1553
  "r":0.6021505376,
1554
- "f":0.6787878788
1555
  },
1556
  "451":{
1557
- "p":0.6,
1558
- "r":0.625,
1559
- "f":0.612244898
1560
  },
1561
  "7349492218059511525":{
1562
- "p":0.8181818182,
1563
- "r":0.9,
1564
- "f":0.8571428571
1565
  },
1566
  "426":{
1567
- "p":0.7142857143,
1568
- "r":0.4545454545,
1569
- "f":0.5555555556
1570
  },
1571
  "405":{
1572
- "p":0.8181818182,
1573
- "r":0.75,
1574
- "f":0.7826086957
1575
  },
1576
  "17865338459503383721":{
1577
  "p":1.0,
1578
- "r":0.3333333333,
1579
- "f":0.5
1580
  },
1581
  "17311980334327143026":{
1582
  "p":0.0,
@@ -1584,39 +1579,29 @@
1584
  "f":0.0
1585
  },
1586
  "7037928807040764755":{
1587
- "p":0.975,
1588
- "r":0.975,
1589
- "f":0.975
1590
- },
1591
- "408":{
1592
- "p":0.0,
1593
- "r":0.0,
1594
- "f":0.0
1595
  },
1596
  "11190527879068114961":{
1597
  "p":0.0,
1598
  "r":0.0,
1599
  "f":0.0
1600
  },
1601
- "3350290345017230236":{
1602
- "p":0.1666666667,
1603
- "r":0.0833333333,
1604
- "f":0.1111111111
1605
- },
1606
  "10069665988847657778":{
1607
  "p":0.0,
1608
  "r":0.0,
1609
  "f":0.0
1610
  },
1611
  "17473201795025412735":{
1612
- "p":1.0,
1613
  "r":0.1666666667,
1614
- "f":0.2857142857
1615
  },
1616
  "6522094215780122214":{
1617
- "p":0.8,
1618
  "r":1.0,
1619
- "f":0.8888888889
1620
  },
1621
  "203073658115086772":{
1622
  "p":0.0,
@@ -1624,32 +1609,32 @@
1624
  "f":0.0
1625
  }
1626
  },
1627
- "ents_p":0.9069524307,
1628
- "ents_r":0.9150843882,
1629
- "ents_f":0.9110002625,
1630
  "ents_per_type":{
1631
  "ORG":{
1632
- "p":0.9245977011,
1633
- "r":0.9323133982,
1634
- "f":0.9284395199
1635
  },
1636
  "PER":{
1637
- "p":0.9425695678,
1638
- "r":0.9510155317,
1639
- "f":0.9467737139
1640
  },
1641
  "LOC":{
1642
- "p":0.9274977896,
1643
- "r":0.9105902778,
1644
- "f":0.9189662724
1645
  },
1646
  "MISC":{
1647
- "p":0.7432795699,
1648
- "r":0.7843971631,
1649
- "f":0.7632850242
1650
  }
1651
  },
1652
- "speed":2397.7104249023
1653
  },
1654
  "sources":[
1655
  {
 
1
  {
2
  "lang":"hu",
3
  "name":"core_news_trf",
4
+ "version":"3.5.2",
5
  "description":"Hungarian transformer pipeline (huBERT) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner",
6
  "author":"SzegedAI, MILAB",
7
  "email":"[email protected]",
 
1187
  ],
1188
  "lookup_lemmatizer":[
1189
 
 
 
 
1190
  ],
1191
  "experimental_arc_predicter":[
1192
 
 
1258
  "morphologizer",
1259
  "lookup_lemmatizer",
1260
  "trainable_lemmatizer",
 
1261
  "experimental_arc_predicter",
1262
  "experimental_arc_labeler",
1263
  "ner"
 
1269
  "morphologizer",
1270
  "lookup_lemmatizer",
1271
  "trainable_lemmatizer",
 
1272
  "experimental_arc_predicter",
1273
  "experimental_arc_labeler",
1274
  "ner"
 
1281
  "token_p":0.998565417,
1282
  "token_r":0.9993300153,
1283
  "token_f":0.9989475698,
1284
+ "sents_p":0.9977678571,
1285
+ "sents_r":0.995545657,
1286
+ "sents_f":0.9966555184,
1287
+ "tag_acc":0.9817207388,
1288
+ "pos_acc":0.979902383,
1289
+ "morph_acc":0.9645403646,
1290
+ "morph_micro_p":0.9866706928,
1291
+ "morph_micro_r":0.982939407,
1292
+ "morph_micro_f":0.9848015155,
1293
  "morph_per_feat":{
1294
  "Definite":{
1295
+ "p":0.9815583218,
1296
+ "r":0.9934671022,
1297
+ "f":0.9874768089
1298
  },
1299
  "PronType":{
1300
+ "p":0.9824175824,
1301
+ "r":0.9867549669,
1302
+ "f":0.9845814978
1303
  },
1304
  "Case":{
1305
+ "p":0.9930389817,
1306
+ "r":0.9865639202,
1307
+ "f":0.9897908613
1308
  },
1309
  "Degree":{
1310
+ "p":0.9568593615,
1311
+ "r":0.9226289517,
1312
+ "f":0.9394324439
1313
  },
1314
  "Number":{
1315
+ "p":0.9951178451,
1316
+ "r":0.9906150494,
1317
+ "f":0.9928613421
1318
  },
1319
  "Mood":{
1320
+ "p":0.9834801762,
1321
+ "r":0.9900221729,
1322
+ "f":0.9867403315
1323
  },
1324
  "Person":{
1325
+ "p":0.9819078947,
1326
+ "r":0.9819078947,
1327
+ "f":0.9819078947
1328
  },
1329
  "Tense":{
1330
+ "p":0.9911991199,
1331
+ "r":0.9955801105,
1332
+ "f":0.993384785
1333
  },
1334
  "VerbForm":{
1335
+ "p":0.986852917,
1336
+ "r":0.9631114675,
1337
+ "f":0.9748376623
1338
  },
1339
  "Voice":{
1340
+ "p":0.9806910569,
1341
+ "r":0.9867075665,
1342
+ "f":0.9836901121
1343
  },
1344
  "Number[psor]":{
1345
+ "p":0.9872521246,
1346
+ "r":0.9928774929,
1347
+ "f":0.9900568182
1348
  },
1349
  "Person[psor]":{
1350
+ "p":0.9858356941,
1351
+ "r":0.9928673324,
1352
+ "f":0.9893390192
1353
  },
1354
  "NumType":{
1355
+ "p":0.941031941,
1356
+ "r":0.9341463415,
1357
+ "f":0.9375764994
1358
  },
1359
  "Reflex":{
1360
+ "p":1.0,
1361
+ "r":0.875,
1362
+ "f":0.9333333333
1363
  },
1364
  "Aspect":{
1365
  "p":0.0,
1366
  "r":0.0,
1367
  "f":0.0
1368
  },
1369
  "Number[psed]":{
1370
  "p":1.0,
1371
+ "r":0.3333333333,
1372
+ "f":0.5
1373
+ },
1374
+ "Poss":{
1375
+ "p":1.0,
1376
+ "r":1.0,
1377
+ "f":1.0
1378
  }
1379
  },
1380
+ "lemma_acc":0.986030045,
1381
+ "bound_dep_las":0.8673938002,
1382
+ "bound_dep_uas":0.9076731726,
1383
+ "dep_uas":0.9078903297,
1384
+ "dep_las":0.8674641148,
1385
  "dep_las_per_type":{
1386
  "415":{
1387
+ "p":0.9336455894,
1388
+ "r":0.9522292994,
1389
+ "f":0.942845881
1390
  },
1391
  "7411097074813287689":{
1392
+ "p":0.9124497992,
1393
+ "r":0.9288634505,
1394
+ "f":0.9205834684
1395
  },
1396
  "429":{
1397
+ "p":0.9141965679,
1398
+ "r":0.915625,
1399
+ "f":0.9149102264
1400
  },
1401
  "15861261214731031920":{
1402
+ "p":0.7392344498,
1403
+ "r":0.7573529412,
1404
+ "f":0.7481840194
1405
  },
1406
  "991268021520064439":{
1407
+ "p":0.8758278146,
1408
+ "r":0.8966101695,
1409
+ "f":0.8860971524
1410
  },
1411
  "435":{
1412
+ "p":0.8942222222,
1413
+ "r":0.9054905491,
1414
+ "f":0.8998211091
1415
  },
1416
  "434":{
1417
+ "p":0.952173913,
1418
+ "r":0.9842696629,
1419
+ "f":0.9679558011
1420
  },
1421
  "8206900633647566924":{
1422
+ "p":0.875,
1423
+ "r":0.9510022272,
1424
+ "f":0.9114194237
1425
  },
1426
  "407":{
1427
+ "p":0.8132780083,
1428
+ "r":0.8252631579,
1429
+ "f":0.8192267503
1430
  },
1431
  "410":{
1432
+ "p":0.7398373984,
1433
+ "r":0.7583333333,
1434
+ "f":0.7489711934
1435
  },
1436
  "445":{
1437
+ "p":0.8738127544,
1438
+ "r":0.8708586883,
1439
+ "f":0.8723332205
1440
  },
1441
  "400":{
1442
+ "p":0.8367346939,
1443
+ "r":0.8631578947,
1444
+ "f":0.8497409326
1445
  },
1446
  "17772752594865228322":{
1447
+ "p":0.9663461538,
1448
+ "r":0.9392523364,
1449
+ "f":0.9526066351
1450
  },
1451
  "403":{
1452
+ "p":0.6811594203,
1453
+ "r":0.5,
1454
+ "f":0.5766871166
1455
  },
1456
  "399":{
1457
+ "p":0.5,
1458
+ "r":0.5510204082,
1459
+ "f":0.5242718447
1460
  },
1461
  "3143985677199705895":{
1462
+ "p":0.8091286307,
1463
+ "r":0.847826087,
1464
+ "f":0.8280254777
1465
  },
1466
  "9241468201421778905":{
1467
+ "p":0.4210526316,
1468
  "r":0.4848484848,
1469
+ "f":0.4507042254
1470
  },
1471
  "423":{
1472
+ "p":0.9487179487,
1473
+ "r":0.9367088608,
1474
+ "f":0.9426751592
1475
  },
1476
  "13543738850102096385":{
1477
+ "p":0.9444444444,
1478
+ "r":0.9357798165,
1479
+ "f":0.9400921659
1480
  },
1481
  "10901028881100056900":{
1482
+ "p":0.75,
1483
  "r":0.75,
1484
+ "f":0.75
1485
  },
1486
  "411":{
1487
+ "p":0.8108108108,
1488
+ "r":0.7317073171,
1489
+ "f":0.7692307692
1490
  },
1491
  "12549387360942434255":{
1492
+ "p":0.4864864865,
1493
  "r":0.45,
1494
+ "f":0.4675324675
1495
  },
1496
  "303601073839818384":{
1497
  "p":0.5,
1498
+ "r":0.125,
1499
+ "f":0.2
1500
  },
1501
  "8884235091647096537":{
1502
  "p":0.0,
 
1504
  "f":0.0
1505
  },
1506
  "2249809950233855422":{
1507
+ "p":0.5925925926,
1508
+ "r":0.5,
1509
+ "f":0.5423728814
1510
  },
1511
  "422":{
1512
+ "p":0.4761904762,
1513
+ "r":0.6666666667,
1514
+ "f":0.5555555556
1515
+ },
1516
+ "408":{
1517
+ "p":0.1333333333,
1518
+ "r":0.1538461538,
1519
+ "f":0.1428571429
1520
  },
1521
  "8110129090154140942":{
1522
+ "p":0.9740932642,
1523
+ "r":0.9591836735,
1524
+ "f":0.9665809769
1525
  },
1526
  "412":{
1527
+ "p":0.7083333333,
1528
  "r":0.4594594595,
1529
+ "f":0.5573770492
1530
  },
1531
  "436":{
1532
+ "p":0.4117647059,
1533
+ "r":0.095890411,
1534
+ "f":0.1555555556
1535
  },
1536
  "450":{
1537
+ "p":0.9466666667,
1538
  "r":0.9594594595,
1539
+ "f":0.9530201342
1540
  },
1541
  "12837356684637874264":{
1542
+ "p":0.7466666667,
1543
  "r":0.6021505376,
1544
+ "f":0.6666666667
1545
+ },
1546
+ "3350290345017230236":{
1547
+ "p":0.1666666667,
1548
+ "r":0.0416666667,
1549
+ "f":0.0666666667
1550
  },
1551
  "451":{
1552
+ "p":0.5507246377,
1553
+ "r":0.5277777778,
1554
+ "f":0.5390070922
1555
  },
1556
  "7349492218059511525":{
1557
+ "p":0.625,
1558
+ "r":1.0,
1559
+ "f":0.7692307692
1560
  },
1561
  "426":{
1562
+ "p":1.0,
1563
+ "r":0.3636363636,
1564
+ "f":0.5333333333
1565
  },
1566
  "405":{
1567
+ "p":0.9090909091,
1568
+ "r":0.8333333333,
1569
+ "f":0.8695652174
1570
  },
1571
  "17865338459503383721":{
1572
  "p":1.0,
1573
+ "r":0.1666666667,
1574
+ "f":0.2857142857
1575
  },
1576
  "17311980334327143026":{
1577
  "p":0.0,
 
1579
  "f":0.0
1580
  },
1581
  "7037928807040764755":{
1582
+ "p":0.9756097561,
1583
+ "r":1.0,
1584
+ "f":0.987654321
1585
  },
1586
  "11190527879068114961":{
1587
  "p":0.0,
1588
  "r":0.0,
1589
  "f":0.0
1590
  },
1591
  "10069665988847657778":{
1592
  "p":0.0,
1593
  "r":0.0,
1594
  "f":0.0
1595
  },
1596
  "17473201795025412735":{
1597
+ "p":0.2,
1598
  "r":0.1666666667,
1599
+ "f":0.1818181818
1600
  },
1601
  "6522094215780122214":{
1602
+ "p":1.0,
1603
  "r":1.0,
1604
+ "f":1.0
1605
  },
1606
  "203073658115086772":{
1607
  "p":0.0,
 
1609
  "f":0.0
1610
  }
1611
  },
1612
+ "ents_p":0.909059294,
1613
+ "ents_r":0.9191279887,
1614
+ "ents_f":0.9140659149,
1615
  "ents_per_type":{
1616
  "ORG":{
1617
+ "p":0.935604293,
1618
+ "r":0.9295317571,
1619
+ "f":0.9325581395
1620
  },
1621
  "PER":{
1622
+ "p":0.9309551208,
1623
+ "r":0.9665471924,
1624
+ "f":0.9484173505
1625
  },
1626
  "LOC":{
1627
+ "p":0.9361702128,
1628
+ "r":0.9166666667,
1629
+ "f":0.9263157895
1630
  },
1631
  "MISC":{
1632
+ "p":0.7398921833,
1633
+ "r":0.7787234043,
1634
+ "f":0.7588113338
1635
  }
1636
  },
1637
+ "speed":3095.7537591888
1638
  },
1639
  "sources":[
1640
  {
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:faa0e827a09123347ae604288a765268ab410cdce68e8310894ffd6be9838161
3
  size 3522673
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1669c374c94ccfa4774c89cde15cfb3acdf4fa0a42dd5d991a40f9b45c3f6f0a
3
  size 3522673
ner/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4ef597c4b89292feb19339c4cd5ed1e53a84c4e793296b7b4834f23dbaf5836b
3
- size 443626222
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5ea595d55219434c3cf1be3617618b79af0b7bd4128671b43f7f0cbaac20b3d1
3
+ size 443884420
senter/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4a3d7eedee9c2aa804251e45353078de93b816b475dea6700826d3c9bb799e5f
3
  size 6792
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8f582a0d0c0fedeb0588bcbfcf56ddb4aa73126efb9f39233bb088fc9f64d7b0
3
  size 6792
tagger/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:87db3e90005a834ee1a87af6c536d8f5535f93c2e1a1bfd86b632288a70fd877
3
  size 52932
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5f3aaec946ca56a88eb57e323209fcdf655a614585449d54561523dcb23d02fa
3
  size 52932
trainable_lemmatizer/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e619d0c826a82e4acaf0b70ff34d5e8194e8404f5413320fbed4b83917e6b8b3
3
- size 12356945
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1cf14b4f1965a19970f0c21a320da12c04cd510e96c5beea7360f873de9c9744
3
+ size 455959169
transformer/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:73764adc8399fc4bd9a44c463254f704899ff6e5a5588fdd8a6f661fcfc61647
3
- size 443344022
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:69c6dc9812f2080e1bf63506e972590b295335660867186711be369539fd3fa2
3
+ size 443602220
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:470543f06fa1bae827076c99734ecd927ec185ccec73a2ecb6ff44c8a28b3b55
3
- size 6399835
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9a5f80767d8b4723935240536536bfb543f502da5daa2fdb6eb6b83072758b28
3
+ size 6399388
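
Two of the changes in this commit can be sanity-checked with a few lines of Python. The sketch below is not part of the commit. It copies the old and new `pipeline` lists from `config.cfg` and the 3.5.2 `sents_p`, `sents_r` and `sents_f` values from `meta.json`, then confirms that `lemma_smoother` is the only component dropped and that the reported sentence F-score is the harmonic mean of the reported precision and recall. The names `OLD_PIPELINE`, `NEW_PIPELINE`, `removed_components` and `f_score` are hypothetical helpers introduced for illustration only.

```python
# Values are copied from the diff; all names here are illustrative, not from the repo.
OLD_PIPELINE = [
    "transformer", "senter", "tagger", "morphologizer",
    "lookup_lemmatizer", "trainable_lemmatizer", "lemma_smoother",
    "experimental_arc_predicter", "experimental_arc_labeler", "ner",
]
NEW_PIPELINE = [
    "transformer", "senter", "tagger", "morphologizer",
    "lookup_lemmatizer", "trainable_lemmatizer",
    "experimental_arc_predicter", "experimental_arc_labeler", "ner",
]

# sents_p / sents_r / sents_f as reported for 3.5.2 in meta.json
SENTS_P, SENTS_R, SENTS_F = 0.9977678571, 0.995545657, 0.9966555184


def removed_components(old, new):
    """Return the components present in `old` but absent from `new`, in pipeline order."""
    return [name for name in old if name not in new]


def f_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print("removed:", removed_components(OLD_PIPELINE, NEW_PIPELINE))
    print("sents_f consistent:", abs(f_score(SENTS_P, SENTS_R) - SENTS_F) < 1e-6)
```

The same `f_score` check can be applied to any other `p`/`r`/`f` triple in `meta.json`, assuming the reported values are rounded to about ten decimal places.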