Update spacy pipeline to 3.5.0
- README.md +31 -31
- config.cfg +5 -4
- edit_tree_lemmatizer.py +465 -0
- experimental_arc_labeler/model +2 -2
- experimental_arc_predicter/model +2 -2
- hu_core_news_trf-any-py3-none-any.whl +2 -2
- lemma_postprocessing.py +113 -0
- lookup_lemmatizer.py +132 -0
- meta.json +186 -186
- morphologizer/model +1 -1
- ner/model +2 -2
- senter/model +1 -1
- tagger/model +1 -1
- tokenizer +1 -1
- trainable_lemmatizer/model +1 -1
- transformer/model +2 -2
- vocab/strings.json +2 -2
README.md
CHANGED
@@ -14,70 +14,70 @@ model-index:
     metrics:
     - name: NER Precision
       type: precision
-      value: 0.
+      value: 0.9069524307
     - name: NER Recall
       type: recall
-      value: 0.
+      value: 0.9150843882
     - name: NER F Score
       type: f_score
-      value: 0.
+      value: 0.9110002625
   - task:
       name: TAG
       type: token-classification
     metrics:
     - name: TAG (XPOS) Accuracy
       type: accuracy
-      value: 0.
+      value: 0.9746877841
   - task:
       name: POS
       type: token-classification
     metrics:
     - name: POS (UPOS) Accuracy
       type: accuracy
-      value: 0.
+      value: 0.974400689
   - task:
       name: MORPH
       type: token-classification
     metrics:
     - name: Morph (UFeats) Accuracy
       type: accuracy
-      value: 0.
+      value: 0.9452579194
   - task:
       name: LEMMA
       type: token-classification
     metrics:
     - name: Lemma Accuracy
       type: accuracy
-      value: 0.
+      value: 0.9874653143
   - task:
       name: UNLABELED_DEPENDENCIES
       type: token-classification
     metrics:
     - name: Unlabeled Attachment Score (UAS)
       type: f_score
-      value: 0.
+      value: 0.9092736147
   - task:
       name: LABELED_DEPENDENCIES
       type: token-classification
     metrics:
     - name: Labeled Attachment Score (LAS)
       type: f_score
-      value: 0.
+      value: 0.8681339713
   - task:
       name: SENTS
       type: token-classification
     metrics:
     - name: Sentences F-Score
       type: f_score
-      value: 0.
+      value: 0.976744186
 ---
-Hungarian transformer pipeline (
+Hungarian transformer pipeline (huBERT) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner
 
 | Feature | Description |
 | --- | --- |
 | **Name** | `hu_core_news_trf` |
-| **Version** | `3.
-| **spaCy** | `>=3.
+| **Version** | `3.5.0` |
+| **spaCy** | `>=3.5.0,<3.6.0` |
 | **Default Pipeline** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `lemma_smoother`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
 | **Components** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `lemma_smoother`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
 | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
@@ -104,24 +104,24 @@ Hungarian transformer pipeline (huBert) for HuSpaCy. Components: transformer, se
 
 | Type | Score |
 | --- | --- |
-| `TOKEN_ACC` |
+| `TOKEN_ACC` | 99.99 |
 | `TOKEN_P` | 99.86 |
 | `TOKEN_R` | 99.93 |
 | `TOKEN_F` | 99.89 |
-| `SENTS_P` |
-| `SENTS_R` |
-| `SENTS_F` |
-| `TAG_ACC` | 97.
-| `POS_ACC` | 97.
-| `MORPH_ACC` | 94.
-| `MORPH_MICRO_P` |
-| `MORPH_MICRO_R` | 97.
-| `MORPH_MICRO_F` | 97.
-| `LEMMA_ACC` | 98.
-| `BOUND_DEP_LAS` |
-| `BOUND_DEP_UAS` |
-| `DEP_UAS` |
-| `DEP_LAS` |
-| `ENTS_P` | 90.
-| `ENTS_R` | 91.
-| `ENTS_F` | 91.
+| `SENTS_P` | 97.14 |
+| `SENTS_R` | 98.22 |
+| `SENTS_F` | 97.67 |
+| `TAG_ACC` | 97.47 |
+| `POS_ACC` | 97.44 |
+| `MORPH_ACC` | 94.53 |
+| `MORPH_MICRO_P` | 98.05 |
+| `MORPH_MICRO_R` | 97.22 |
+| `MORPH_MICRO_F` | 97.63 |
+| `LEMMA_ACC` | 98.75 |
+| `BOUND_DEP_LAS` | 86.86 |
+| `BOUND_DEP_UAS` | 90.98 |
+| `DEP_UAS` | 90.93 |
+| `DEP_LAS` | 86.81 |
+| `ENTS_P` | 90.70 |
+| `ENTS_R` | 91.51 |
+| `ENTS_F` | 91.10 |
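The NER numbers in the updated front matter can be cross-checked against each other: the F score is the harmonic mean of precision and recall, and the score table repeats each value as a percentage rounded to two decimals. A quick check using only the values from this diff:

```python
# NER precision and recall as reported in the model-index front matter.
precision = 0.9069524307
recall = 0.9150843882

# F1 is the harmonic mean of precision and recall.
f_score = 2 * precision * recall / (precision + recall)

# The README score table shows percentages rounded to two decimals.
table_row = {
    "ENTS_P": round(precision * 100, 2),
    "ENTS_R": round(recall * 100, 2),
    "ENTS_F": round(f_score * 100, 2),
}
print(f"F = {f_score:.10f}", table_row)
```

The recomputed F agrees with the reported 0.9110002625 to well below the printed precision, and the rounded percentages match the `ENTS_P`, `ENTS_R` and `ENTS_F` rows.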
config.cfg
CHANGED
@@ -1,8 +1,8 @@
 [paths]
-tagger_model = "models/hu_core_news_trf-tagger-3.
-parser_model = "models/hu_core_news_trf-parser-3.
-ner_model = "models/hu_core_news_trf-ner-3.
-lemmatizer_lookups = "models/hu_core_news_trf-lookup-lemmatizer-3.
+tagger_model = "models/hu_core_news_trf-tagger-3.5.0/model-best"
+parser_model = "models/hu_core_news_trf-parser-3.5.0/model-best"
+ner_model = "models/hu_core_news_trf-ner-3.5.0/model-best"
+lemmatizer_lookups = "models/hu_core_news_trf-lookup-lemmatizer-3.5.0"
 train = null
 dev = null
 vectors = null
@@ -252,6 +252,7 @@ annotating_components = []
 dev_corpus = "corpora.dev"
 train_corpus = "corpora.train"
 before_to_disk = null
+before_update = null
 
 [training.batcher]
 @batchers = "spacy.batch_by_words.v1"
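spaCy's `config.cfg` is an INI-style file whose values are written as JSON literals (`null`, quoted strings, numbers). As a rough standard-library approximation, not spaCy's own loader (which also resolves `${...}` interpolation and `@` registry references), the updated `[paths]` section can be read like this; `read_section` is a hypothetical helper:

```python
import configparser
import json

# The [paths] section as it reads after this commit.
CONFIG_EXCERPT = """
[paths]
tagger_model = "models/hu_core_news_trf-tagger-3.5.0/model-best"
parser_model = "models/hu_core_news_trf-parser-3.5.0/model-best"
ner_model = "models/hu_core_news_trf-ner-3.5.0/model-best"
lemmatizer_lookups = "models/hu_core_news_trf-lookup-lemmatizer-3.5.0"
train = null
dev = null
vectors = null
"""


def read_section(text: str, section: str) -> dict:
    """Parse one INI section, decoding every value as a JSON literal."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    return {key: json.loads(raw) for key, raw in parser[section].items()}


paths = read_section(CONFIG_EXCERPT, "paths")
for key, value in paths.items():
    print(key, "->", value)
```

`null` decodes to `None`, which is how unset paths such as `train` and `dev` are expressed; they are normally supplied on the command line when training.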
edit_tree_lemmatizer.py
ADDED
@@ -0,0 +1,465 @@
+from functools import lru_cache
+
+from typing import cast, Any, Callable, Dict, Iterable, List, Optional
+from typing import Sequence, Tuple, Union
+from collections import Counter
+from copy import deepcopy
+from itertools import islice
+import numpy as np
+
+import srsly
+from thinc.api import Config, Model, SequenceCategoricalCrossentropy, NumpyOps
+from thinc.types import Floats2d, Ints2d
+
+from spacy.pipeline._edit_tree_internals.edit_trees import EditTrees
+from spacy.pipeline._edit_tree_internals.schemas import validate_edit_tree
+from spacy.pipeline.lemmatizer import lemmatizer_score
+from spacy.pipeline.trainable_pipe import TrainablePipe
+from spacy.errors import Errors
+from spacy.language import Language
+from spacy.tokens import Doc, Token
+from spacy.training import Example, validate_examples, validate_get_examples
+from spacy.vocab import Vocab
+from spacy import util
+
+
+TOP_K_GUARDRAIL = 20
+
+
+default_model_config = """
+[model]
+@architectures = "spacy.Tagger.v2"
+
+[model.tok2vec]
+@architectures = "spacy.HashEmbedCNN.v2"
+pretrained_vectors = null
+width = 96
+depth = 4
+embed_size = 2000
+window_size = 1
+maxout_pieces = 3
+subword_features = true
+"""
+DEFAULT_EDIT_TREE_LEMMATIZER_MODEL = Config().from_str(default_model_config)["model"]
+
+
+@Language.factory(
+    "trainable_lemmatizer_v2",
+    assigns=["token.lemma"],
+    requires=[],
+    default_config={
+        "model": DEFAULT_EDIT_TREE_LEMMATIZER_MODEL,
+        "backoff": "orth",
+        "min_tree_freq": 3,
+        "overwrite": False,
+        "top_k": 1,
+        "overwrite_labels": True,
+        "scorer": {"@scorers": "spacy.lemmatizer_scorer.v1"},
+    },
+    default_score_weights={"lemma_acc": 1.0},
+)
+def make_edit_tree_lemmatizer(
+    nlp: Language,
+    name: str,
+    model: Model,
+    backoff: Optional[str],
+    min_tree_freq: int,
+    overwrite: bool,
+    top_k: int,
+    overwrite_labels: bool,
+    scorer: Optional[Callable],
+):
+    """Construct an EditTreeLemmatizer component."""
+    return EditTreeLemmatizer(
+        nlp.vocab,
+        model,
+        name,
+        backoff=backoff,
+        min_tree_freq=min_tree_freq,
+        overwrite=overwrite,
+        top_k=top_k,
+        overwrite_labels=overwrite_labels,
+        scorer=scorer,
+    )
+
+
+# _f = open("lemmatizer.log", "w")
+# def debug(*args):
+#     _f.write(" ".join(args) + "\n")
+def debug(*args):
+    pass
+
+
+class EditTreeLemmatizer(TrainablePipe):
+    """
+    Lemmatizer that lemmatizes each word using a predicted edit tree.
+    """
+
+    def __init__(
+        self,
+        vocab: Vocab,
+        model: Model,
+        name: str = "trainable_lemmatizer",
+        *,
+        backoff: Optional[str] = "orth",
+        min_tree_freq: int = 3,
+        overwrite: bool = False,
+        top_k: int = 1,
+        overwrite_labels,
+        scorer: Optional[Callable] = lemmatizer_score,
+    ):
+        """
+        Construct an edit tree lemmatizer.
+
+        backoff (Optional[str]): backoff to use when the predicted edit trees
+            are not applicable. Must be an attribute of Token or None (leave the
+            lemma unset).
+        min_tree_freq (int): prune trees that are applied less than this
+            frequency in the training data.
+        overwrite (bool): overwrite existing lemma annotations.
+        top_k (int): try to apply at most the k most probable edit trees.
+        """
+        self.vocab = vocab
+        self.model = model
+        self.name = name
+        self.backoff = backoff
+        self.min_tree_freq = min_tree_freq
+        self.overwrite = overwrite
+        self.top_k = top_k
+        self.overwrite_labels = overwrite_labels
+
+        self.trees = EditTrees(self.vocab.strings)
+        self.tree2label: Dict[int, int] = {}
+
+        self.cfg: Dict[str, Any] = {"labels": []}
+        self.scorer = scorer
+        self.numpy_ops = NumpyOps()
+
+    def get_loss(
+        self, examples: Iterable[Example], scores: List[Floats2d]
+    ) -> Tuple[float, List[Floats2d]]:
+        validate_examples(examples, "EditTreeLemmatizer.get_loss")
+        loss_func = SequenceCategoricalCrossentropy(normalize=False, missing_value=-1)
+
+        truths = []
+        for eg in examples:
+            eg_truths = []
+            for (predicted, gold_lemma, gold_pos, gold_sent_start) in zip(
+                eg.predicted,
+                eg.get_aligned("LEMMA", as_string=True),
+                eg.get_aligned("POS", as_string=True),
+                eg.get_aligned_sent_starts(),
+            ):
+                if gold_lemma is None:
+                    label = -1
+                else:
+                    form = self._get_true_cased_form(
+                        predicted.text, gold_sent_start, gold_pos
+                    )
+                    tree_id = self.trees.add(form, gold_lemma)
+                    # debug(f"@get_loss: {predicted}/{gold_pos}[{gold_sent_start}]->{form}|{gold_lemma}[{tree_id}]")
+                    label = self.tree2label.get(tree_id, 0)
+                eg_truths.append(label)
+
+            truths.append(eg_truths)
+
+        d_scores, loss = loss_func(scores, truths)
+        if self.model.ops.xp.isnan(loss):
+            raise ValueError(Errors.E910.format(name=self.name))
+
+        return float(loss), d_scores
+
+    def predict(self, docs: Iterable[Doc]) -> List[Ints2d]:
+        if self.top_k == 1:
+            scores2guesses = self._scores2guesses_top_k_equals_1
+        elif self.top_k <= TOP_K_GUARDRAIL:
+            scores2guesses = self._scores2guesses_top_k_greater_1
+        else:
+            scores2guesses = self._scores2guesses_top_k_guardrail
+        # The behaviour of *_scores2guesses_top_k_greater_1()* is efficient for values
+        # of *top_k>1* that are likely to be useful when the edit tree lemmatizer is used
+        # for its principal purpose of lemmatizing tokens. However, the code could also
+        # be used for other purposes, and with very large values of *top_k* the method
+        # becomes inefficient. In such cases, *_scores2guesses_top_k_guardrail()* is used
+        # instead.
+        n_docs = len(list(docs))
+        if not any(len(doc) for doc in docs):
+            # Handle cases where there are no tokens in any docs.
+            n_labels = len(self.cfg["labels"])
+            guesses: List[Ints2d] = [self.model.ops.alloc2i(0, n_labels) for _ in docs]
+            assert len(guesses) == n_docs
+            return guesses
+        scores = self.model.predict(docs)
+        assert len(scores) == n_docs
+        guesses = scores2guesses(docs, scores)
+        assert len(guesses) == n_docs
+        return guesses
+
+    def _scores2guesses_top_k_equals_1(self, docs, scores):
+        guesses = []
+        for doc, doc_scores in zip(docs, scores):
+            doc_guesses = doc_scores.argmax(axis=1)
+            doc_guesses = self.numpy_ops.asarray(doc_guesses)
+
+            doc_compat_guesses = []
+            for i, token in enumerate(doc):
+                tree_id = self.cfg["labels"][doc_guesses[i]]
+                form: str = self._get_true_cased_form_of_token(token)
+                if self.trees.apply(tree_id, form) is not None:
+                    doc_compat_guesses.append(tree_id)
+                else:
+                    doc_compat_guesses.append(-1)
+            guesses.append(np.array(doc_compat_guesses))
+
+        return guesses
+
+    def _scores2guesses_top_k_greater_1(self, docs, scores):
+        guesses = []
+        top_k = min(self.top_k, len(self.labels))
+        for doc, doc_scores in zip(docs, scores):
+            doc_scores = self.numpy_ops.asarray(doc_scores)
+            doc_compat_guesses = []
+            for i, token in enumerate(doc):
+                for _ in range(top_k):
+                    candidate = int(doc_scores[i].argmax())
+                    candidate_tree_id = self.cfg["labels"][candidate]
+                    form: str = self._get_true_cased_form_of_token(token)
+                    if self.trees.apply(candidate_tree_id, form) is not None:
+                        doc_compat_guesses.append(candidate_tree_id)
+                        break
+                    doc_scores[i, candidate] = np.finfo(np.float32).min
+                else:
+                    doc_compat_guesses.append(-1)
+            guesses.append(np.array(doc_compat_guesses))
+
+        return guesses
+
+    def _scores2guesses_top_k_guardrail(self, docs, scores):
+        guesses = []
+        for doc, doc_scores in zip(docs, scores):
+            doc_guesses = np.argsort(doc_scores)[..., : -self.top_k - 1 : -1]
+            doc_guesses = self.numpy_ops.asarray(doc_guesses)
+
+            doc_compat_guesses = []
+            for token, candidates in zip(doc, doc_guesses):
+                tree_id = -1
+                for candidate in candidates:
+                    candidate_tree_id = self.cfg["labels"][candidate]
+
+                    form: str = self._get_true_cased_form_of_token(token)
+
+                    if self.trees.apply(candidate_tree_id, form) is not None:
+                        tree_id = candidate_tree_id
+                        break
+                doc_compat_guesses.append(tree_id)
+
+            guesses.append(np.array(doc_compat_guesses))
+
+        return guesses
+
+    def set_annotations(self, docs: Iterable[Doc], batch_tree_ids):
+        for i, doc in enumerate(docs):
+            doc_tree_ids = batch_tree_ids[i]
+            if hasattr(doc_tree_ids, "get"):
+                doc_tree_ids = doc_tree_ids.get()
+            for j, tree_id in enumerate(doc_tree_ids):
+                if self.overwrite or doc[j].lemma == 0:
+                    # If no applicable tree could be found during prediction,
+                    # the special identifier -1 is used. Otherwise the tree
+                    # is guaranteed to be applicable.
+                    if tree_id == -1:
+                        if self.backoff is not None:
+                            doc[j].lemma = getattr(doc[j], self.backoff)
+                    else:
+                        form = self._get_true_cased_form_of_token(doc[j])
+                        lemma = self.trees.apply(tree_id, form) or form
+                        # debug(f"@set_annotations: {doc[j]}/{doc[j].pos_}[{doc[j].is_sent_start}]->{form}|{lemma}[{tree_id}]")
+                        doc[j].lemma_ = lemma
+
+    @property
+    def labels(self) -> Tuple[int, ...]:
+        """Returns the labels currently added to the component."""
+        return tuple(self.cfg["labels"])
+
+    @property
+    def hide_labels(self) -> bool:
+        return True
+
+    @property
+    def label_data(self) -> Dict:
+        trees = []
+        for tree_id in range(len(self.trees)):
+            tree = self.trees[tree_id]
+            if "orig" in tree:
+                tree["orig"] = self.vocab.strings[tree["orig"]]
+            if "subst" in tree:
+                tree["subst"] = self.vocab.strings[tree["subst"]]
+            trees.append(tree)
+        return dict(trees=trees, labels=tuple(self.cfg["labels"]))
+
+    def initialize(
+        self,
+        get_examples: Callable[[], Iterable[Example]],
+        *,
+        nlp: Optional[Language] = None,
+        labels: Optional[Dict] = None,
+    ):
+        validate_get_examples(get_examples, "EditTreeLemmatizer.initialize")
+
+        if self.overwrite_labels:
+            if labels is None:
+                self._labels_from_data(get_examples)
+            else:
+                self._add_labels(labels)
+
+        # Sample for the model.
+        doc_sample = []
+        label_sample = []
+        for example in islice(get_examples(), 10):
+            doc_sample.append(example.x)
+            gold_labels: List[List[float]] = []
+            for token in example.reference:
+                if token.lemma == 0:
+                    gold_label = None
+                else:
+                    gold_label = self._pair2label(token.text, token.lemma_)
+
+                gold_labels.append(
+                    [
+                        1.0 if label == gold_label else 0.0
+                        for label in self.cfg["labels"]
+                    ]
+                )
+
+            gold_labels = cast(Floats2d, gold_labels)
+            label_sample.append(self.model.ops.asarray(gold_labels, dtype="float32"))
+
+        self._require_labels()
+        assert len(doc_sample) > 0, Errors.E923.format(name=self.name)
+        assert len(label_sample) > 0, Errors.E923.format(name=self.name)
+
+        self.model.initialize(X=doc_sample, Y=label_sample)
+
+    def from_bytes(self, bytes_data, *, exclude=tuple()):
+        deserializers = {
+            "cfg": lambda b: self.cfg.update(srsly.json_loads(b)),
+            "model": lambda b: self.model.from_bytes(b),
+            "vocab": lambda b: self.vocab.from_bytes(b, exclude=exclude),
+            "trees": lambda b: self.trees.from_bytes(b),
+        }
+
+        util.from_bytes(bytes_data, deserializers, exclude)
+
+        return self
+
+    def to_bytes(self, *, exclude=tuple()):
+        serializers = {
+            "cfg": lambda: srsly.json_dumps(self.cfg),
+            "model": lambda: self.model.to_bytes(),
+            "vocab": lambda: self.vocab.to_bytes(exclude=exclude),
+            "trees": lambda: self.trees.to_bytes(),
+        }
+
+        return util.to_bytes(serializers, exclude)
+
+    def to_disk(self, path, exclude=tuple()):
+        path = util.ensure_path(path)
+        serializers = {
+            "cfg": lambda p: srsly.write_json(p, self.cfg),
+            "model": lambda p: self.model.to_disk(p),
+            "vocab": lambda p: self.vocab.to_disk(p, exclude=exclude),
+            "trees": lambda p: self.trees.to_disk(p),
+        }
+        util.to_disk(path, serializers, exclude)
+
+    def from_disk(self, path, exclude=tuple()):
+        def load_model(p):
+            try:
+                with open(p, "rb") as mfile:
+                    self.model.from_bytes(mfile.read())
+            except AttributeError:
+                raise ValueError(Errors.E149) from None
+
+        deserializers = {
+            "cfg": lambda p: self.cfg.update(srsly.read_json(p)),
+            "model": load_model,
+            "vocab": lambda p: self.vocab.from_disk(p, exclude=exclude),
+            "trees": lambda p: self.trees.from_disk(p),
+        }
+
+        util.from_disk(path, deserializers, exclude)
+        return self
+
+    def _add_labels(self, labels: Dict):
+        if "labels" not in labels:
+            raise ValueError(Errors.E857.format(name="labels"))
+        if "trees" not in labels:
+            raise ValueError(Errors.E857.format(name="trees"))
+
+        self.cfg["labels"] = list(labels["labels"])
+        trees = []
+        for tree in labels["trees"]:
+            errors = validate_edit_tree(tree)
+            if errors:
+                raise ValueError(Errors.E1026.format(errors="\n".join(errors)))
+
+            tree = dict(tree)
+            if "orig" in tree:
+                tree["orig"] = self.vocab.strings[tree["orig"]]
+            if "subst" in tree:
+                tree["subst"] = self.vocab.strings[tree["subst"]]
+
+            trees.append(tree)
+
+        self.trees.from_json(trees)
+
+        for label, tree in enumerate(self.labels):
+            self.tree2label[tree] = label
+
+    def _labels_from_data(self, get_examples: Callable[[], Iterable[Example]]):
+        # Count corpus tree frequencies in ad-hoc storage to avoid cluttering
+        # the final pipe/string store.
+        vocab = Vocab()
+        trees = EditTrees(vocab.strings)
+        tree_freqs: Counter = Counter()
+        repr_pairs: Dict = {}
+        for example in get_examples():
+            for token in example.reference:
+                if token.lemma != 0:
+                    form = self._get_true_cased_form_of_token(token)
+                    # debug("_labels_from_data", str(token) + "->" + form, token.lemma_)
+                    tree_id = trees.add(form, token.lemma_)
+                    tree_freqs[tree_id] += 1
+                    repr_pairs[tree_id] = (form, token.lemma_)
+
+        # Construct trees that make the frequency cut-off using representative
+        # form-lemma pairs.
+        for tree_id, freq in tree_freqs.items():
+            if freq >= self.min_tree_freq:
+                form, lemma = repr_pairs[tree_id]
+                self._pair2label(form, lemma, add_label=True)
+
+    @lru_cache()
+    def _get_true_cased_form(self, token: str, is_sent_start: bool, pos: str) -> str:
+        if is_sent_start and pos != "PROPN":
+            return token.lower()
+        else:
+            return token
+
+    def _get_true_cased_form_of_token(self, token: Token) -> str:
+        return self._get_true_cased_form(token.text, token.is_sent_start, token.pos_)
+
+    def _pair2label(self, form, lemma, add_label=False):
+        """
+        Look up the edit tree identifier for a form/lemma pair. If the edit
+        tree is unknown and "add_label" is set, the edit tree will be added to
+        the labels.
+        """
+        tree_id = self.trees.add(form, lemma)
+        if tree_id not in self.tree2label:
+            if not add_label:
+                return None
+
+            self.tree2label[tree_id] = len(self.cfg["labels"])
+            self.cfg["labels"].append(tree_id)
+        return self.tree2label[tree_id]
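At prediction time the lemmatizer true-cases each token (sentence-initial tokens are lower-cased unless tagged `PROPN`), walks the label columns from most to least probable, and keeps the first edit tree that is applicable to that form; if none of the top k fits, it emits -1 and `set_annotations` falls back to `backoff` (by default `orth`, the token text). The sketch below restates that selection with NumPy, using the argsort slicing from `_scores2guesses_top_k_guardrail` (`_scores2guesses_top_k_greater_1` reaches the same choice through repeated argmax). The names `pick_tree_ids`, `toy_applicable` and `TOY_SUFFIX_RULES` are hypothetical, and the suffix-stripping "trees" are a toy stand-in for `EditTrees.apply`, not real edit trees:

```python
from typing import Callable, List, Sequence

import numpy as np


def true_cased_form(text: str, is_sent_start: bool, pos: str) -> str:
    """Lower-case a sentence-initial token unless it is tagged PROPN."""
    if is_sent_start and pos != "PROPN":
        return text.lower()
    return text


def pick_tree_ids(
    scores: np.ndarray,
    labels: Sequence[int],
    forms: Sequence[str],
    applicable: Callable[[int, str], bool],
    top_k: int,
) -> List[int]:
    """Per token, return the best applicable tree id among the top k, else -1."""
    # Label columns sorted by descending score, truncated to the k best.
    candidates = np.argsort(scores)[..., : -top_k - 1 : -1]
    chosen: List[int] = []
    for form, row in zip(forms, candidates):
        tree_id = -1
        for column in row:
            candidate = labels[column]
            if applicable(candidate, form):
                tree_id = candidate
                break
        chosen.append(tree_id)
    return chosen


# Toy stand-in for EditTrees.apply(): each "tree" strips a suffix and is
# applicable only when the form ends with it ("" always applies).
TOY_SUFFIX_RULES = {10: "ban", 11: "ok", 12: ""}


def toy_applicable(tree_id: int, form: str) -> bool:
    return form.endswith(TOY_SUFFIX_RULES[tree_id])


labels = [10, 11, 12]  # label column -> tree id, like cfg["labels"]
forms = [
    true_cased_form("Házban", True, "NOUN"),  # sentence-initial noun -> "házban"
    true_cased_form("Budapesten", False, "PROPN"),
]
scores = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
print(pick_tree_ids(scores, labels, forms, toy_applicable, top_k=2))
print(pick_tree_ids(scores, labels, forms, toy_applicable, top_k=3))
```

With `top_k=2` the second token gets -1 because neither of its two best candidates fits the form, so the component would use the backoff; raising `top_k` to 3 lets the always-applicable candidate be reached instead.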
experimental_arc_labeler/model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:013bece7bd7f81ed854dac90e4dd2808b225e99791e41e8539dc73ccf809dbc7
+size 447476740
experimental_arc_predicter/model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:1943454bb433518bc8782e8bb998e525031765e25445daf654a26d5b255576a6
+size 445185700
hu_core_news_trf-any-py3-none-any.whl
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:27f58502147dd689d28a6eb03bf6fe7654246e5469a16071c8260e11fca265ff
+size 1668889766
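The model weights and the wheel above are stored through Git LFS, so the repository itself only tracks small pointer files of `key value` lines (`version`, `oid`, `size`), while the binaries live in LFS storage; the new wheel, for example, is 1668889766 bytes, roughly 1.55 GiB. A minimal parsing sketch, with a hypothetical `parse_lfs_pointer` helper that does not enforce the ordering or validation rules of the LFS pointer specification:

```python
from typing import Dict

# The new pointer for hu_core_news_trf-any-py3-none-any.whl from this diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:27f58502147dd689d28a6eb03bf6fe7654246e5469a16071c8260e11fca265ff
size 1668889766
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each non-empty 'key value' line of a pointer file into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(POINTER)
algo, _, digest = pointer["oid"].partition(":")
size_bytes = int(pointer["size"])
print(algo, len(digest), f"{size_bytes / 2**30:.2f} GiB")
```

The `oid` is the SHA-256 of the actual file contents, which is why every retrained model in this commit shows a new hash and size.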
lemma_postprocessing.py
ADDED
@@ -0,0 +1,113 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
"""
|
2 |
+
This module contains various rule-based components aiming to improve on baseline lemmatization tools.
|
3 |
+
"""
|
4 |
+
|
5 |
+
import re
|
6 |
+
from typing import List, Callable
|
7 |
+
|
8 |
+
from spacy.lang.hu import Hungarian
|
9 |
+
from spacy.pipeline import Pipe
|
10 |
+
from spacy.tokens import Token
|
11 |
+
from spacy.tokens.doc import Doc
|
12 |
+
|
13 |
+
|
14 |
+
@Hungarian.component(
|
15 |
+
"lemma_case_smoother",
|
16 |
+
assigns=["token.lemma"],
|
17 |
+
requires=["token.lemma", "token.pos"],
|
18 |
+
)
|
19 |
+
def lemma_case_smoother(doc: Doc) -> Doc:
|
20 |
+
"""Smooth lemma casing by POS.
|
21 |
+
|
22 |
+
DEPRECATED: This is not needed anymore, as the lemmatizer is now case-insensitive.
|
23 |
+
|
24 |
+
Args:
|
25 |
+
doc (Doc): Input document.
|
26 |
+
|
27 |
+
Returns:
|
28 |
+
Doc: Output document.
|
29 |
+
"""
|
30 |
+
for token in doc:
|
31 |
+
if token.is_sent_start and token.tag_ != "PROPN":
|
32 |
+
token.lemma_ = token.lemma_.lower()
|
33 |
+
|
34 |
+
return doc
|
35 |
+
|
36 |
+
|
37 |
+
class LemmaSmoother(Pipe):
|
38 |
+
"""Smooths lemma by fixing common errors of the edit-tree lemmatizer."""
|
39 |
+
|
40 |
+
_DATE_PATTERN = re.compile(r"(\d+)-j?[éá]?n?a?(t[őó]l)?")
|
41 |
+
_NUMBER_PATTERN = re.compile(r"(\d+([-,/_.:]?(._)?\d+)*%?)")
|
42 |
+
|
43 |
+
    # noinspection PyUnusedLocal
    @staticmethod
    @Hungarian.factory("lemma_smoother", assigns=["token.lemma"], requires=["token.lemma", "token.pos"])
    def create_lemma_smoother(nlp: Hungarian, name: str) -> "LemmaSmoother":
        return LemmaSmoother()

    def __call__(self, doc: Doc) -> Doc:
        rules: List[Callable] = [
            self._remove_exclamation_marks,
            self._remove_question_marks,
            self._remove_date_suffixes,
            self._remove_suffix_after_numbers,
        ]

        for token in doc:
            for rule in rules:
                rule(token)

        return doc

    @classmethod
    def _remove_exclamation_marks(cls, token: Token) -> None:
        """Truncates the lemma at its first exclamation mark, unless the lemma is just "!".

        Args:
            token (Token): The original token.
        """

        if "!" != token.lemma_:
            exclamation_mark_index = token.lemma_.find("!")
            if exclamation_mark_index != -1:
                token.lemma_ = token.lemma_[:exclamation_mark_index]

    @classmethod
    def _remove_question_marks(cls, token: Token) -> None:
        """Truncates the lemma at its first question mark, unless the lemma is just "?".

        Args:
            token (Token): The original token.
        """

        if "?" != token.lemma_:
            question_mark_index = token.lemma_.find("?")
            if question_mark_index != -1:
                token.lemma_ = token.lemma_[:question_mark_index]

    @classmethod
    def _remove_date_suffixes(cls, token: Token) -> None:
        """Fixes the suffixes of dates.

        Args:
            token (Token): The original token.
        """

        if token.pos_ == "NOUN":
            match = cls._DATE_PATTERN.match(token.lemma_)
            if match is not None:
                token.lemma_ = match.group(1) + "."

    @classmethod
    def _remove_suffix_after_numbers(cls, token: Token) -> None:
        """Removes suffixes after numbers.

        Args:
            token (Token): The original token.
        """

        if token.pos_ == "NUM":
            match = cls._NUMBER_PATTERN.match(token.text)
            if match is not None:
                token.lemma_ = match.group(0)
|
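The punctuation rules of `LemmaSmoother` cut a lemma at the first `!` or `?` unless the lemma consists of that mark alone, and the number rule replaces the lemma of a `NUM` token with the leading match of `_NUMBER_PATTERN` on its text. That pattern (and `_DATE_PATTERN`) is defined earlier in the file and is not visible here, so the sketch below assumes a hypothetical stand-in `NUMBER_PATTERN = re.compile(r"\d+")` and skips the date rule. It works on plain strings rather than spaCy tokens, purely to illustrate the rule order used in `__call__`.

```python
import re

# Hypothetical stand-in for LemmaSmoother._NUMBER_PATTERN; the real
# definition is not shown in this diff excerpt.
NUMBER_PATTERN = re.compile(r"\d+")


def truncate_at_mark(lemma: str, mark: str) -> str:
    """Cut the lemma at the first `mark`, unless the lemma is the mark itself."""
    if lemma != mark:
        index = lemma.find(mark)
        if index != -1:
            return lemma[:index]
    return lemma


def smooth_lemma(text: str, lemma: str, pos: str) -> str:
    """Apply the "!", "?" and number rules in LemmaSmoother's order."""
    lemma = truncate_at_mark(lemma, "!")
    lemma = truncate_at_mark(lemma, "?")
    # The date-suffix rule is omitted: its pattern is not shown in this excerpt.
    if pos == "NUM":
        match = NUMBER_PATTERN.match(text)
        if match is not None:
            lemma = match.group(0)
    return lemma


print(smooth_lemma("Gyere!", "gyere!", "VERB"))      # gyere
print(smooth_lemma("!", "!", "PUNCT"))               # !
print(smooth_lemma("2023-ban", "2023-ban", "NUM"))   # 2023
```

Because `re.match` anchors at the start of the string, the number rule only fires when the token text begins with the pattern, which is what makes suffixed forms such as "2023-ban" collapse to the bare number under this assumed pattern.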
lookup_lemmatizer.py
ADDED
@@ -0,0 +1,132 @@
import re
from collections import defaultdict
from operator import itemgetter
from pathlib import Path
from re import Pattern
from typing import Optional, Callable, Iterable, Dict, Tuple

from spacy.lang.hu import Hungarian
from spacy.language import Language
from spacy.lookups import Lookups, Table
from spacy.pipeline import Pipe
from spacy.pipeline.lemmatizer import lemmatizer_score
from spacy.tokens import Token
from spacy.tokens.doc import Doc

# noinspection PyUnresolvedReferences
from spacy.training.example import Example
from spacy.util import ensure_path


class LookupLemmatizer(Pipe):
    """
    LookupLemmatizer learns `(token, pos, morph. feat) -> lemma` mappings during training, and applies them at prediction
    time.
    """

    _number_pattern: Pattern = re.compile(r"\d")

    # noinspection PyUnusedLocal
    @staticmethod
    @Hungarian.factory(
        "lookup_lemmatizer",
        assigns=["token.lemma"],
        requires=["token.pos"],
        default_config={"scorer": {"@scorers": "spacy.lemmatizer_scorer.v1"}, "source": ""},
    )
    def create(nlp: Language, name: str, scorer: Optional[Callable], source: str) -> "LookupLemmatizer":
        return LookupLemmatizer(None, source, scorer)

    def train(self, sentences: Iterable[Iterable[Tuple[str, str, str, str]]], min_occurrences: int = 1) -> None:
        """Learns the lemma lookup table from annotated sentences.

        Args:
            sentences (Iterable[Iterable[Tuple[str, str, str, str]]]): Sentences of (token, pos, feats, lemma)
                tuples to learn the mappings from
            min_occurrences (int): mappings whose most frequent lemma occurs fewer times than this threshold
                are not learned
        """

        # Lookup table which maps (form, upos[|feats]) to (lemma -> frequency),
        # e.g. `{ ("alma", "NOUN|Case=Nom|Number=Sing"): { "alma": 99, "alom": 1 } }`
        lemma_lookup_table: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(lambda: defaultdict(int))

        for sentence in sentences:
            for token, pos, feats, lemma in sentence:
                token = self.__mask_numbers(token)
                lemma = self.__mask_numbers(lemma)
                feats_str = ("|" + feats) if feats else ""
                key = (token, pos + feats_str)
                lemma_lookup_table[key][lemma] += 1
        lemma_lookup_table = dict(lemma_lookup_table)

        self._lookups = Lookups()
        table = Table(name="lemma_lookups")

        lemma_freq: Dict[str, int]
        for (form, pos), lemma_freq in lemma_lookup_table.items():
            most_freq_lemma, freq = sorted(lemma_freq.items(), key=itemgetter(1), reverse=True)[0]
            if freq >= min_occurrences:
                if form not in table:
                    # lemma by pos
                    table[form] = {}
                table[form][pos] = most_freq_lemma

        self._lookups.set_table(name="lemma_lookups", table=table)

    def __init__(
        self,
        lookups: Optional[Lookups] = None,
        source: Optional[str] = None,
        scorer: Optional[Callable] = lemmatizer_score,
    ):
        self._lookups: Optional[Lookups] = lookups
        self.scorer = scorer
        self.source = source

    def __call__(self, doc: Doc) -> Doc:
        assert self._lookups is not None, "Lookup table should be initialized first"

        # The table does not change between tokens, so fetch it once per document
        lemma_lookup_table = self._lookups.get_table("lemma_lookups")
        token: Token
        for token in doc:
            masked_token = self.__mask_numbers(token.text)

            if masked_token in lemma_lookup_table:
                lemma_by_pos: Dict[str, str] = lemma_lookup_table[masked_token]
                feats_str = ("|" + str(token.morph)) if str(token.morph) else ""
                key = token.pos_ + feats_str
                if key in lemma_by_pos:
                    if masked_token != token.text:
                        # If the token contains numbers, we need to replace the numbers in the lemma as well
                        token.lemma_ = self.__replace_numbers(lemma_by_pos[key], token.text)
                    else:
                        token.lemma_ = lemma_by_pos[key]
        return doc

    # noinspection PyUnusedLocal
    def to_disk(self, path, exclude=tuple()):
        assert self._lookups is not None, "Lookup table should be initialized first"

        path: Path = ensure_path(path)
        path.mkdir(exist_ok=True)
        self._lookups.to_disk(path)

    # noinspection PyUnusedLocal
    def from_disk(self, path, exclude=tuple()) -> "LookupLemmatizer":
        path: Path = ensure_path(path)
        lookups = Lookups()
        self._lookups = lookups.from_disk(path=path)
        return self

    def initialize(self, get_examples: Callable[[], Iterable[Example]], *, nlp: Optional[Language] = None) -> None:
        lookups = Lookups()
        self._lookups = lookups.from_disk(path=self.source)

    @classmethod
    def __mask_numbers(cls, token: str) -> str:
        return cls._number_pattern.sub("0", token)

    @classmethod
    def __replace_numbers(cls, lemma: str, token: str) -> str:
        return cls._number_pattern.sub(lambda match: token[match.start()], lemma)
|
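The core of `LookupLemmatizer` is a frequency table keyed by the digit-masked form and a `POS|feats` string: training keeps only the most frequent lemma per key (subject to `min_occurrences`), and prediction copies the token's real digits back into the masked lemma. The sketch below reproduces that logic with plain dictionaries instead of spaCy's `Lookups`/`Table`, so it can run without a pipeline; the helper names and the tiny training set are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

DIGIT = re.compile(r"\d")
Sentence = Iterable[Tuple[str, str, str, str]]  # (form, upos, feats, lemma)


def mask_numbers(text: str) -> str:
    # Every digit becomes "0", so "1848-ban" and "1956-ban" share one table entry.
    return DIGIT.sub("0", text)


def replace_numbers(lemma: str, token: str) -> str:
    # Put back the digit found at the same position in the original token.
    return DIGIT.sub(lambda m: token[m.start()], lemma)


def pos_key(pos: str, feats: str) -> str:
    # "NOUN", plus "|Case=Nom|Number=Sing" when morphological features exist.
    return pos + ("|" + feats if feats else "")


def train(sentences: Iterable[Sentence], min_occurrences: int = 1) -> Dict[str, Dict[str, str]]:
    counts: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        for form, pos, feats, lemma in sentence:
            counts[(mask_numbers(form), pos_key(pos, feats))][mask_numbers(lemma)] += 1

    table: Dict[str, Dict[str, str]] = {}
    for (form, key), freqs in counts.items():
        # Keep only the most frequent lemma, and only if it occurred often enough.
        lemma, freq = max(freqs.items(), key=lambda item: item[1])
        if freq >= min_occurrences:
            table.setdefault(form, {})[key] = lemma
    return table


def lemmatize(table: Dict[str, Dict[str, str]], text: str, pos: str, feats: str) -> Optional[str]:
    masked = mask_numbers(text)
    lemma = table.get(masked, {}).get(pos_key(pos, feats))
    if lemma is None:
        return None  # no entry: the component would leave token.lemma_ unchanged
    return replace_numbers(lemma, text) if masked != text else lemma


SENTENCES = [
    [("alma", "NOUN", "Case=Nom|Number=Sing", "alma"), ("1848-ban", "NUM", "Case=Ine", "1848")],
    [("alma", "NOUN", "Case=Nom|Number=Sing", "alma")],
    [("alma", "NOUN", "Case=Nom|Number=Sing", "alom")],
]
table = train(SENTENCES)
print(lemmatize(table, "alma", "NOUN", "Case=Nom|Number=Sing"))  # alma (2 votes vs. 1)
print(lemmatize(table, "1956-ban", "NUM", "Case=Ine"))           # 1956
```

Masking lets one training example generalise to every number with the same digit layout; the trade-off is that `replace_numbers` assumes the lemma's digits sit at the same offsets as in the token, which holds for prefix-number forms like "1956-ban" but not in general.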
meta.json
CHANGED
@@ -1,13 +1,13 @@
|
|
1 |
{
|
2 |
"lang":"hu",
|
3 |
"name":"core_news_trf",
|
4 |
-
"version":"3.
|
5 |
-
"description":"Hungarian transformer pipeline (
|
6 |
"author":"SzegedAI, MILAB",
|
7 |
"email":"[email protected]",
|
8 |
"url":"https://github.com/huspacy/huspacy",
|
9 |
"license":"cc-by-sa-4.0",
|
10 |
-
"spacy_version":">=3.
|
11 |
"spacy_git_version":"Unknown",
|
12 |
"vectors":{
|
13 |
"width":0,
|
@@ -1282,44 +1282,44 @@
|
|
1282 |
|
1283 |
],
|
1284 |
"performance":{
|
1285 |
-
"token_acc":0.
|
1286 |
"token_p":0.998565417,
|
1287 |
"token_r":0.9993300153,
|
1288 |
"token_f":0.9989475698,
|
1289 |
-
"sents_p":0.
|
1290 |
-
"sents_r":0.
|
1291 |
-
"sents_f":0.
|
1292 |
-
"tag_acc":0.
|
1293 |
-
"pos_acc":0.
|
1294 |
-
"morph_acc":0.
|
1295 |
-
"morph_micro_p":0.
|
1296 |
-
"morph_micro_r":0.
|
1297 |
-
"morph_micro_f":0.
|
1298 |
"morph_per_feat":{
|
1299 |
"Definite":{
|
1300 |
-
"p":0.
|
1301 |
-
"r":0.
|
1302 |
-
"f":0.
|
1303 |
},
|
1304 |
"PronType":{
|
1305 |
-
"p":0.
|
1306 |
-
"r":0.
|
1307 |
-
"f":0.
|
1308 |
},
|
1309 |
"Case":{
|
1310 |
-
"p":0.
|
1311 |
-
"r":0.
|
1312 |
-
"f":0.
|
1313 |
},
|
1314 |
"Degree":{
|
1315 |
-
"p":0.
|
1316 |
-
"r":0.
|
1317 |
-
"f":0.
|
1318 |
},
|
1319 |
"Number":{
|
1320 |
-
"p":0.
|
1321 |
-
"r":0.
|
1322 |
-
"f":0.
|
1323 |
},
|
1324 |
"Mood":{
|
1325 |
"p":0.9446290144,
|
@@ -1327,44 +1327,44 @@
|
|
1327 |
"f":0.9451523546
|
1328 |
},
|
1329 |
"Person":{
|
1330 |
-
"p":0.
|
1331 |
"r":0.9942434211,
|
1332 |
-
"f":0.
|
1333 |
},
|
1334 |
"Tense":{
|
1335 |
-
"p":0.
|
1336 |
-
"r":0.
|
1337 |
-
"f":0.
|
1338 |
},
|
1339 |
"VerbForm":{
|
1340 |
-
"p":0.
|
1341 |
-
"r":0.
|
1342 |
-
"f":0.
|
1343 |
},
|
1344 |
"Voice":{
|
1345 |
-
"p":0.
|
1346 |
-
"r":0.
|
1347 |
-
"f":0.
|
1348 |
},
|
1349 |
"Number[psor]":{
|
1350 |
-
"p":0.
|
1351 |
-
"r":0.
|
1352 |
-
"f":0.
|
1353 |
},
|
1354 |
"Person[psor]":{
|
1355 |
-
"p":0.
|
1356 |
-
"r":0.
|
1357 |
-
"f":0.
|
1358 |
},
|
1359 |
"NumType":{
|
1360 |
-
"p":0.
|
1361 |
-
"r":0.
|
1362 |
-
"f":0.
|
1363 |
},
|
1364 |
"Poss":{
|
1365 |
-
"p":0.
|
1366 |
"r":1.0,
|
1367 |
-
"f":0.
|
1368 |
},
|
1369 |
"Reflex":{
|
1370 |
"p":0.0,
|
@@ -1388,195 +1388,190 @@
|
|
1388 |
},
|
1389 |
"Number[psed]":{
|
1390 |
"p":1.0,
|
1391 |
-
"r":0.
|
1392 |
-
"f":0.
|
1393 |
}
|
1394 |
},
|
1395 |
-
"lemma_acc":0.
|
1396 |
-
"bound_dep_las":0.
|
1397 |
-
"bound_dep_uas":0.
|
1398 |
-
"dep_uas":0.
|
1399 |
-
"dep_las":0.
|
1400 |
"dep_las_per_type":{
|
1401 |
"415":{
|
1402 |
-
"p":0.
|
1403 |
-
"r":0.
|
1404 |
-
"f":0.
|
1405 |
},
|
1406 |
"7411097074813287689":{
|
1407 |
-
"p":0.
|
1408 |
-
"r":0.
|
1409 |
-
"f":0.
|
1410 |
},
|
1411 |
"429":{
|
1412 |
-
"p":0.
|
1413 |
-
"r":0.
|
1414 |
-
"f":0.
|
1415 |
},
|
1416 |
"15861261214731031920":{
|
1417 |
-
"p":0.
|
1418 |
-
"r":0.
|
1419 |
-
"f":0.
|
1420 |
},
|
1421 |
"991268021520064439":{
|
1422 |
-
"p":0.
|
1423 |
-
"r":0.
|
1424 |
-
"f":0.
|
1425 |
},
|
1426 |
"435":{
|
1427 |
-
"p":0.
|
1428 |
-
"r":0.
|
1429 |
-
"f":0.
|
1430 |
},
|
1431 |
"434":{
|
1432 |
-
"p":0.
|
1433 |
-
"r":0.
|
1434 |
-
"f":0.
|
1435 |
},
|
1436 |
"8206900633647566924":{
|
1437 |
-
"p":0.
|
1438 |
-
"r":0.
|
1439 |
-
"f":0.
|
1440 |
},
|
1441 |
"407":{
|
1442 |
-
"p":0.
|
1443 |
"r":0.8378947368,
|
1444 |
-
"f":0.
|
1445 |
},
|
1446 |
"410":{
|
1447 |
-
"p":0.
|
1448 |
-
"r":0.
|
1449 |
-
"f":0.
|
1450 |
},
|
1451 |
"445":{
|
1452 |
-
"p":0.
|
1453 |
-
"r":0.
|
1454 |
-
"f":0.
|
1455 |
},
|
1456 |
"400":{
|
1457 |
-
"p":0.
|
1458 |
-
"r":0.
|
1459 |
-
"f":0.
|
1460 |
},
|
1461 |
"17772752594865228322":{
|
1462 |
-
"p":0.
|
1463 |
-
"r":0.
|
1464 |
-
"f":0.
|
1465 |
},
|
1466 |
"403":{
|
1467 |
-
"p":0.
|
1468 |
-
"r":0.
|
1469 |
-
"f":0.
|
1470 |
},
|
1471 |
"399":{
|
1472 |
-
"p":0.
|
1473 |
-
"r":0.
|
1474 |
-
"f":0.
|
1475 |
},
|
1476 |
"3143985677199705895":{
|
1477 |
-
"p":0.
|
1478 |
-
"r":0.
|
1479 |
-
"f":0.
|
1480 |
},
|
1481 |
"9241468201421778905":{
|
1482 |
-
"p":0.
|
1483 |
-
"r":0.
|
1484 |
-
"f":0.
|
1485 |
},
|
1486 |
"423":{
|
1487 |
-
"p":0.
|
1488 |
-
"r":0.
|
1489 |
-
"f":0.
|
1490 |
},
|
1491 |
"13543738850102096385":{
|
1492 |
-
"p":0.
|
1493 |
-
"r":0.
|
1494 |
-
"f":0.
|
1495 |
},
|
1496 |
"10901028881100056900":{
|
1497 |
-
"p":0.
|
1498 |
-
"r":0.
|
1499 |
-
"f":0.
|
1500 |
},
|
1501 |
"411":{
|
1502 |
-
"p":0.
|
1503 |
-
"r":0.
|
1504 |
-
"f":0.
|
1505 |
},
|
1506 |
"12549387360942434255":{
|
1507 |
-
"p":0.
|
1508 |
-
"r":0.
|
1509 |
-
"f":0.
|
1510 |
},
|
1511 |
"303601073839818384":{
|
1512 |
"p":0.5,
|
1513 |
-
"r":0.
|
1514 |
-
"f":0.
|
1515 |
},
|
1516 |
"8884235091647096537":{
|
1517 |
-
"p":0.
|
1518 |
-
"r":0.
|
1519 |
-
"f":0.
|
1520 |
},
|
1521 |
"2249809950233855422":{
|
1522 |
-
"p":0.
|
1523 |
-
"r":0.
|
1524 |
-
"f":0.
|
1525 |
},
|
1526 |
"422":{
|
1527 |
-
"p":0.
|
1528 |
-
"r":0.
|
1529 |
-
"f":0.
|
1530 |
},
|
1531 |
"8110129090154140942":{
|
1532 |
-
"p":0.
|
1533 |
-
"r":0.
|
1534 |
-
"f":0.
|
1535 |
},
|
1536 |
"412":{
|
1537 |
-
"p":0.
|
1538 |
-
"r":0.
|
1539 |
-
"f":0.
|
1540 |
},
|
1541 |
"436":{
|
1542 |
-
"p":0.
|
1543 |
-
"r":0.
|
1544 |
-
"f":0.
|
1545 |
},
|
1546 |
"450":{
|
1547 |
-
"p":0.
|
1548 |
"r":0.9594594595,
|
1549 |
-
"f":0.
|
1550 |
},
|
1551 |
"12837356684637874264":{
|
1552 |
-
"p":0.
|
1553 |
-
"r":0.
|
1554 |
-
"f":0.
|
1555 |
-
},
|
1556 |
-
"408":{
|
1557 |
-
"p":0.1111111111,
|
1558 |
-
"r":0.0769230769,
|
1559 |
-
"f":0.0909090909
|
1560 |
},
|
1561 |
"451":{
|
1562 |
-
"p":0.
|
1563 |
-
"r":0.
|
1564 |
-
"f":0.
|
1565 |
},
|
1566 |
"7349492218059511525":{
|
1567 |
-
"p":0.
|
1568 |
-
"r":0.
|
1569 |
-
"f":0.
|
1570 |
},
|
1571 |
"426":{
|
1572 |
-
"p":0.
|
1573 |
"r":0.4545454545,
|
1574 |
-
"f":0.
|
1575 |
},
|
1576 |
"405":{
|
1577 |
-
"p":0.
|
1578 |
-
"r":0.
|
1579 |
-
"f":0.
|
1580 |
},
|
1581 |
"17865338459503383721":{
|
1582 |
"p":1.0,
|
@@ -1593,16 +1588,21 @@
|
|
1593 |
"r":0.975,
|
1594 |
"f":0.975
|
1595 |
},
|
1596 |
-
"
|
1597 |
"p":0.0,
|
1598 |
"r":0.0,
|
1599 |
"f":0.0
|
1600 |
},
|
1601 |
-
"
|
1602 |
"p":0.0,
|
1603 |
"r":0.0,
|
1604 |
"f":0.0
|
1605 |
},
|
1606 |
"10069665988847657778":{
|
1607 |
"p":0.0,
|
1608 |
"r":0.0,
|
@@ -1613,43 +1613,43 @@
|
|
1613 |
"r":0.1666666667,
|
1614 |
"f":0.2857142857
|
1615 |
},
|
1616 |
"203073658115086772":{
|
1617 |
"p":0.0,
|
1618 |
"r":0.0,
|
1619 |
"f":0.0
|
1620 |
-
},
|
1621 |
-
"6522094215780122214":{
|
1622 |
-
"p":1.0,
|
1623 |
-
"r":1.0,
|
1624 |
-
"f":1.0
|
1625 |
}
|
1626 |
},
|
1627 |
-
"ents_p":0.
|
1628 |
-
"ents_r":0.
|
1629 |
-
"ents_f":0.
|
1630 |
"ents_per_type":{
|
1631 |
"ORG":{
|
1632 |
-
"p":0.
|
1633 |
-
"r":0.
|
1634 |
-
"f":0.
|
1635 |
},
|
1636 |
"PER":{
|
1637 |
-
"p":0.
|
1638 |
-
"r":0.
|
1639 |
-
"f":0.
|
1640 |
},
|
1641 |
"LOC":{
|
1642 |
-
"p":0.
|
1643 |
-
"r":0.
|
1644 |
-
"f":0.
|
1645 |
},
|
1646 |
"MISC":{
|
1647 |
-
"p":0.
|
1648 |
-
"r":0.
|
1649 |
-
"f":0.
|
1650 |
}
|
1651 |
},
|
1652 |
-
"speed":
|
1653 |
},
|
1654 |
"sources":[
|
1655 |
{
|
|
|
1 |
{
|
2 |
"lang":"hu",
|
3 |
"name":"core_news_trf",
|
4 |
+
"version":"3.5.0",
|
5 |
+
"description":"Hungarian transformer pipeline (huBERT) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner",
|
6 |
"author":"SzegedAI, MILAB",
|
7 |
"email":"[email protected]",
|
8 |
"url":"https://github.com/huspacy/huspacy",
|
9 |
"license":"cc-by-sa-4.0",
|
10 |
+
"spacy_version":">=3.5.0,<3.6.0",
|
11 |
"spacy_git_version":"Unknown",
|
12 |
"vectors":{
|
13 |
"width":0,
|
|
|
1282 |
|
1283 |
],
|
1284 |
"performance":{
|
1285 |
+
"token_acc":0.9999043611,
|
1286 |
"token_p":0.998565417,
|
1287 |
"token_r":0.9993300153,
|
1288 |
"token_f":0.9989475698,
|
1289 |
+
"sents_p":0.9713656388,
|
1290 |
+
"sents_r":0.9821826281,
|
1291 |
+
"sents_f":0.976744186,
|
1292 |
+
"tag_acc":0.9746877841,
|
1293 |
+
"pos_acc":0.974400689,
|
1294 |
+
"morph_acc":0.9452579194,
|
1295 |
+
"morph_micro_p":0.9804550379,
|
1296 |
+
"morph_micro_r":0.9722389343,
|
1297 |
+
"morph_micro_f":0.9763297012,
|
1298 |
"morph_per_feat":{
|
1299 |
"Definite":{
|
1300 |
+
"p":0.9952584163,
|
1301 |
+
"r":0.9794680355,
|
1302 |
+
"f":0.9873000941
|
1303 |
},
|
1304 |
"PronType":{
|
1305 |
+
"p":0.9706696182,
|
1306 |
+
"r":0.96799117,
|
1307 |
+
"f":0.9693285438
|
1308 |
},
|
1309 |
"Case":{
|
1310 |
+
"p":0.9902739182,
|
1311 |
+
"r":0.9857735625,
|
1312 |
+
"f":0.9880186157
|
1313 |
},
|
1314 |
"Degree":{
|
1315 |
+
"p":0.8964968153,
|
1316 |
+
"r":0.9367720466,
|
1317 |
+
"f":0.916192026
|
1318 |
},
|
1319 |
"Number":{
|
1320 |
+
"p":0.9931380753,
|
1321 |
+
"r":0.9944695827,
|
1322 |
+
"f":0.993803383
|
1323 |
},
|
1324 |
"Mood":{
|
1325 |
"p":0.9446290144,
|
|
|
1327 |
"f":0.9451523546
|
1328 |
},
|
1329 |
"Person":{
|
1330 |
+
"p":0.96875,
|
1331 |
"r":0.9942434211,
|
1332 |
+
"f":0.9813311688
|
1333 |
},
|
1334 |
"Tense":{
|
1335 |
+
"p":0.9955703212,
|
1336 |
+
"r":0.9933701657,
|
1337 |
+
"f":0.9944690265
|
1338 |
},
|
1339 |
"VerbForm":{
|
1340 |
+
"p":0.9959349593,
|
1341 |
+
"r":0.7858861267,
|
1342 |
+
"f":0.8785298073
|
1343 |
},
|
1344 |
"Voice":{
|
1345 |
+
"p":0.9846782431,
|
1346 |
+
"r":0.9856850716,
|
1347 |
+
"f":0.9851814001
|
1348 |
},
|
1349 |
"Number[psor]":{
|
1350 |
+
"p":0.9957386364,
|
1351 |
+
"r":0.9985754986,
|
1352 |
+
"f":0.9971550498
|
1353 |
},
|
1354 |
"Person[psor]":{
|
1355 |
+
"p":0.9943181818,
|
1356 |
+
"r":0.9985734665,
|
1357 |
+
"f":0.9964412811
|
1358 |
},
|
1359 |
"NumType":{
|
1360 |
+
"p":0.9575471698,
|
1361 |
+
"r":0.9902439024,
|
1362 |
+
"f":0.9736211031
|
1363 |
},
|
1364 |
"Poss":{
|
1365 |
+
"p":0.5,
|
1366 |
"r":1.0,
|
1367 |
+
"f":0.6666666667
|
1368 |
},
|
1369 |
"Reflex":{
|
1370 |
"p":0.0,
|
|
|
1388 |
},
|
1389 |
"Number[psed]":{
|
1390 |
"p":1.0,
|
1391 |
+
"r":0.7777777778,
|
1392 |
+
"f":0.875
|
1393 |
}
|
1394 |
},
|
1395 |
+
"lemma_acc":0.9874653143,
|
1396 |
+
"bound_dep_las":0.8686201283,
|
1397 |
+
"bound_dep_uas":0.9097960356,
|
1398 |
+
"dep_uas":0.9092736147,
|
1399 |
+
"dep_las":0.8681339713,
|
1400 |
"dep_las_per_type":{
|
1401 |
"415":{
|
1402 |
+
"p":0.9512779553,
|
1403 |
+
"r":0.9482484076,
|
1404 |
+
"f":0.9497607656
|
1405 |
},
|
1406 |
"7411097074813287689":{
|
1407 |
+
"p":0.9126290707,
|
1408 |
+
"r":0.9394930499,
|
1409 |
+
"f":0.9258662369
|
1410 |
},
|
1411 |
"429":{
|
1412 |
+
"p":0.9369951535,
|
1413 |
+
"r":0.90625,
|
1414 |
+
"f":0.9213661636
|
1415 |
},
|
1416 |
"15861261214731031920":{
|
1417 |
+
"p":0.7480719794,
|
1418 |
+
"r":0.7132352941,
|
1419 |
+
"f":0.730238394
|
1420 |
},
|
1421 |
"991268021520064439":{
|
1422 |
+
"p":0.8733333333,
|
1423 |
+
"r":0.8881355932,
|
1424 |
+
"f":0.8806722689
|
1425 |
},
|
1426 |
"435":{
|
1427 |
+
"p":0.8789473684,
|
1428 |
+
"r":0.901890189,
|
1429 |
+
"f":0.8902709907
|
1430 |
},
|
1431 |
"434":{
|
1432 |
+
"p":0.9434782609,
|
1433 |
+
"r":0.9752808989,
|
1434 |
+
"f":0.9591160221
|
1435 |
},
|
1436 |
"8206900633647566924":{
|
1437 |
+
"p":0.8568588469,
|
1438 |
+
"r":0.9599109131,
|
1439 |
+
"f":0.9054621849
|
1440 |
},
|
1441 |
"407":{
|
1442 |
+
"p":0.8361344538,
|
1443 |
"r":0.8378947368,
|
1444 |
+
"f":0.8370136698
|
1445 |
},
|
1446 |
"410":{
|
1447 |
+
"p":0.7408163265,
|
1448 |
+
"r":0.75625,
|
1449 |
+
"f":0.7484536082
|
1450 |
},
|
1451 |
"445":{
|
1452 |
+
"p":0.8628649016,
|
1453 |
+
"r":0.8593644354,
|
1454 |
+
"f":0.8611111111
|
1455 |
},
|
1456 |
"400":{
|
1457 |
+
"p":0.8383838384,
|
1458 |
+
"r":0.8736842105,
|
1459 |
+
"f":0.8556701031
|
1460 |
},
|
1461 |
"17772752594865228322":{
|
1462 |
+
"p":0.9398148148,
|
1463 |
+
"r":0.9485981308,
|
1464 |
+
"f":0.9441860465
|
1465 |
},
|
1466 |
"403":{
|
1467 |
+
"p":0.7323943662,
|
1468 |
+
"r":0.5531914894,
|
1469 |
+
"f":0.6303030303
|
1470 |
},
|
1471 |
"399":{
|
1472 |
+
"p":0.6037735849,
|
1473 |
+
"r":0.6530612245,
|
1474 |
+
"f":0.6274509804
|
1475 |
},
|
1476 |
"3143985677199705895":{
|
1477 |
+
"p":0.8073770492,
|
1478 |
+
"r":0.8565217391,
|
1479 |
+
"f":0.8312236287
|
1480 |
},
|
1481 |
"9241468201421778905":{
|
1482 |
+
"p":0.4571428571,
|
1483 |
+
"r":0.4848484848,
|
1484 |
+
"f":0.4705882353
|
1485 |
},
|
1486 |
"423":{
|
1487 |
+
"p":0.9371069182,
|
1488 |
+
"r":0.9430379747,
|
1489 |
+
"f":0.9400630915
|
1490 |
},
|
1491 |
"13543738850102096385":{
|
1492 |
+
"p":0.9814814815,
|
1493 |
+
"r":0.9724770642,
|
1494 |
+
"f":0.9769585253
|
1495 |
},
|
1496 |
"10901028881100056900":{
|
1497 |
+
"p":0.7741935484,
|
1498 |
+
"r":0.75,
|
1499 |
+
"f":0.7619047619
|
1500 |
},
|
1501 |
"411":{
|
1502 |
+
"p":0.8611111111,
|
1503 |
+
"r":0.756097561,
|
1504 |
+
"f":0.8051948052
|
1505 |
},
|
1506 |
"12549387360942434255":{
|
1507 |
+
"p":0.4285714286,
|
1508 |
+
"r":0.45,
|
1509 |
+
"f":0.4390243902
|
1510 |
},
|
1511 |
"303601073839818384":{
|
1512 |
"p":0.5,
|
1513 |
+
"r":0.375,
|
1514 |
+
"f":0.4285714286
|
1515 |
},
|
1516 |
"8884235091647096537":{
|
1517 |
+
"p":0.0,
|
1518 |
+
"r":0.0,
|
1519 |
+
"f":0.0
|
1520 |
},
|
1521 |
"2249809950233855422":{
|
1522 |
+
"p":0.6363636364,
|
1523 |
+
"r":0.65625,
|
1524 |
+
"f":0.6461538462
|
1525 |
},
|
1526 |
"422":{
|
1527 |
+
"p":0.3076923077,
|
1528 |
+
"r":0.5333333333,
|
1529 |
+
"f":0.3902439024
|
1530 |
},
|
1531 |
"8110129090154140942":{
|
1532 |
+
"p":0.96875,
|
1533 |
+
"r":0.9489795918,
|
1534 |
+
"f":0.9587628866
|
1535 |
},
|
1536 |
"412":{
|
1537 |
+
"p":0.85,
|
1538 |
+
"r":0.4594594595,
|
1539 |
+
"f":0.5964912281
|
1540 |
},
|
1541 |
"436":{
|
1542 |
+
"p":0.3953488372,
|
1543 |
+
"r":0.2328767123,
|
1544 |
+
"f":0.2931034483
|
1545 |
},
|
1546 |
"450":{
|
1547 |
+
"p":0.9594594595,
|
1548 |
"r":0.9594594595,
|
1549 |
+
"f":0.9594594595
|
1550 |
},
|
1551 |
"12837356684637874264":{
|
1552 |
+
"p":0.7777777778,
|
1553 |
+
"r":0.6021505376,
|
1554 |
+
"f":0.6787878788
|
1555 |
},
|
1556 |
"451":{
|
1557 |
+
"p":0.6,
|
1558 |
+
"r":0.625,
|
1559 |
+
"f":0.612244898
|
1560 |
},
|
1561 |
"7349492218059511525":{
|
1562 |
+
"p":0.8181818182,
|
1563 |
+
"r":0.9,
|
1564 |
+
"f":0.8571428571
|
1565 |
},
|
1566 |
"426":{
|
1567 |
+
"p":0.7142857143,
|
1568 |
"r":0.4545454545,
|
1569 |
+
"f":0.5555555556
|
1570 |
},
|
1571 |
"405":{
|
1572 |
+
"p":0.8181818182,
|
1573 |
+
"r":0.75,
|
1574 |
+
"f":0.7826086957
|
1575 |
},
|
1576 |
"17865338459503383721":{
|
1577 |
"p":1.0,
|
|
|
1588 |
"r":0.975,
|
1589 |
"f":0.975
|
1590 |
},
|
1591 |
+
"408":{
|
1592 |
"p":0.0,
|
1593 |
"r":0.0,
|
1594 |
"f":0.0
|
1595 |
},
|
1596 |
+
"11190527879068114961":{
|
1597 |
"p":0.0,
|
1598 |
"r":0.0,
|
1599 |
"f":0.0
|
1600 |
},
|
1601 |
+
"3350290345017230236":{
|
1602 |
+
"p":0.1666666667,
|
1603 |
+
"r":0.0833333333,
|
1604 |
+
"f":0.1111111111
|
1605 |
+
},
|
1606 |
"10069665988847657778":{
|
1607 |
"p":0.0,
|
1608 |
"r":0.0,
|
|
|
1613 |
"r":0.1666666667,
|
1614 |
"f":0.2857142857
|
1615 |
},
|
1616 |
+
"6522094215780122214":{
|
1617 |
+
"p":0.8,
|
1618 |
+
"r":1.0,
|
1619 |
+
"f":0.8888888889
|
1620 |
+
},
|
1621 |
"203073658115086772":{
|
1622 |
"p":0.0,
|
1623 |
"r":0.0,
|
1624 |
"f":0.0
|
1625 |
}
|
1626 |
},
|
1627 |
+
"ents_p":0.9069524307,
|
1628 |
+
"ents_r":0.9150843882,
|
1629 |
+
"ents_f":0.9110002625,
|
1630 |
"ents_per_type":{
|
1631 |
"ORG":{
|
1632 |
+
"p":0.9245977011,
|
1633 |
+
"r":0.9323133982,
|
1634 |
+
"f":0.9284395199
|
1635 |
},
|
1636 |
"PER":{
|
1637 |
+
"p":0.9425695678,
|
1638 |
+
"r":0.9510155317,
|
1639 |
+
"f":0.9467737139
|
1640 |
},
|
1641 |
"LOC":{
|
1642 |
+
"p":0.9274977896,
|
1643 |
+
"r":0.9105902778,
|
1644 |
+
"f":0.9189662724
|
1645 |
},
|
1646 |
"MISC":{
|
1647 |
+
"p":0.7432795699,
|
1648 |
+
"r":0.7843971631,
|
1649 |
+
"f":0.7632850242
|
1650 |
}
|
1651 |
},
|
1652 |
+
"speed":2397.7104249023
|
1653 |
},
|
1654 |
"sources":[
|
1655 |
{
|
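Each `f` in the new `performance` block is the F1 score, the harmonic mean of the neighbouring `p` and `r`. The short check below recomputes a few of the reported values from their precision and recall as a sanity check; it is not spaCy's scorer, which derives these figures from true/false positive counts.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, taken as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f) triples copied from the new "performance" block.
REPORTED = {
    "ents": (0.9069524307, 0.9150843882, 0.9110002625),
    "morph Person": (0.96875, 0.9942434211, 0.9813311688),
    "morph Poss": (0.5, 1.0, 0.6666666667),
    "dep 8884235091647096537": (0.0, 0.0, 0.0),
}

for name, (p, r, f) in REPORTED.items():
    print(f"{name}: computed {f_score(p, r):.10f}, reported {f:.10f}")
```

The `Poss` row shows why F1 is used instead of a plain average: perfect recall with 0.5 precision yields about 0.667 rather than 0.75.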
morphologizer/model
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
size 3522673
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:faa0e827a09123347ae604288a765268ab410cdce68e8310894ffd6be9838161
|
3 |
size 3522673
|
ner/model
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
-
size
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:4ef597c4b89292feb19339c4cd5ed1e53a84c4e793296b7b4834f23dbaf5836b
|
3 |
+
size 443626222
|
senter/model
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
size 6792
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:4a3d7eedee9c2aa804251e45353078de93b816b475dea6700826d3c9bb799e5f
|
3 |
size 6792
|
tagger/model
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
size 52932
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:87db3e90005a834ee1a87af6c536d8f5535f93c2e1a1bfd86b632288a70fd877
|
3 |
size 52932
|
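The `*/model` entries in this commit are Git LFS pointer files rather than the weights themselves: three `key value` lines giving the spec version, the `sha256` object id, and the blob size in bytes. A minimal sketch for reading such a pointer and checking a downloaded blob against it follows; the helper names are hypothetical, and this is not part of the git-lfs client or any Hugging Face library.

```python
import hashlib
from typing import Dict

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def matches_pointer(data: bytes, pointer: Dict[str, str]) -> bool:
    """Check a downloaded blob against the pointer's size and sha256 oid."""
    algorithm, _, expected = pointer["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return len(data) == int(pointer["size"]) and hashlib.sha256(data).hexdigest() == expected


TAGGER_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:87db3e90005a834ee1a87af6c536d8f5535f93c2e1a1bfd86b632288a70fd877
size 52932
"""
pointer = parse_lfs_pointer(TAGGER_POINTER)
print(pointer["size"])  # 52932
```

This also explains why several pointers in the diff keep the same `size` while their `oid` changes: retraining produced new weights of identical serialized length.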
tokenizer
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
-
��prefix_search�G^…|^……|^,|^:|^;|^\!|^\?|^¿|^؟|^¡|^\(|^\)|^\[|^\]|^\{|^\}|^<|^>|^_|^#|^\*|^&|^。|^?|^!|^,|^、|^;|^:|^~|^·|^।|^،|^۔|^؛|^٪|^\.\.+|^…|^\'|^"|^”|^“|^`|^‘|^´|^’|^‚|^,|^„|^»|^«|^「|^」|^『|^』|^(|^)|^〔|^〕|^【|^】|^《|^》|^〈|^〉|^\u00A6\u00A9\u00AE\u00B0\u0482\u058D\u058E\u060E\u060F\u06DE\u06E9\u06FD\u06FE\u07F6\u09FA\u0B70\u0BF3-\u0BF8\u0BFA\u0C7F\u0D4F\u0D79\u0F01-\u0F03\u0F13\u0F15-\u0F17\u0F1A-\u0F1F\u0F34\u0F36\u0F38\u0FBE-\u0FC5\u0FC7-\u0FCC\u0FCE\u0FCF\u0FD5-\u0FD8\u109E\u109F\u1390-\u1399\u1940\u19DE-\u19FF\u1B61-\u1B6A\u1B74-\u1B7C\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116\u2117\u211E-\u2123\u2125\u2127\u2129\u212E\u213A\u213B\u214A\u214C\u214D\u214F\u218A\u218B\u2195-\u2199\u219C-\u219F\u21A1\u21A2\u21A4\u21A5\u21A7-\u21AD\u21AF-\u21CD\u21D0\u21D1\u21D3\u21D5-\u21F3\u2300-\u2307\u230C-\u231F\u2322-\u2328\u232B-\u237B\u237D-\u239A\u23B4-\u23DB\u23E2-\u2426\u2440-\u244A\u249C-\u24E9\u2500-\u25B6\u25B8-\u25C0\u25C2-\u25F7\u2600-\u266E\u2670-\u2767\u2794-\u27BF\u2800-\u28FF\u2B00-\u2B2F\u2B45\u2B46\u2B4D-\u2B73\u2B76-\u2B95\u2B98-\u2BC8\u2BCA-\u2BFE\u2CE5-\u2CEA\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u2FF0-\u2FFB\u3004\u3012\u3013\u3020\u3036\u3037\u303E\u303F\u3190\u3191\u3196-\u319F\u31C0-\u31E3\u3200-\u321E\u322A-\u3247\u3250\u3260-\u327F\u328A-\u32B0\u32C0-\u32FE\u3300-\u33FF\u4DC0-\u4DFF\uA490-\uA4C6\uA828-\uA82B\uA836\uA837\uA839\uAA77-\uAA79\uFDFD\uFFE4\uFFE8\uFFED\uFFEE\uFFFC\uFFFD\U00010137-\U0001013F\U00010179-\U00010189\U0001018C-\U0001018E\U00010190-\U0001019B\U000101A0\U000101D0-\U000101FC\U00010877\U00010878\U00010AC8\U0001173F\U00016B3C-\U00016B3F\U00016B45\U0001BC9C\U0001D000-\U0001D0F5\U0001D100-\U0001D126\U0001D129-\U0001D164\U0001D16A-\U0001D16C\U0001D183\U0001D184\U0001D18C-\U0001D1A9\U0001D1AE-\U0001D1E8\U0001D200-\U0001D241\U0001D245\U0001D300-\U0001D356\U0001D800-\U0001D9FF\U0001DA37-\U0001DA3A\U0001DA6D-\U0001DA74\U0001DA76-\U0001DA83\U0001DA85\U0001DA86\U0001ECAC\U0001F000-\U0001F02B\U0001F030-\U0001F093\U0001F0A0-\U0001F0AE\U000
1F0B1-\U0001F0BF\U0001F0C1-\U0001F0CF\U0001F0D1-\U0001F0F5\U0001F110-\U0001F16B\U0001F170-\U0001F1AC\U0001F1E6-\U0001F202\U0001F210-\U0001F23B\U0001F240-\U0001F248\U0001F250\U0001F251\U0001F260-\U0001F265\U0001F300-\U0001F3FA\U0001F400-\U0001F6D4\U0001F6E0-\U0001F6EC\U0001F6F0-\U0001F6F9\U0001F700-\U0001F773\U0001F780-\U0001F7D8\U0001F800-\U0001F80B\U0001F810-\U0001F847\U0001F850-\U0001F859\U0001F860-\U0001F887\U0001F890-\U0001F8AD\U0001F900-\U0001F90B\U0001F910-\U0001F93E\U0001F940-\U0001F970\U0001F973-\U0001F976\U0001F97A\U0001F97C-\U0001F9A2\U0001F9B0-\U0001F9B9\U0001F9C0-\U0001F9C2\U0001F9D0-\U0001F9FF\U0001FA60-\U0001FA6D|^[,.:](?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])�suffix_search�%�\+$|…$|……$|,$|:$|;$|\!$|\?$|¿$|؟$|¡$|\($|\)$|\[$|\]$|\{$|\}$|<$|>$|_$|#$|\*$|&$|。$|?$|!$|,$|、$|;$|:$|~$|·$|।$|،$|۔$|؛$|٪$|\.\.+$|…$|\'$|"$|”$|“$|`$|‘$|´$|’$|‚$|,$|„$|»$|«$|「$|」$|『$|』$|($|)$|〔$|〕$|【$|】$|
《$|》$|〈$|〉$|\u00A6\u00A9\u00AE\u00B0\u0482\u058D\u058E\u060E\u060F\u06DE\u06E9\u06FD\u06FE\u07F6\u09FA\u0B70\u0BF3-\u0BF8\u0BFA\u0C7F\u0D4F\u0D79\u0F01-\u0F03\u0F13\u0F15-\u0F17\u0F1A-\u0F1F\u0F34\u0F36\u0F38\u0FBE-\u0FC5\u0FC7-\u0FCC\u0FCE\u0FCF\u0FD5-\u0FD8\u109E\u109F\u1390-\u1399\u1940\u19DE-\u19FF\u1B61-\u1B6A\u1B74-\u1B7C\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116\u2117\u211E-\u2123\u2125\u2127\u2129\u212E\u213A\u213B\u214A\u214C\u214D\u214F\u218A\u218B\u2195-\u2199\u219C-\u219F\u21A1\u21A2\u21A4\u21A5\u21A7-\u21AD\u21AF-\u21CD\u21D0\u21D1\u21D3\u21D5-\u21F3\u2300-\u2307\u230C-\u231F\u2322-\u2328\u232B-\u237B\u237D-\u239A\u23B4-\u23DB\u23E2-\u2426\u2440-\u244A\u249C-\u24E9\u2500-\u25B6\u25B8-\u25C0\u25C2-\u25F7\u2600-\u266E\u2670-\u2767\u2794-\u27BF\u2800-\u28FF\u2B00-\u2B2F\u2B45\u2B46\u2B4D-\u2B73\u2B76-\u2B95\u2B98-\u2BC8\u2BCA-\u2BFE\u2CE5-\u2CEA\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u2FF0-\u2FFB\u3004\u3012\u3013\u3020\u3036\u3037\u303E\u303F\u3190\u3191\u3196-\u319F\u31C0-\u31E3\u3200-\u321E\u322A-\u3247\u3250\u3260-\u327F\u328A-\u32B0\u32C0-\u32FE\u3300-\u33FF\u4DC0-\u4DFF\uA490-\uA4C6\uA828-\uA82B\uA836\uA837\uA839\uAA77-\uAA79\uFDFD\uFFE4\uFFE8\uFFED\uFFEE\uFFFC\uFFFD\U00010137-\U0001013F\U00010179-\U00010189\U0001018C-\U0001018E\U00010190-\U0001019B\U000101A0\U000101D0-\U000101FC\U00010877\U00010878\U00010AC8\U0001173F\U00016B3C-\U00016B3F\U00016B45\U0001BC9C\U0001D000-\U0001D0F5\U0001D100-\U0001D126\U0001D129-\U0001D164\U0001D16A-\U0001D16C\U0001D183\U0001D184\U0001D18C-\U0001D1A9\U0001D1AE-\U0001D1E8\U0001D200-\U0001D241\U0001D245\U0001D300-\U0001D356\U0001D800-\U0001D9FF\U0001DA37-\U0001DA3A\U0001DA6D-\U0001DA74\U0001DA76-\U0001DA83\U0001DA85\U0001DA86\U0001ECAC\U0001F000-\U0001F02B\U0001F030-\U0001F093\U0001F0A0-\U0001F0AE\U0001F0B1-\U0001F0BF\U0001F0C1-\U0001F0CF\U0001F0D1-\U0001F0F5\U0001F110-\U0001F16B\U0001F170-\U0001F1AC\U0001F1E6-\U0001F202\U0001F210-\U0001F23B\U0001F240-\U0001F248\U0001F250\U0001F251\U0001F260-\U0001F265\U0001F300-\
U0001F3FA\U0001F400-\U0001F6D4\U0001F6E0-\U0001F6EC\U0001F6F0-\U0001F6F9\U0001F700-\U0001F773\U0001F780-\U0001F7D8\U0001F800-\U0001F80B\U0001F810-\U0001F847\U0001F850-\U0001F859\U0001F860-\U0001F887\U0001F890-\U0001F8AD\U0001F900-\U0001F90B\U0001F910-\U0001F93E\U0001F940-\U0001F970\U0001F973-\U0001F976\U0001F97A\U0001F97C-\U0001F9A2\U0001F9B0-\U0001F9B9\U0001F9C0-\U0001F9C2\U0001F9D0-\U0001F9FF\U0001FA60-\U0001FA6D$|(?<=[0-9])\+$|(?<=°[FfCcKk])\.$|(?<=[0-9])(?:[\$¢£€¥฿])$|(?<=[0-9])(?:km|km²|km³|m|m²|m³|dm|dm²|dm³|cm|cm²|cm³|mm|mm²|mm³|ha|µm|nm|yd|in|ft|kg|g|mg|µg|t|lb|oz|m/s|km/h|kmh|mph|hPa|Pa|mbar|mb|MB|kb|KB|gb|GB|tb|TB|T|G|M|K||км|км²|км³|м|м²|м³|дм|дм²|дм³|см|см²|см³|мм|мм²|мм³|нм|кг|г|мг|м/с|км/ч|кПа|Па|мбар|Кб|КБ|кб|Мб|МБ|мб|Гб|ГБ|гб|Тб|ТБ|тбكم|كم²|كم³|م|م²|م³|سم|سم²|سم³|مم|مم²|مم³|كم|غرام|جرام|جم|كغ|ملغ|كوب|اكواب)$|(?<=[a-z\uFF41-\uFF5A\u00DF-\u00F6\u00F8-\u00FF\u0101\u0103\u0105\u0107\u0109\u010B\u010D\u010F\u0111\u0113\u0115\u0117\u0119\u011B\u011D\u011F\u0121\u0123\u0125\u0127\u0129\u012B\u012D\u012F\u0131\u0133\u0135\u0137\u0138\u013A\u013C\u013E\u0140\u0142\u0144\u0146\u0148\u0149\u014B\u014D\u014F\u0151\u0153\u0155\u0157\u0159\u015B\u015D\u015F\u0161\u0163\u0165\u0167\u0169\u016B\u016D\u016F\u0171\u0173\u0175\u0177\u017A\u017C\u017E\u017F\u0180\u0183\u0185\u0188\u018C\u018D\u0192\u0195\u0199-\u019B\u019E\u01A1\u01A3\u01A5\u01A8\u01AA\u01AB\u01AD\u01B0\u01B4\u01B6\u01B9\u01BA\u01BD-\u01BF\u01C6\u01C9\u01CC\u01CE\u01D0\u01D2\u01D4\u01D6\u01D8\u01DA\u01DC\u01DD\u01DF\u01E1\u01E3\u01E5\u01E7\u01E9\u01EB\u01ED\u01EF\u01F0\u01F3\u01F5\u01F9\u01FB\u01FD\u01FF\u0201\u0203\u0205\u0207\u0209\u020B\u020D\u020F\u0211\u0213\u0215\u0217\u0219\u021B\u021D\u021F\u0221\u0223\u0225\u0227\u0229\u022B\u022D\u022F\u0231\u0233-\u0239\u023C\u023F\u0240\u0242\u0247\u0249\u024B\u024D\u024F\u2C61\u2C65\u2C66\u2C68\u2C6A\u2C6C\u2C71\u2C73\u2C74\u2C76-\u2C7B\uA723\uA725\uA727\uA729\uA72B\uA72D\uA72F-\uA731\uA733\uA735\uA737\uA739\uA73B\uA73D\uA73F\uA741\uA743\uA745\uA747\uA749\u
A74B\uA74D\uA74F\uA751\uA753\uA755\uA757\uA759\uA75B\uA75D\uA75F\uA761\uA763\uA765\uA767\uA769\uA76B\uA76D\uA76F\uA771-\uA778\uA77A\uA77C\uA77F\uA781\uA783\uA785\uA787\uA78C\uA78E\uA791\uA793-\uA795\uA797\uA799\uA79B\uA79D\uA79F\uA7A1\uA7A3\uA7A5\uA7A7\uA7A9\uA7AF\uA7B5\uA7B7\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E01\u1E03\u1E05\u1E07\u1E09\u1E0B\u1E0D\u1E0F\u1E11\u1E13\u1E15\u1E17\u1E19\u1E1B\u1E1D\u1E1F\u1E21\u1E23\u1E25\u1E27\u1E29\u1E2B\u1E2D\u1E2F\u1E31\u1E33\u1E35\u1E37\u1E39\u1E3B\u1E3D\u1E3F\u1E41\u1E43\u1E45\u1E47\u1E49\u1E4B\u1E4D\u1E4F\u1E51\u1E53\u1E55\u1E57\u1E59\u1E5B\u1E5D\u1E5F\u1E61\u1E63\u1E65\u1E67\u1E69\u1E6B\u1E6D\u1E6F\u1E71\u1E73\u1E75\u1E77\u1E79\u1E7B\u1E7D\u1E7F\u1E81\u1E83\u1E85\u1E87\u1E89\u1E8B\u1E8D\u1E8F\u1E91\u1E93\u1E95-\u1E9D\u1E9F\u1EA1\u1EA3\u1EA5\u1EA7\u1EA9\u1EAB\u1EAD\u1EAF\u1EB1\u1EB3\u1EB5\u1EB7\u1EB9\u1EBB\u1EBD\u1EBF\u1EC1\u1EC3\u1EC5\u1EC7\u1EC9\u1ECB\u1ECD\u1ECF\u1ED1\u1ED3\u1ED5\u1ED7\u1ED9\u1EDB\u1EDD\u1EDF\u1EE1\u1EE3\u1EE5\u1EE7\u1EE9\u1EEB\u1EED\u1EEF\u1EF1\u1EF3\u1EF5\u1EF7\u1EF9\u1EFB\u1EFD\u1EFFёа-яәөүҗңһα-ωάέίόώήύа-щюяіїєґѓѕјљњќѐѝ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F%²\-\+\'"”“`‘´’‚,„»«「」『』()〔〕【】《》〈〉(?:\$¢£€¥฿)])\.$|(?<=[a-z\uFF41-\uFF
5A\u00DF-\u00F6\u00F8-\u00FF\u0101\u0103\u0105\u0107\u0109\u010B\u010D\u010F\u0111\u0113\u0115\u0117\u0119\u011B\u011D\u011F\u0121\u0123\u0125\u0127\u0129\u012B\u012D\u012F\u0131\u0133\u0135\u0137\u0138\u013A\u013C\u013E\u0140\u0142\u0144\u0146\u0148\u0149\u014B\u014D\u014F\u0151\u0153\u0155\u0157\u0159\u015B\u015D\u015F\u0161\u0163\u0165\u0167\u0169\u016B\u016D\u016F\u0171\u0173\u0175\u0177\u017A\u017C\u017E\u017F\u0180\u0183\u0185\u0188\u018C\u018D\u0192\u0195\u0199-\u019B\u019E\u01A1\u01A3\u01A5\u01A8\u01AA\u01AB\u01AD\u01B0\u01B4\u01B6\u01B9\u01BA\u01BD-\u01BF\u01C6\u01C9\u01CC\u01CE\u01D0\u01D2\u01D4\u01D6\u01D8\u01DA\u01DC\u01DD\u01DF\u01E1\u01E3\u01E5\u01E7\u01E9\u01EB\u01ED\u01EF\u01F0\u01F3\u01F5\u01F9\u01FB\u01FD\u01FF\u0201\u0203\u0205\u0207\u0209\u020B\u020D\u020F\u0211\u0213\u0215\u0217\u0219\u021B\u021D\u021F\u0221\u0223\u0225\u0227\u0229\u022B\u022D\u022F\u0231\u0233-\u0239\u023C\u023F\u0240\u0242\u0247\u0249\u024B\u024D\u024F\u2C61\u2C65\u2C66\u2C68\u2C6A\u2C6C\u2C71\u2C73\u2C74\u2C76-\u2C7B\uA723\uA725\uA727\uA729\uA72B\uA72D\uA72F-\uA731\uA733\uA735\uA737\uA739\uA73B\uA73D\uA73F\uA741\uA743\uA745\uA747\uA749\uA74B\uA74D\uA74F\uA751\uA753\uA755\uA757\uA759\uA75B\uA75D\uA75F\uA761\uA763\uA765\uA767\uA769\uA76B\uA76D\uA76F\uA771-\uA778\uA77A\uA77C\uA77F\uA781\uA783\uA785\uA787\uA78C\uA78E\uA791\uA793-\uA795\uA797\uA799\uA79B\uA79D\uA79F\uA7A1\uA7A3\uA7A5\uA7A7\uA7A9\uA7AF\uA7B5\uA7B7\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E01\u1E03\u1E05\u1E07\u1E09\u1E0B\u1E0D\u1E0F\u1E11\u1E13\u1E15\u1E17\u1E19\u1E1B\u1E1D\u1E1F\u1E21\u1E23\u1E25\u1E27\u1E29\u1E2B\u1E2D\u1E2F\u1E31\u1E33\u1E35\u1E37\u1E39\u1E3B\u1E3D\u1E3F\u1E41\u1E43\u1E45\u1E47\u1E49\u1E4B\u1E4D\u1E4F\u1E51\u1E53\u1E55\u1E57\u1E59\u1E5B\u1E5D\u1E5F\u1E61\u1E63\u1E65\u1E67\u1E69\u1E6B\u1E6D\u1E6F\u1E71\u1E73\u1E75\u1E77\u1E79\u1E7B\u1E7D\u1E7F\u1E81\u1E83\u1E85\u1E87\u1E89\u1E8B\u1E8D\u1E8F\u1E91\u1E93\u1E95-\u1E9D\u1E9F\u1EA1\u1EA3\u1EA5\u1EA7\u
1EA9\u1EAB\u1EAD\u1EAF\u1EB1\u1EB3\u1EB5\u1EB7\u1EB9\u1EBB\u1EBD\u1EBF\u1EC1\u1EC3\u1EC5\u1EC7\u1EC9\u1ECB\u1ECD\u1ECF\u1ED1\u1ED3\u1ED5\u1ED7\u1ED9\u1EDB\u1EDD\u1EDF\u1EE1\u1EE3\u1EE5\u1EE7\u1EE9\u1EEB\u1EED\u1EEF\u1EF1\u1EF3\u1EF5\u1EF7\u1EF9\u1EFB\u1EFD\u1EFFёа-яәөүҗңһα-ωάέίόώήύа-щюяіїєґѓѕјљњќѐѝ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F)])-e$�infix_finditer�QZ\.\.+|…|\u00A6\u00A9\u00AE\u00B0\u0482\u058D\u058E\u060E\u060F\u06DE\u06E9\u06FD\u06FE\u07F6\u09FA\u0B70\u0BF3-\u0BF8\u0BFA\u0C7F\u0D4F\u0D79\u0F01-\u0F03\u0F13\u0F15-\u0F17\u0F1A-\u0F1F\u0F34\u0F36\u0F38\u0FBE-\u0FC5\u0FC7-\u0FCC\u0FCE\u0FCF\u0FD5-\u0FD8\u109E\u109F\u1390-\u1399\u1940\u19DE-\u19FF\u1B61-\u1B6A\u1B74-\u1B7C\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116\u2117\u211E-\u2123\u2125\u2127\u2129\u212E\u213A\u213B\u214A\u214C\u214D\u214F\u218A\u218B\u2195-\u2199\u219C-\u219F\u21A1\u21A2\u21A4\u21A5\u21A7-\u21AD\u21AF-\u21CD\u21D0\u21D1\u21D3\u21D5-\u21F3\u2300-\u2307\u230C-\u231F\u2322-\u2328\u232B-\u237B\u237D-\u239A\u23B4-\u23DB\u23E2-\u2426\u2440-\u244A\u249C-\u24E9\u2500-\u25B6\u25B8-\u25C0\u25C2-\u25F7\u2600-\u266E\u2670-\u2767\u2794-\u27BF\u2800-\u28FF\u2B00-\u2B2F\u2B45\u2B46\u2B4D-\u2B73\u2B76-\u2B95\u2B98-\u2BC8\u2BCA-\u2BFE\u2CE5-\u2CEA\u2E80-\u2E99\u2E9B-\u2EF3\u2F0
0-\u2FD5\u2FF0-\u2FFB\u3004\u3012\u3013\u3020\u3036\u3037\u303E\u303F\u3190\u3191\u3196-\u319F\u31C0-\u31E3\u3200-\u321E\u322A-\u3247\u3250\u3260-\u327F\u328A-\u32B0\u32C0-\u32FE\u3300-\u33FF\u4DC0-\u4DFF\uA490-\uA4C6\uA828-\uA82B\uA836\uA837\uA839\uAA77-\uAA79\uFDFD\uFFE4\uFFE8\uFFED\uFFEE\uFFFC\uFFFD\U00010137-\U0001013F\U00010179-\U00010189\U0001018C-\U0001018E\U00010190-\U0001019B\U000101A0\U000101D0-\U000101FC\U00010877\U00010878\U00010AC8\U0001173F\U00016B3C-\U00016B3F\U00016B45\U0001BC9C\U0001D000-\U0001D0F5\U0001D100-\U0001D126\U0001D129-\U0001D164\U0001D16A-\U0001D16C\U0001D183\U0001D184\U0001D18C-\U0001D1A9\U0001D1AE-\U0001D1E8\U0001D200-\U0001D241\U0001D245\U0001D300-\U0001D356\U0001D800-\U0001D9FF\U0001DA37-\U0001DA3A\U0001DA6D-\U0001DA74\U0001DA76-\U0001DA83\U0001DA85\U0001DA86\U0001ECAC\U0001F000-\U0001F02B\U0001F030-\U0001F093\U0001F0A0-\U0001F0AE\U0001F0B1-\U0001F0BF\U0001F0C1-\U0001F0CF\U0001F0D1-\U0001F0F5\U0001F110-\U0001F16B\U0001F170-\U0001F1AC\U0001F1E6-\U0001F202\U0001F210-\U0001F23B\U0001F240-\U0001F248\U0001F250\U0001F251\U0001F260-\U0001F265\U0001F300-\U0001F3FA\U0001F400-\U0001F6D4\U0001F6E0-\U0001F6EC\U0001F6F0-\U0001F6F9\U0001F700-\U0001F773\U0001F780-\U0001F7D8\U0001F800-\U0001F80B\U0001F810-\U0001F847\U0001F850-\U0001F859\U0001F860-\U0001F887\U0001F890-\U0001F8AD\U0001F900-\U0001F90B\U0001F910-\U0001F93E\U0001F940-\U0001F970\U0001F973-\U0001F976\U0001F97A\U0001F97C-\U0001F9A2\U0001F9B0-\U0001F9B9\U0001F9C0-\U0001F9C2\U0001F9D0-\U0001F9FF\U0001FA60-\U0001FA6D|(?<=[a-z\uFF41-\uFF5A\u00DF-\u00F6\u00F8-\u00FF\u0101\u0103\u0105\u0107\u0109\u010B\u010D\u010F\u0111\u0113\u0115\u0117\u0119\u011B\u011D\u011F\u0121\u0123\u0125\u0127\u0129\u012B\u012D\u012F\u0131\u0133\u0135\u0137\u0138\u013A\u013C\u013E\u0140\u0142\u0144\u0146\u0148\u0149\u014B\u014D\u014F\u0151\u0153\u0155\u0157\u0159\u015B\u015D\u015F\u0161\u0163\u0165\u0167\u0169\u016B\u016D\u016F\u0171\u0173\u0175\u0177\u017A\u017C\u017E\u017F\u0180\u0183\u0185\u0188\u018C\u018D\u0192\u0195\
u0199-\u019B\u019E\u01A1\u01A3\u01A5\u01A8\u01AA\u01AB\u01AD\u01B0\u01B4\u01B6\u01B9\u01BA\u01BD-\u01BF\u01C6\u01C9\u01CC\u01CE\u01D0\u01D2\u01D4\u01D6\u01D8\u01DA\u01DC\u01DD\u01DF\u01E1\u01E3\u01E5\u01E7\u01E9\u01EB\u01ED\u01EF\u01F0\u01F3\u01F5\u01F9\u01FB\u01FD\u01FF\u0201\u0203\u0205\u0207\u0209\u020B\u020D\u020F\u0211\u0213\u0215\u0217\u0219\u021B\u021D\u021F\u0221\u0223\u0225\u0227\u0229\u022B\u022D\u022F\u0231\u0233-\u0239\u023C\u023F\u0240\u0242\u0247\u0249\u024B\u024D\u024F\u2C61\u2C65\u2C66\u2C68\u2C6A\u2C6C\u2C71\u2C73\u2C74\u2C76-\u2C7B\uA723\uA725\uA727\uA729\uA72B\uA72D\uA72F-\uA731\uA733\uA735\uA737\uA739\uA73B\uA73D\uA73F\uA741\uA743\uA745\uA747\uA749\uA74B\uA74D\uA74F\uA751\uA753\uA755\uA757\uA759\uA75B\uA75D\uA75F\uA761\uA763\uA765\uA767\uA769\uA76B\uA76D\uA76F\uA771-\uA778\uA77A\uA77C\uA77F\uA781\uA783\uA785\uA787\uA78C\uA78E\uA791\uA793-\uA795\uA797\uA799\uA79B\uA79D\uA79F\uA7A1\uA7A3\uA7A5\uA7A7\uA7A9\uA7AF\uA7B5\uA7B7\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E01\u1E03\u1E05\u1E07\u1E09\u1E0B\u1E0D\u1E0F\u1E11\u1E13\u1E15\u1E17\u1E19\u1E1B\u1E1D\u1E1F\u1E21\u1E23\u1E25\u1E27\u1E29\u1E2B\u1E2D\u1E2F\u1E31\u1E33\u1E35\u1E37\u1E39\u1E3B\u1E3D\u1E3F\u1E41\u1E43\u1E45\u1E47\u1E49\u1E4B\u1E4D\u1E4F\u1E51\u1E53\u1E55\u1E57\u1E59\u1E5B\u1E5D\u1E5F\u1E61\u1E63\u1E65\u1E67\u1E69\u1E6B\u1E6D\u1E6F\u1E71\u1E73\u1E75\u1E77\u1E79\u1E7B\u1E7D\u1E7F\u1E81\u1E83\u1E85\u1E87\u1E89\u1E8B\u1E8D\u1E8F\u1E91\u1E93\u1E95-\u1E9D\u1E9F\u1EA1\u1EA3\u1EA5\u1EA7\u1EA9\u1EAB\u1EAD\u1EAF\u1EB1\u1EB3\u1EB5\u1EB7\u1EB9\u1EBB\u1EBD\u1EBF\u1EC1\u1EC3\u1EC5\u1EC7\u1EC9\u1ECB\u1ECD\u1ECF\u1ED1\u1ED3\u1ED5\u1ED7\u1ED9\u1EDB\u1EDD\u1EDF\u1EE1\u1EE3\u1EE5\u1EE7\u1EE9\u1EEB\u1EED\u1EEF\u1EF1\u1EF3\u1EF5\u1EF7\u1EF9\u1EFB\u1EFD\u1EFFёа-яәөүҗңһα-ωάέίόώήύа-щюяіїєґѓѕјљњќѐѝ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDF
B\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])\.(?=[A-Z\uFF21-\uFF3A\u00C0-\u00D6\u00D8-\u00DE\u0100\u0102\u0104\u0106\u0108\u010A\u010C\u010E\u0110\u0112\u0114\u0116\u0118\u011A\u011C\u011E\u0120\u0122\u0124\u0126\u0128\u012A\u012C\u012E\u0130\u0132\u0134\u0136\u0139\u013B\u013D\u013F\u0141\u0143\u0145\u0147\u014A\u014C\u014E\u0150\u0152\u0154\u0156\u0158\u015A\u015C\u015E\u0160\u0162\u0164\u0166\u0168\u016A\u016C\u016E\u0170\u0172\u0174\u0176\u0178\u0179\u017B\u017D\u0181\u0182\u0184\u0186\u0187\u0189-\u018B\u018E-\u0191\u0193\u0194\u0196-\u0198\u019C\u019D\u019F\u01A0\u01A2\u01A4\u01A6\u01A7\u01A9\u01AC\u01AE\u01AF\u01B1-\u01B3\u01B5\u01B7\u01B8\u01BC\u01C4\u01C7\u01CA\u01CD\u01CF\u01D1\u01D3\u01D5\u01D7\u01D9\u01DB\u01DE\u01E0\u01E2\u01E4\u01E6\u01E8\u01EA\u01EC\u01EE\u01F1\u01F4\u01F6-\u01F8\u01FA\u01FC\u01FE\u0200\u0202\u0204\u0206\u0208\u020A\u020C\u020E\u0210\u0212\u0214\u0216\u0218\u021A\u021C\u021E\u0220\u0222\u0224\u0226\u0228\u022A\u022C\u022E\u0230\u0232\u023A\u023B\u023D\u023E\u0241\u0243-\u0246\u0248\u024A\u024C\u024E\u2C60\u2C62-\u2C64\u2C67\u2C69\u2C6B\u2C6D-\u2C70\u2C72\u2C75\u2C7E\u2C7F\uA722\uA724\uA726\uA728\uA72A\uA72C\uA72E\uA732\uA734\uA736\uA738\uA73A\uA73C\uA73E\uA740\uA742\uA744\uA746\uA748\uA74A\uA74C\uA74E\uA750\uA752\uA754\uA756\uA758\uA75A\uA75C\uA75E\uA760\uA762\uA764\uA766\uA768\uA76A\uA76C\uA76E\uA779\uA77B\uA77D\uA77E\uA780\uA782\uA784\uA786\uA78B\uA78D\uA790\uA792\uA796\uA798\u
A79A\uA79C\uA79E\uA7A0\uA7A2\uA7A4\uA7A6\uA7A8\uA7AA-\uA7AE\uA7B0-\uA7B4\uA7B6\uA7B8\u1E00\u1E02\u1E04\u1E06\u1E08\u1E0A\u1E0C\u1E0E\u1E10\u1E12\u1E14\u1E16\u1E18\u1E1A\u1E1C\u1E1E\u1E20\u1E22\u1E24\u1E26\u1E28\u1E2A\u1E2C\u1E2E\u1E30\u1E32\u1E34\u1E36\u1E38\u1E3A\u1E3C\u1E3E\u1E40\u1E42\u1E44\u1E46\u1E48\u1E4A\u1E4C\u1E4E\u1E50\u1E52\u1E54\u1E56\u1E58\u1E5A\u1E5C\u1E5E\u1E60\u1E62\u1E64\u1E66\u1E68\u1E6A\u1E6C\u1E6E\u1E70\u1E72\u1E74\u1E76\u1E78\u1E7A\u1E7C\u1E7E\u1E80\u1E82\u1E84\u1E86\u1E88\u1E8A\u1E8C\u1E8E\u1E90\u1E92\u1E94\u1E9E\u1EA0\u1EA2\u1EA4\u1EA6\u1EA8\u1EAA\u1EAC\u1EAE\u1EB0\u1EB2\u1EB4\u1EB6\u1EB8\u1EBA\u1EBC\u1EBE\u1EC0\u1EC2\u1EC4\u1EC6\u1EC8\u1ECA\u1ECC\u1ECE\u1ED0\u1ED2\u1ED4\u1ED6\u1ED8\u1EDA\u1EDC\u1EDE\u1EE0\u1EE2\u1EE4\u1EE6\u1EE8\u1EEA\u1EEC\u1EEE\u1EF0\u1EF2\u1EF4\u1EF6\u1EF8\u1EFA\u1EFC\u1EFEЁА-ЯӘӨҮҖҢҺΑ-ΩΆΈΊΌΏΉΎА-ЩЮЯІЇЄҐЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃ
ЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])[,!?](?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8
-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])[:<>=](?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002
B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])--(?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\
uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F]),(?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃ
ЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])([\"”“`‘´’‚,„»«「」『』()〔〕【】《》〈〉\)\]\(\[])(?=[\-A-Za-z\uFF21-\uFF3A\uFF41-\u
FF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])�token_match�
|
2 |
��A�
|
3 |
� ��A� �'��A�'�''��A�''�(*_*)��A�(*_*)�(-8��A�(-8�(-:��A�(-:�(-;��A�(-;�(-_-)��A�(-_-)�(._.)��A�(._.)�(:��A�(:�(;��A�(;�(=��A�(=�(>_<)��A�(>_<)�(^_^)��A�(^_^)�(o:��A�(o:�(¬_¬)��A�(¬_¬)�(ಠ_ಠ)��A�(ಠ_ಠ)�(╯°□°)╯︵┻━┻��A�(╯°□°)╯︵┻━┻�)-:��A�)-:�):��A�):�-_-��A�-_-�-__-��A�-__-�-e��A�-e�._.��A�._.�0.0��A�0.0�0.o��A�0.o�0_0��A�0_0�0_o��A�0_o�8)��A�8)�8-)��A�8-)�8-D��A�8-D�8D��A�8D�:'(��A�:'(�:')��A�:')�:'-(��A�:'-(�:'-)��A�:'-)�:(��A�:(�:((��A�:((�:(((��A�:(((�:()��A�:()�:)��A�:)�:))��A�:))�:)))��A�:)))�:*��A�:*�:-(��A�:-(�:-((��A�:-((�:-(((��A�:-(((�:-)��A�:-)�:-))��A�:-))�:-)))��A�:-)))�:-*��A�:-*�:-/��A�:-/�:-0��A�:-0�:-3��A�:-3�:->��A�:->�:-D��A�:-D�:-O��A�:-O�:-P��A�:-P�:-X��A�:-X�:-]��A�:-]�:-o��A�:-o�:-p��A�:-p�:-x��A�:-x�:-|��A�:-|�:-}��A�:-}�:/��A�:/�:0��A�:0�:1��A�:1�:3��A�:3�:>��A�:>�:D��A�:D�:O��A�:O�:P��A�:P�:X��A�:X�:]��A�:]�:o��A�:o�:o)��A�:o)�:p��A�:p�:x��A�:x�:|��A�:|�:}��A�:}�:’(��A�:’(�:’)��A�:’)�:’-(��A�:’-(�:’-)��A�:’-)�;)��A�;)�;-)��A�;-)�;-D��A�;-D�;D��A�;D�;_;��A�;_;�<.<��A�<.<�</3��A�</3�<3��A�<3�<33��A�<33�<333��A�<333�<space>��A�<space>�=(��A�=(�=)��A�=)�=/��A�=/�=3��A�=3�=D��A�=D�=[��A�=[�=]��A�=]�=|��A�=|�>.<��A�>.<�>.>��A�>.>�>:(��A�>:(�>:o��A�>:o�><(((*>��A�><(((*>�@_@��A�@_@�A.��A�A.�AG.��A�AG.�AkH.��A�AkH.�Aö.��A�Aö.�B.��A�B.�B.CS.��A�B.CS.�B.S.��A�B.S.�B.Sc.��A�B.Sc.�B.ú.é.k.��A�B.ú.é.k.�BE.��A�BE.�BEK.��A�BEK.�BSC.��A�BSC.�BSc.��A�BSc.�BTK.��A�BTK.�Bat.��A�Bat.�Be.��A�Be.�Bek.��A�Bek.�Bfok.��A�Bfok.�Bk.��A�Bk.�Bp.��A�Bp.�Bros.��A�Bros.�Bt.��A�Bt.�Btk.��A�Btk.�Btke.��A�Btke.�Btét.��A�Btét.�C++��A�C++�C.��A�C.�CSC.��A�CSC.�Cal.��A�Cal.�Cg.��A�Cg.�Cgf.��A�Cgf.�Cgt.��A�Cgt.�Cia.��A�Cia.�Co.��A�Co.�Colo.��A�Colo.�Comp.��A�Comp.�Copr.��A�Copr.�Corp.��A�Corp.�Cos.��A�Cos.�Cs.��A�Cs.�Csc.��A�Csc.�Csop.��A�Csop.�Cstv.��A�Cstv.�Ctv.��A�Ctv.�Ctvr.��A�Ctvr.�D.��A�D.�DR.��A�DR.�Dipl.��A�Dipl.�Dr.��A�Dr.�Dsz.��A�Dsz.�Dzs.��A�Dzs.�E.��A�E.�EK.��A�EK.�EU.��A�EU.�F.��A�F.�Fla.��A�Fla.�Folyt.��A�Folyt.�Fpk.��A�Fpk.�Főszerk.��A�Főszerk.�G.��A�G.�GK.��A�GK.�GM
.��A�GM.�Gfv.��A�Gfv.�Gmk.��A�Gmk.�Gr.��A�Gr.�Group.��A�Group.�Gt.��A�Gt.�Gy.��A�Gy.�H.��A�H.�HKsz.��A�HKsz.�Hmvh.��A�Hmvh.�I.��A�I.�Ifj.��A�Ifj.�Inc.��A�Inc.�Inform.��A�Inform.�Int.��A�Int.�J.��A�J.�Jr.��A�Jr.�Jv.��A�Jv.�K.��A�K.�K.m.f.��A�K.m.f.�KB.��A�KB.�KER.��A�KER.�KFT.��A�KFT.�KRT.��A�KRT.�Kb.��A�Kb.�Ker.��A�Ker.�Kft.��A�Kft.�Kg.��A�Kg.�Kht.��A�Kht.�Kkt.��A�Kkt.�Kong.��A�Kong.�Korm.��A�Korm.�Kr.��A�Kr.�Kr.e.��A�Kr.e.�Kr.u.��A�Kr.u.�Krt.��A�Krt.�L.��A�L.�LB.��A�LB.�Llc.��A�Llc.�Ltd.��A�Ltd.�M.��A�M.�M.A.��A�M.A.�M.S.��A�M.S.�M.SC.��A�M.SC.�M.Sc.��A�M.Sc.�MA.��A�MA.�MH.��A�MH.�MSC.��A�MSC.�MSc.��A�MSc.�Mass.��A�Mass.�Max.��A�Max.�Mlle.��A�Mlle.�Mme.��A�Mme.�Mo.��A�Mo.�Mr.��A�Mr.�Mrs.��A�Mrs.�Ms.��A�Ms.�Mt.��A�Mt.�N.��A�N.�N.N.��A�N.N.�NB.��A�NB.�NBr.��A�NBr.�Nat.��A�Nat.�No.��A�No.�Nr.��A�Nr.�Ny.��A�Ny.�Nyh.��A�Nyh.�Nyr.��A�Nyr.�Nyrt.��A�Nyrt.�O.��A�O.�O.O��A�O.O�O.o��A�O.o�OJ.��A�OJ.�O_O��A�O_O�O_o��A�O_o�Op.��A�Op.�P.��A�P.�P.H.��A�P.H.�P.S.��A�P.S.�PH.D.��A�PH.D.�PHD.��A�PHD.�PROF.��A�PROF.�Pf.��A�Pf.�Ph.D��A�Ph.D�PhD.��A�PhD.�Pk.��A�Pk.�Pl.��A�Pl.�Plc.��A�Plc.�Pp.��A�Pp.�Proc.��A�Proc.�Prof.��A�Prof.�Ptk.��A�Ptk.�R.��A�R.�RT.��A�RT.�Rer.��A�Rer.�Rt.��A�Rt.�S.��A�S.�S.B.��A�S.B.�SZOLG.��A�SZOLG.�Salg.��A�Salg.�Sch.��A�Sch.�Spa.��A�Spa.�St.��A�St.�Sz.��A�Sz.�SzRt.��A�SzRt.�Szerk.��A�Szerk.�Szfv.��A�Szfv.�Szjt.��A�Szjt.�Szolg.��A�Szolg.�Szt.��A�Szt.�Sztv.��A�Sztv.�Szvt.��A�Szvt.�Számv.��A�Számv.�T.��A�T.�TEL.��A�TEL.�Tel.��A�Tel.�Ty.��A�Ty.�Tyr.��A�Tyr.�U.��A�U.�Ui.��A�Ui.�Ut.��A�Ut.�V.��A�V.�V.V��A�V.V�VB.��A�VB.�V_V��A�V_V�Vcs.��A�Vcs.�Vhr.��A�Vhr.�Vht.��A�Vht.�Várm.��A�Várm.�W.��A�W.�X.��A�X.�X.Y.��A�X.Y.�XD��A�XD�XDD��A�XDD�Y.��A�Y.�Z.��A�Z.�Zrt.��A�Zrt.�Zs.��A�Zs.�[-:��A�[-:�[:��A�[:�[=��A�[=�\")��A�\")�\n��A�\n�\t��A�\t�]=��A�]=�^_^��A�^_^�^__^��A�^__^�^___^��A�^___^�a.��A�a.�a.C.��A�a.C.�ac.��A�ac.�adj.��A�adj.�adm.��A�adm.�ag.��A�ag.�agit.��A�agit.�alez.��A�alez.�alk.��A�alk.�all.��A�all.�altbgy.��A�altbgy.�an.��A�an.�ang.��A�ang.�arch.��A�arch.�at.��A
�at.�atc.��A�atc.�aug.��A�aug.�b.��A�b.�b.a.��A�b.a.�b.s.��A�b.s.�b.sc.��A�b.sc.�bek.��A�bek.�belker.��A�belker.�berend.��A�berend.�biz.��A�biz.�bizt.��A�bizt.�bo.��A�bo.�bp.��A�bp.�br.��A�br.�bsc.��A�bsc.�bt.��A�bt.�btk.��A�btk.�c.��A�c.�ca.��A�ca.�cc.��A�cc.�cca.��A�cca.�cf.��A�cf.�cif.��A�cif.�co.��A�co.�corp.��A�corp.�cos.��A�cos.�cs.��A�cs.�csc.��A�csc.�csüt.��A�csüt.�cső.��A�cső.�ctv.��A�ctv.�d.��A�d.�dbj.��A�dbj.�dd.��A�dd.�ddr.��A�ddr.�de.��A�de.�dec.��A�dec.�dikt.��A�dikt.�dipl.��A�dipl.�dj.��A�dj.�dk.��A�dk.�dl.��A�dl.�dny.��A�dny.�dolg.��A�dolg.�dr.��A�dr.�du.��A�du.�dzs.��A�dzs.�e.��A�e.�ea.��A�ea.�ed.��A�ed.�eff.��A�eff.�egyh.��A�egyh.�ell.��A�ell.�elv.��A�elv.�elvt.��A�elvt.�em.��A�em.�eng.��A�eng.�eny.��A�eny.�et.��A�et.�etc.��A�etc.�ev.��A�ev.�ezr.��A�ezr.�eü.��A�eü.�f.��A�f.�f.h.��A�f.h.�f.é.��A�f.é.�fam.��A�fam.�fb.��A�fb.�febr.��A�febr.�fej.��A�fej.�felv.��A�felv.�felügy.��A�felügy.�ff.��A�ff.�ffi.��A�ffi.�fhdgy.��A�fhdgy.�fil.��A�fil.�fiz.��A�fiz.�fm.��A�fm.�foglalk.��A�foglalk.�ford.��A�ford.�fp.��A�fp.�fr.��A�fr.�frsz.��A�frsz.�fszla.��A�fszla.�fszt.��A�fszt.�ft.��A�ft.�fuv.��A�fuv.�főig.��A�főig.�főisk.��A�főisk.�főtörm.��A�főtörm.�főv.��A�főv.�g.��A�g.�gazd.��A�gazd.�gimn.��A�gimn.�gk.��A�gk.�gkv.��A�gkv.�gmk.��A�gmk.�gondn.��A�gondn.�gr.��A�gr.�grav.��A�grav.�gy.��A�gy.�gyak.��A�gyak.�gyártm.��A�gyártm.�gör.��A�gör.�h.��A�h.�hads.��A�hads.�hallg.��A�hallg.�hdm.��A�hdm.�hdp.��A�hdp.�hds.��A�hds.�hg.��A�hg.�hiv.��A�hiv.�hk.��A�hk.�hm.��A�hm.�ho.��A�ho.�honv.��A�honv.�hp.��A�hp.�hr.��A�hr.�hrsz.��A�hrsz.�hsz.��A�hsz.�ht.��A�ht.�htb.��A�htb.�hv.��A�hv.�hőm.��A�hőm.�i.��A�i.�i.e.��A�i.e.�i.sz.��A�i.sz.�id.��A�id.�ie.��A�ie.�ifj.��A�ifj.�ig.��A�ig.�igh.��A�igh.�ill.��A�ill.�imp.��A�imp.�inc.��A�inc.�ind.��A�ind.�inform.��A�inform.�inic.��A�inic.�int.��A�int.�io.��A�io.�ip.��A�ip.�ir.��A�ir.�irod.��A�irod.�isk.��A�isk.�ism.��A�ism.�izr.��A�izr.�iá.��A�iá.�j.��A�j.�jan.��A�jan.�jav.��A�jav.�jegyz.��A�jegyz.�jgmk.��A�jgmk.�jjv.��A�jjv.�jkv.��A�jkv.�j
ogh.��A�jogh.�jogt.��A�jogt.�jr.��A�jr.�jvb.��A�jvb.�júl.��A�júl.�jún.��A�jún.�k.��A�k.�karb.��A�karb.�kat.��A�kat.�kath.��A�kath.�kb.��A�kb.�kcs.��A�kcs.�kd.��A�kd.�ker.��A�ker.�kf.��A�kf.�kft.��A�kft.�kht.��A�kht.�kir.��A�kir.�kirend.��A�kirend.�kisip.��A�kisip.�kiv.��A�kiv.�kk.��A�kk.�kkt.��A�kkt.�klin.��A�klin.�km.��A�km.�korm.��A�korm.�kp.��A�kp.�krt.��A�krt.�kt.��A�kt.�ktsg.��A�ktsg.�kult.��A�kult.�kv.��A�kv.�kve.��A�kve.�képv.��A�képv.�kísérl.��A�kísérl.�kóth.��A�kóth.�könyvt.��A�könyvt.�körz.��A�körz.�köv.��A�köv.�közj.��A�közj.�közl.��A�közl.�közp.��A�közp.�közt.��A�közt.�kü.��A�kü.�l.��A�l.�lat.��A�lat.�ld.��A�ld.�legs.��A�legs.�lg.��A�lg.�lgv.��A�lgv.�loc.��A�loc.�lt.��A�lt.�ltd.��A�ltd.�ltp.��A�ltp.�luth.��A�luth.�m.��A�m.�m.a.��A�m.a.�m.s.��A�m.s.�m.sc.��A�m.sc.�ma.��A�ma.�mat.��A�mat.�max.��A�max.�mb.��A�mb.�med.��A�med.�megh.��A�megh.�met.��A�met.�mf.��A�mf.�mfszt.��A�mfszt.�min.��A�min.�miss.��A�miss.�mjr.��A�mjr.�mjv.��A�mjv.�mk.��A�mk.�mlle.��A�mlle.�mme.��A�mme.�mn.��A�mn.�mozg.��A�mozg.�mr.��A�mr.�mrs.��A�mrs.�ms.��A�ms.�msc.��A�msc.�má.��A�má.�máj.��A�máj.�márc.��A�márc.�mé.��A�mé.�mélt.��A�mélt.�mü.��A�mü.�műh.��A�műh.�műsz.��A�műsz.�műv.��A�műv.�művez.��A�művez.�n.��A�n.�nagyker.��A�nagyker.�nagys.��A�nagys.�nat.��A�nat.�nb.��A�nb.�neg.��A�neg.�nk.��A�nk.�no.��A�no.�nov.��A�nov.�nu.��A�nu.�ny.��A�ny.�nyilv.��A�nyilv.�nyrt.��A�nyrt.�nyug.��A�nyug.�o.��A�o.�o.0��A�o.0�o.O��A�o.O�o.o��A�o.o�o_0��A�o_0�o_O��A�o_O�o_o��A�o_o�obj.��A�obj.�okl.��A�okl.�okt.��A�okt.�old.��A�old.�olv.��A�olv.�orsz.��A�orsz.�ort.��A�ort.�ov.��A�ov.�ovh.��A�ovh.�p.��A�p.�pf.��A�pf.�pg.��A�pg.�ph.d��A�ph.d�ph.d.��A�ph.d.�phd.��A�phd.�phil.��A�phil.�pjt.��A�pjt.�pk.��A�pk.�pl.��A�pl.�plb.��A�plb.�plc.��A�plc.�pld.��A�pld.�plur.��A�plur.�pol.��A�pol.�polg.��A�polg.�poz.��A�poz.�pp.��A�pp.�proc.��A�proc.�prof.��A�prof.�prot.��A�prot.�pság.��A�pság.�ptk.��A�ptk.�pu.��A�pu.�pü.��A�pü.�q.��A�q.�r.��A�r.�r.k.��A�r.k.�rac.��A�rac.�rad.��A�rad.�red.��A�red.�ref.��A�ref.�reg.��A�re
g.�rer.��A�rer.�rev.��A�rev.�rf.��A�rf.�rkp.��A�rkp.�rkt.��A�rkt.�rt.��A�rt.�rtg.��A�rtg.�röv.��A�röv.�s.��A�s.�s.b.��A�s.b.�s.k.��A�s.k.�sa.��A�sa.�sb.��A�sb.�sel.��A�sel.�sgt.��A�sgt.�sm.��A�sm.�st.��A�st.�stat.��A�stat.�stb.��A�stb.�strat.��A�strat.�stud.��A�stud.�sz.��A�sz.�szakm.��A�szakm.�szaksz.��A�szaksz.�szakszerv.��A�szakszerv.�szd.��A�szd.�szds.��A�szds.�szept.��A�szept.�szerk.��A�szerk.�szf.��A�szf.�szimf.��A�szimf.�szjt.��A�szjt.�szkv.��A�szkv.�szla.��A�szla.�szn.��A�szn.�szolg.��A�szolg.�szt.��A�szt.�szubj.��A�szubj.�szöv.��A�szöv.�szül.��A�szül.�t.��A�t.�tanm.��A�tanm.�tb.��A�tb.�tbk.��A�tbk.�tc.��A�tc.�techn.��A�techn.�tek.��A�tek.�tel.��A�tel.�tf.��A�tf.�tgk.��A�tgk.�ti.��A�ti.�tip.��A�tip.�tisztv.��A�tisztv.�titks.��A�titks.�tk.��A�tk.�tkp.��A�tkp.�tny.��A�tny.�tp.��A�tp.�tszf.��A�tszf.�tszk.��A�tszk.�tszkv.��A�tszkv.�tv.��A�tv.�tvr.��A�tvr.�ty.��A�ty.�törv.��A�törv.�tü.��A�tü.�u.��A�u.�ua.��A�ua.�ui.��A�ui.�unit.��A�unit.�uo.��A�uo.�uv.��A�uv.�v.��A�v.�v.v��A�v.v�v_v��A�v_v�vas.��A�vas.�vb.��A�vb.�vegy.��A�vegy.�vh.��A�vh.�vhol.��A�vhol.�vhr.��A�vhr.�vill.��A�vill.�vizsg.��A�vizsg.�vk.��A�vk.�vkf.��A�vkf.�vkny.��A�vkny.�vm.��A�vm.�vol.��A�vol.�vs.��A�vs.�vsz.��A�vsz.�vv.��A�vv.�vál.��A�vál.�várm.��A�várm.�vízv.��A�vízv.�vö.��A�vö.�w.��A�w.�x.��A�x.�xD��A�xD�xDD��A�xDD�y.��A�y.�z.��A�z.�zrt.��A�zrt.�zs.��A�zs.� ��A� C� 
�¯\(ツ)/¯��A�¯\(ツ)/¯�°C��A�°C�°C.��A�°C�A�.�°F��A�°F�°F.��A�°F�A�.�°K��A�°K�°K.��A�°K�A�.�°c��A�°c�°c.��A�°c�A�.�°f��A�°f�°f.��A�°f�A�.�°k��A�°k�°k.��A�°k�A�.�Á.��A�Á.�Áe.��A�Áe.�Áht.��A�Áht.�É.��A�É.�Épt.��A�Épt.�Ész.��A�Ész.�Új-Z.��A�Új-Z.�ÚjZ.��A�ÚjZ.�Ún.��A�Ún.�á.��A�á.�ált.��A�ált.�ápr.��A�ápr.�ásv.��A�ásv.�ä.��A�ä.�é.��A�é.�ék.��A�ék.�ény.��A�ény.�érk.��A�érk.�évf.��A�évf.�í.��A�í.�ó.��A�ó.�ö.��A�ö.�össz.��A�össz.�ötk.��A�ötk.�özv.��A�özv.�ú.��A�ú.�ú.n.��A�ú.n.�úm.��A�úm.�ún.��A�ún.�út.��A�út.�ü.��A�ü.�üag.��A�üag.�üd.��A�üd.�üdv.��A�üdv.�üe.��A�üe.�ümk.��A�ümk.�ütk.��A�ütk.�üv.��A�üv.�őrgy.��A�őrgy.�őrpk.��A�őrpk.�őrv.��A�őrv.�ű.��A�ű.�ಠ_ಠ��A�ಠ_ಠ�ಠ︵ಠ��A�ಠ︵ಠ�—��A�—�’��A�’�’’��A�’’�faster_heuristics�
��prefix_search�[^…|^……|^,|^:|^;|^\!|^\?|^¿|^؟|^¡|^\(|^\)|^\[|^\]|^\{|^\}|^<|^>|^_|^#|^\*|^&|^。|^?|^!|^,|^、|^;|^:|^~|^·|^।|^،|^۔|^؛|^٪|^\.\.+|^…|^\'|^"|^”|^“|^`|^‘|^´|^’|^‚|^,|^„|^»|^«|^「|^」|^『|^』|^(|^)|^〔|^〕|^【|^】|^《|^》|^〈|^〉|^〈|^〉|^⟦|^⟧|^\u00A6\u00A9\u00AE\u00B0\u0482\u058D\u058E\u060E\u060F\u06DE\u06E9\u06FD\u06FE\u07F6\u09FA\u0B70\u0BF3-\u0BF8\u0BFA\u0C7F\u0D4F\u0D79\u0F01-\u0F03\u0F13\u0F15-\u0F17\u0F1A-\u0F1F\u0F34\u0F36\u0F38\u0FBE-\u0FC5\u0FC7-\u0FCC\u0FCE\u0FCF\u0FD5-\u0FD8\u109E\u109F\u1390-\u1399\u1940\u19DE-\u19FF\u1B61-\u1B6A\u1B74-\u1B7C\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116\u2117\u211E-\u2123\u2125\u2127\u2129\u212E\u213A\u213B\u214A\u214C\u214D\u214F\u218A\u218B\u2195-\u2199\u219C-\u219F\u21A1\u21A2\u21A4\u21A5\u21A7-\u21AD\u21AF-\u21CD\u21D0\u21D1\u21D3\u21D5-\u21F3\u2300-\u2307\u230C-\u231F\u2322-\u2328\u232B-\u237B\u237D-\u239A\u23B4-\u23DB\u23E2-\u2426\u2440-\u244A\u249C-\u24E9\u2500-\u25B6\u25B8-\u25C0\u25C2-\u25F7\u2600-\u266E\u2670-\u2767\u2794-\u27BF\u2800-\u28FF\u2B00-\u2B2F\u2B45\u2B46\u2B4D-\u2B73\u2B76-\u2B95\u2B98-\u2BC8\u2BCA-\u2BFE\u2CE5-\u2CEA\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u2FF0-\u2FFB\u3004\u3012\u3013\u3020\u3036\u3037\u303E\u303F\u3190\u3191\u3196-\u319F\u31C0-\u31E3\u3200-\u321E\u322A-\u3247\u3250\u3260-\u327F\u328A-\u32B0\u32C0-\u32FE\u3300-\u33FF\u4DC0-\u4DFF\uA490-\uA4C6\uA828-\uA82B\uA836\uA837\uA839\uAA77-\uAA79\uFDFD\uFFE4\uFFE8\uFFED\uFFEE\uFFFC\uFFFD\U00010137-\U0001013F\U00010179-\U00010189\U0001018C-\U0001018E\U00010190-\U0001019B\U000101A0\U000101D0-\U000101FC\U00010877\U00010878\U00010AC8\U0001173F\U00016B3C-\U00016B3F\U00016B45\U0001BC9C\U0001D000-\U0001D0F5\U0001D100-\U0001D126\U0001D129-\U0001D164\U0001D16A-\U0001D16C\U0001D183\U0001D184\U0001D18C-\U0001D1A9\U0001D1AE-\U0001D1E8\U0001D200-\U0001D241\U0001D245\U0001D300-\U0001D356\U0001D800-\U0001D9FF\U0001DA37-\U0001DA3A\U0001DA6D-\U0001DA74\U0001DA76-\U0001DA83\U0001DA85\U0001DA86\U0001ECAC\U0001F000-\U0001F02B\U0001F030-\U0001F093\U0001F0A0-\U0
001F0AE\U0001F0B1-\U0001F0BF\U0001F0C1-\U0001F0CF\U0001F0D1-\U0001F0F5\U0001F110-\U0001F16B\U0001F170-\U0001F1AC\U0001F1E6-\U0001F202\U0001F210-\U0001F23B\U0001F240-\U0001F248\U0001F250\U0001F251\U0001F260-\U0001F265\U0001F300-\U0001F3FA\U0001F400-\U0001F6D4\U0001F6E0-\U0001F6EC\U0001F6F0-\U0001F6F9\U0001F700-\U0001F773\U0001F780-\U0001F7D8\U0001F800-\U0001F80B\U0001F810-\U0001F847\U0001F850-\U0001F859\U0001F860-\U0001F887\U0001F890-\U0001F8AD\U0001F900-\U0001F90B\U0001F910-\U0001F93E\U0001F940-\U0001F970\U0001F973-\U0001F976\U0001F97A\U0001F97C-\U0001F9A2\U0001F9B0-\U0001F9B9\U0001F9C0-\U0001F9C2\U0001F9D0-\U0001F9FF\U0001FA60-\U0001FA6D|^[,.:](?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])�suffix_search�%�\+$|…$|……$|,$|:$|;$|\!$|\?$|¿$|؟$|¡$|\($|\)$|\[$|\]$|\{$|\}$|<$|>$|_$|#$|\*$|&$|。$|?$|!$|,$|、$|;$|:$|~$|·$|।$|،$|۔$|؛$|٪$|\.\.+$|…$|\'$|"$|”$|“$|`$|‘$|´$|’$|‚$|,$|„$|»$|«$|「$|」$|『$|』$|($|)$|
〔$|〕$|【$|】$|《$|》$|〈$|〉$|〈$|〉$|⟦$|⟧$|\u00A6\u00A9\u00AE\u00B0\u0482\u058D\u058E\u060E\u060F\u06DE\u06E9\u06FD\u06FE\u07F6\u09FA\u0B70\u0BF3-\u0BF8\u0BFA\u0C7F\u0D4F\u0D79\u0F01-\u0F03\u0F13\u0F15-\u0F17\u0F1A-\u0F1F\u0F34\u0F36\u0F38\u0FBE-\u0FC5\u0FC7-\u0FCC\u0FCE\u0FCF\u0FD5-\u0FD8\u109E\u109F\u1390-\u1399\u1940\u19DE-\u19FF\u1B61-\u1B6A\u1B74-\u1B7C\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116\u2117\u211E-\u2123\u2125\u2127\u2129\u212E\u213A\u213B\u214A\u214C\u214D\u214F\u218A\u218B\u2195-\u2199\u219C-\u219F\u21A1\u21A2\u21A4\u21A5\u21A7-\u21AD\u21AF-\u21CD\u21D0\u21D1\u21D3\u21D5-\u21F3\u2300-\u2307\u230C-\u231F\u2322-\u2328\u232B-\u237B\u237D-\u239A\u23B4-\u23DB\u23E2-\u2426\u2440-\u244A\u249C-\u24E9\u2500-\u25B6\u25B8-\u25C0\u25C2-\u25F7\u2600-\u266E\u2670-\u2767\u2794-\u27BF\u2800-\u28FF\u2B00-\u2B2F\u2B45\u2B46\u2B4D-\u2B73\u2B76-\u2B95\u2B98-\u2BC8\u2BCA-\u2BFE\u2CE5-\u2CEA\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u2FF0-\u2FFB\u3004\u3012\u3013\u3020\u3036\u3037\u303E\u303F\u3190\u3191\u3196-\u319F\u31C0-\u31E3\u3200-\u321E\u322A-\u3247\u3250\u3260-\u327F\u328A-\u32B0\u32C0-\u32FE\u3300-\u33FF\u4DC0-\u4DFF\uA490-\uA4C6\uA828-\uA82B\uA836\uA837\uA839\uAA77-\uAA79\uFDFD\uFFE4\uFFE8\uFFED\uFFEE\uFFFC\uFFFD\U00010137-\U0001013F\U00010179-\U00010189\U0001018C-\U0001018E\U00010190-\U0001019B\U000101A0\U000101D0-\U000101FC\U00010877\U00010878\U00010AC8\U0001173F\U00016B3C-\U00016B3F\U00016B45\U0001BC9C\U0001D000-\U0001D0F5\U0001D100-\U0001D126\U0001D129-\U0001D164\U0001D16A-\U0001D16C\U0001D183\U0001D184\U0001D18C-\U0001D1A9\U0001D1AE-\U0001D1E8\U0001D200-\U0001D241\U0001D245\U0001D300-\U0001D356\U0001D800-\U0001D9FF\U0001DA37-\U0001DA3A\U0001DA6D-\U0001DA74\U0001DA76-\U0001DA83\U0001DA85\U0001DA86\U0001ECAC\U0001F000-\U0001F02B\U0001F030-\U0001F093\U0001F0A0-\U0001F0AE\U0001F0B1-\U0001F0BF\U0001F0C1-\U0001F0CF\U0001F0D1-\U0001F0F5\U0001F110-\U0001F16B\U0001F170-\U0001F1AC\U0001F1E6-\U0001F202\U0001F210-\U0001F23B\U0001F240-\U0001F248\U0001F250\U0001F251\U0001F26
0-\U0001F265\U0001F300-\U0001F3FA\U0001F400-\U0001F6D4\U0001F6E0-\U0001F6EC\U0001F6F0-\U0001F6F9\U0001F700-\U0001F773\U0001F780-\U0001F7D8\U0001F800-\U0001F80B\U0001F810-\U0001F847\U0001F850-\U0001F859\U0001F860-\U0001F887\U0001F890-\U0001F8AD\U0001F900-\U0001F90B\U0001F910-\U0001F93E\U0001F940-\U0001F970\U0001F973-\U0001F976\U0001F97A\U0001F97C-\U0001F9A2\U0001F9B0-\U0001F9B9\U0001F9C0-\U0001F9C2\U0001F9D0-\U0001F9FF\U0001FA60-\U0001FA6D$|(?<=[0-9])\+$|(?<=°[FfCcKk])\.$|(?<=[0-9])(?:[\$¢£€¥฿])$|(?<=[0-9])(?:km|km²|km³|m|m²|m³|dm|dm²|dm³|cm|cm²|cm³|mm|mm²|mm³|ha|µm|nm|yd|in|ft|kg|g|mg|µg|t|lb|oz|m/s|km/h|kmh|mph|hPa|Pa|mbar|mb|MB|kb|KB|gb|GB|tb|TB|T|G|M|K||км|км²|км³|м|м²|м³|дм|дм²|дм³|см|см²|см³|мм|мм²|мм³|нм|кг|г|мг|м/с|км/ч|кПа|Па|мбар|Кб|КБ|кб|Мб|МБ|мб|Гб|ГБ|гб|Тб|ТБ|тбكم|كم²|كم³|م|م²|م³|سم|سم²|سم³|مم|مم²|مم³|كم|غرام|جرام|جم|كغ|ملغ|كوب|اكواب)$|(?<=[a-z\uFF41-\uFF5A\u00DF-\u00F6\u00F8-\u00FF\u0101\u0103\u0105\u0107\u0109\u010B\u010D\u010F\u0111\u0113\u0115\u0117\u0119\u011B\u011D\u011F\u0121\u0123\u0125\u0127\u0129\u012B\u012D\u012F\u0131\u0133\u0135\u0137\u0138\u013A\u013C\u013E\u0140\u0142\u0144\u0146\u0148\u0149\u014B\u014D\u014F\u0151\u0153\u0155\u0157\u0159\u015B\u015D\u015F\u0161\u0163\u0165\u0167\u0169\u016B\u016D\u016F\u0171\u0173\u0175\u0177\u017A\u017C\u017E\u017F\u0180\u0183\u0185\u0188\u018C\u018D\u0192\u0195\u0199-\u019B\u019E\u01A1\u01A3\u01A5\u01A8\u01AA\u01AB\u01AD\u01B0\u01B4\u01B6\u01B9\u01BA\u01BD-\u01BF\u01C6\u01C9\u01CC\u01CE\u01D0\u01D2\u01D4\u01D6\u01D8\u01DA\u01DC\u01DD\u01DF\u01E1\u01E3\u01E5\u01E7\u01E9\u01EB\u01ED\u01EF\u01F0\u01F3\u01F5\u01F9\u01FB\u01FD\u01FF\u0201\u0203\u0205\u0207\u0209\u020B\u020D\u020F\u0211\u0213\u0215\u0217\u0219\u021B\u021D\u021F\u0221\u0223\u0225\u0227\u0229\u022B\u022D\u022F\u0231\u0233-\u0239\u023C\u023F\u0240\u0242\u0247\u0249\u024B\u024D\u024F\u2C61\u2C65\u2C66\u2C68\u2C6A\u2C6C\u2C71\u2C73\u2C74\u2C76-\u2C7B\uA723\uA725\uA727\uA729\uA72B\uA72D\uA72F-\uA731\uA733\uA735\uA737\uA739\uA73B\uA73D\uA73F\uA741\u
A743\uA745\uA747\uA749\uA74B\uA74D\uA74F\uA751\uA753\uA755\uA757\uA759\uA75B\uA75D\uA75F\uA761\uA763\uA765\uA767\uA769\uA76B\uA76D\uA76F\uA771-\uA778\uA77A\uA77C\uA77F\uA781\uA783\uA785\uA787\uA78C\uA78E\uA791\uA793-\uA795\uA797\uA799\uA79B\uA79D\uA79F\uA7A1\uA7A3\uA7A5\uA7A7\uA7A9\uA7AF\uA7B5\uA7B7\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E01\u1E03\u1E05\u1E07\u1E09\u1E0B\u1E0D\u1E0F\u1E11\u1E13\u1E15\u1E17\u1E19\u1E1B\u1E1D\u1E1F\u1E21\u1E23\u1E25\u1E27\u1E29\u1E2B\u1E2D\u1E2F\u1E31\u1E33\u1E35\u1E37\u1E39\u1E3B\u1E3D\u1E3F\u1E41\u1E43\u1E45\u1E47\u1E49\u1E4B\u1E4D\u1E4F\u1E51\u1E53\u1E55\u1E57\u1E59\u1E5B\u1E5D\u1E5F\u1E61\u1E63\u1E65\u1E67\u1E69\u1E6B\u1E6D\u1E6F\u1E71\u1E73\u1E75\u1E77\u1E79\u1E7B\u1E7D\u1E7F\u1E81\u1E83\u1E85\u1E87\u1E89\u1E8B\u1E8D\u1E8F\u1E91\u1E93\u1E95-\u1E9D\u1E9F\u1EA1\u1EA3\u1EA5\u1EA7\u1EA9\u1EAB\u1EAD\u1EAF\u1EB1\u1EB3\u1EB5\u1EB7\u1EB9\u1EBB\u1EBD\u1EBF\u1EC1\u1EC3\u1EC5\u1EC7\u1EC9\u1ECB\u1ECD\u1ECF\u1ED1\u1ED3\u1ED5\u1ED7\u1ED9\u1EDB\u1EDD\u1EDF\u1EE1\u1EE3\u1EE5\u1EE7\u1EE9\u1EEB\u1EED\u1EEF\u1EF1\u1EF3\u1EF5\u1EF7\u1EF9\u1EFB\u1EFD\u1EFFёа-яәөүҗңһα-ωάέίόώήύа-щюяіїєґѓѕјљњќѐѝ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F%²\-\+\'"”“`‘´’‚,„»«「」『』()〔〕【】《》〈〉〈〉⟦⟧(?:\$¢£€
¥฿)])\.$|(?<=[a-z\uFF41-\uFF5A\u00DF-\u00F6\u00F8-\u00FF\u0101\u0103\u0105\u0107\u0109\u010B\u010D\u010F\u0111\u0113\u0115\u0117\u0119\u011B\u011D\u011F\u0121\u0123\u0125\u0127\u0129\u012B\u012D\u012F\u0131\u0133\u0135\u0137\u0138\u013A\u013C\u013E\u0140\u0142\u0144\u0146\u0148\u0149\u014B\u014D\u014F\u0151\u0153\u0155\u0157\u0159\u015B\u015D\u015F\u0161\u0163\u0165\u0167\u0169\u016B\u016D\u016F\u0171\u0173\u0175\u0177\u017A\u017C\u017E\u017F\u0180\u0183\u0185\u0188\u018C\u018D\u0192\u0195\u0199-\u019B\u019E\u01A1\u01A3\u01A5\u01A8\u01AA\u01AB\u01AD\u01B0\u01B4\u01B6\u01B9\u01BA\u01BD-\u01BF\u01C6\u01C9\u01CC\u01CE\u01D0\u01D2\u01D4\u01D6\u01D8\u01DA\u01DC\u01DD\u01DF\u01E1\u01E3\u01E5\u01E7\u01E9\u01EB\u01ED\u01EF\u01F0\u01F3\u01F5\u01F9\u01FB\u01FD\u01FF\u0201\u0203\u0205\u0207\u0209\u020B\u020D\u020F\u0211\u0213\u0215\u0217\u0219\u021B\u021D\u021F\u0221\u0223\u0225\u0227\u0229\u022B\u022D\u022F\u0231\u0233-\u0239\u023C\u023F\u0240\u0242\u0247\u0249\u024B\u024D\u024F\u2C61\u2C65\u2C66\u2C68\u2C6A\u2C6C\u2C71\u2C73\u2C74\u2C76-\u2C7B\uA723\uA725\uA727\uA729\uA72B\uA72D\uA72F-\uA731\uA733\uA735\uA737\uA739\uA73B\uA73D\uA73F\uA741\uA743\uA745\uA747\uA749\uA74B\uA74D\uA74F\uA751\uA753\uA755\uA757\uA759\uA75B\uA75D\uA75F\uA761\uA763\uA765\uA767\uA769\uA76B\uA76D\uA76F\uA771-\uA778\uA77A\uA77C\uA77F\uA781\uA783\uA785\uA787\uA78C\uA78E\uA791\uA793-\uA795\uA797\uA799\uA79B\uA79D\uA79F\uA7A1\uA7A3\uA7A5\uA7A7\uA7A9\uA7AF\uA7B5\uA7B7\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E01\u1E03\u1E05\u1E07\u1E09\u1E0B\u1E0D\u1E0F\u1E11\u1E13\u1E15\u1E17\u1E19\u1E1B\u1E1D\u1E1F\u1E21\u1E23\u1E25\u1E27\u1E29\u1E2B\u1E2D\u1E2F\u1E31\u1E33\u1E35\u1E37\u1E39\u1E3B\u1E3D\u1E3F\u1E41\u1E43\u1E45\u1E47\u1E49\u1E4B\u1E4D\u1E4F\u1E51\u1E53\u1E55\u1E57\u1E59\u1E5B\u1E5D\u1E5F\u1E61\u1E63\u1E65\u1E67\u1E69\u1E6B\u1E6D\u1E6F\u1E71\u1E73\u1E75\u1E77\u1E79\u1E7B\u1E7D\u1E7F\u1E81\u1E83\u1E85\u1E87\u1E89\u1E8B\u1E8D\u1E8F\u1E91\u1E93\u1E95-\u1E9D\u1E
9F\u1EA1\u1EA3\u1EA5\u1EA7\u1EA9\u1EAB\u1EAD\u1EAF\u1EB1\u1EB3\u1EB5\u1EB7\u1EB9\u1EBB\u1EBD\u1EBF\u1EC1\u1EC3\u1EC5\u1EC7\u1EC9\u1ECB\u1ECD\u1ECF\u1ED1\u1ED3\u1ED5\u1ED7\u1ED9\u1EDB\u1EDD\u1EDF\u1EE1\u1EE3\u1EE5\u1EE7\u1EE9\u1EEB\u1EED\u1EEF\u1EF1\u1EF3\u1EF5\u1EF7\u1EF9\u1EFB\u1EFD\u1EFFёа-яәөүҗңһα-ωάέίόώήύа-щюяіїєґѓѕјљњќѐѝ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F)])-e$�infix_finditer�Qf\.\.+|…|\u00A6\u00A9\u00AE\u00B0\u0482\u058D\u058E\u060E\u060F\u06DE\u06E9\u06FD\u06FE\u07F6\u09FA\u0B70\u0BF3-\u0BF8\u0BFA\u0C7F\u0D4F\u0D79\u0F01-\u0F03\u0F13\u0F15-\u0F17\u0F1A-\u0F1F\u0F34\u0F36\u0F38\u0FBE-\u0FC5\u0FC7-\u0FCC\u0FCE\u0FCF\u0FD5-\u0FD8\u109E\u109F\u1390-\u1399\u1940\u19DE-\u19FF\u1B61-\u1B6A\u1B74-\u1B7C\u2100\u2101\u2103-\u2106\u2108\u2109\u2114\u2116\u2117\u211E-\u2123\u2125\u2127\u2129\u212E\u213A\u213B\u214A\u214C\u214D\u214F\u218A\u218B\u2195-\u2199\u219C-\u219F\u21A1\u21A2\u21A4\u21A5\u21A7-\u21AD\u21AF-\u21CD\u21D0\u21D1\u21D3\u21D5-\u21F3\u2300-\u2307\u230C-\u231F\u2322-\u2328\u232B-\u237B\u237D-\u239A\u23B4-\u23DB\u23E2-\u2426\u2440-\u244A\u249C-\u24E9\u2500-\u25B6\u25B8-\u25C0\u25C2-\u25F7\u2600-\u266E\u2670-\u2767\u2794-\u27BF\u2800-\u28FF\u2B00-\u2B2F\u2B45\u2B46\u2B4D-\u2B73\u2B76-\u2B95\u2B98-\u2BC8\u2BCA-\u2BFE\u2CE5-\u2CEA\u2
E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u2FF0-\u2FFB\u3004\u3012\u3013\u3020\u3036\u3037\u303E\u303F\u3190\u3191\u3196-\u319F\u31C0-\u31E3\u3200-\u321E\u322A-\u3247\u3250\u3260-\u327F\u328A-\u32B0\u32C0-\u32FE\u3300-\u33FF\u4DC0-\u4DFF\uA490-\uA4C6\uA828-\uA82B\uA836\uA837\uA839\uAA77-\uAA79\uFDFD\uFFE4\uFFE8\uFFED\uFFEE\uFFFC\uFFFD\U00010137-\U0001013F\U00010179-\U00010189\U0001018C-\U0001018E\U00010190-\U0001019B\U000101A0\U000101D0-\U000101FC\U00010877\U00010878\U00010AC8\U0001173F\U00016B3C-\U00016B3F\U00016B45\U0001BC9C\U0001D000-\U0001D0F5\U0001D100-\U0001D126\U0001D129-\U0001D164\U0001D16A-\U0001D16C\U0001D183\U0001D184\U0001D18C-\U0001D1A9\U0001D1AE-\U0001D1E8\U0001D200-\U0001D241\U0001D245\U0001D300-\U0001D356\U0001D800-\U0001D9FF\U0001DA37-\U0001DA3A\U0001DA6D-\U0001DA74\U0001DA76-\U0001DA83\U0001DA85\U0001DA86\U0001ECAC\U0001F000-\U0001F02B\U0001F030-\U0001F093\U0001F0A0-\U0001F0AE\U0001F0B1-\U0001F0BF\U0001F0C1-\U0001F0CF\U0001F0D1-\U0001F0F5\U0001F110-\U0001F16B\U0001F170-\U0001F1AC\U0001F1E6-\U0001F202\U0001F210-\U0001F23B\U0001F240-\U0001F248\U0001F250\U0001F251\U0001F260-\U0001F265\U0001F300-\U0001F3FA\U0001F400-\U0001F6D4\U0001F6E0-\U0001F6EC\U0001F6F0-\U0001F6F9\U0001F700-\U0001F773\U0001F780-\U0001F7D8\U0001F800-\U0001F80B\U0001F810-\U0001F847\U0001F850-\U0001F859\U0001F860-\U0001F887\U0001F890-\U0001F8AD\U0001F900-\U0001F90B\U0001F910-\U0001F93E\U0001F940-\U0001F970\U0001F973-\U0001F976\U0001F97A\U0001F97C-\U0001F9A2\U0001F9B0-\U0001F9B9\U0001F9C0-\U0001F9C2\U0001F9D0-\U0001F9FF\U0001FA60-\U0001FA6D|(?<=[a-z\uFF41-\uFF5A\u00DF-\u00F6\u00F8-\u00FF\u0101\u0103\u0105\u0107\u0109\u010B\u010D\u010F\u0111\u0113\u0115\u0117\u0119\u011B\u011D\u011F\u0121\u0123\u0125\u0127\u0129\u012B\u012D\u012F\u0131\u0133\u0135\u0137\u0138\u013A\u013C\u013E\u0140\u0142\u0144\u0146\u0148\u0149\u014B\u014D\u014F\u0151\u0153\u0155\u0157\u0159\u015B\u015D\u015F\u0161\u0163\u0165\u0167\u0169\u016B\u016D\u016F\u0171\u0173\u0175\u0177\u017A\u017C\u017E\u017F\u0180\u0183\u0185\u0
188\u018C\u018D\u0192\u0195\u0199-\u019B\u019E\u01A1\u01A3\u01A5\u01A8\u01AA\u01AB\u01AD\u01B0\u01B4\u01B6\u01B9\u01BA\u01BD-\u01BF\u01C6\u01C9\u01CC\u01CE\u01D0\u01D2\u01D4\u01D6\u01D8\u01DA\u01DC\u01DD\u01DF\u01E1\u01E3\u01E5\u01E7\u01E9\u01EB\u01ED\u01EF\u01F0\u01F3\u01F5\u01F9\u01FB\u01FD\u01FF\u0201\u0203\u0205\u0207\u0209\u020B\u020D\u020F\u0211\u0213\u0215\u0217\u0219\u021B\u021D\u021F\u0221\u0223\u0225\u0227\u0229\u022B\u022D\u022F\u0231\u0233-\u0239\u023C\u023F\u0240\u0242\u0247\u0249\u024B\u024D\u024F\u2C61\u2C65\u2C66\u2C68\u2C6A\u2C6C\u2C71\u2C73\u2C74\u2C76-\u2C7B\uA723\uA725\uA727\uA729\uA72B\uA72D\uA72F-\uA731\uA733\uA735\uA737\uA739\uA73B\uA73D\uA73F\uA741\uA743\uA745\uA747\uA749\uA74B\uA74D\uA74F\uA751\uA753\uA755\uA757\uA759\uA75B\uA75D\uA75F\uA761\uA763\uA765\uA767\uA769\uA76B\uA76D\uA76F\uA771-\uA778\uA77A\uA77C\uA77F\uA781\uA783\uA785\uA787\uA78C\uA78E\uA791\uA793-\uA795\uA797\uA799\uA79B\uA79D\uA79F\uA7A1\uA7A3\uA7A5\uA7A7\uA7A9\uA7AF\uA7B5\uA7B7\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E01\u1E03\u1E05\u1E07\u1E09\u1E0B\u1E0D\u1E0F\u1E11\u1E13\u1E15\u1E17\u1E19\u1E1B\u1E1D\u1E1F\u1E21\u1E23\u1E25\u1E27\u1E29\u1E2B\u1E2D\u1E2F\u1E31\u1E33\u1E35\u1E37\u1E39\u1E3B\u1E3D\u1E3F\u1E41\u1E43\u1E45\u1E47\u1E49\u1E4B\u1E4D\u1E4F\u1E51\u1E53\u1E55\u1E57\u1E59\u1E5B\u1E5D\u1E5F\u1E61\u1E63\u1E65\u1E67\u1E69\u1E6B\u1E6D\u1E6F\u1E71\u1E73\u1E75\u1E77\u1E79\u1E7B\u1E7D\u1E7F\u1E81\u1E83\u1E85\u1E87\u1E89\u1E8B\u1E8D\u1E8F\u1E91\u1E93\u1E95-\u1E9D\u1E9F\u1EA1\u1EA3\u1EA5\u1EA7\u1EA9\u1EAB\u1EAD\u1EAF\u1EB1\u1EB3\u1EB5\u1EB7\u1EB9\u1EBB\u1EBD\u1EBF\u1EC1\u1EC3\u1EC5\u1EC7\u1EC9\u1ECB\u1ECD\u1ECF\u1ED1\u1ED3\u1ED5\u1ED7\u1ED9\u1EDB\u1EDD\u1EDF\u1EE1\u1EE3\u1EE5\u1EE7\u1EE9\u1EEB\u1EED\u1EEF\u1EF1\u1EF3\u1EF5\u1EF7\u1EF9\u1EFB\u1EFD\u1EFFёа-яәөүҗңһα-ωάέίόώήύа-щюяіїєґѓѕјљњќѐѝ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uF
D3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])\.(?=[A-Z\uFF21-\uFF3A\u00C0-\u00D6\u00D8-\u00DE\u0100\u0102\u0104\u0106\u0108\u010A\u010C\u010E\u0110\u0112\u0114\u0116\u0118\u011A\u011C\u011E\u0120\u0122\u0124\u0126\u0128\u012A\u012C\u012E\u0130\u0132\u0134\u0136\u0139\u013B\u013D\u013F\u0141\u0143\u0145\u0147\u014A\u014C\u014E\u0150\u0152\u0154\u0156\u0158\u015A\u015C\u015E\u0160\u0162\u0164\u0166\u0168\u016A\u016C\u016E\u0170\u0172\u0174\u0176\u0178\u0179\u017B\u017D\u0181\u0182\u0184\u0186\u0187\u0189-\u018B\u018E-\u0191\u0193\u0194\u0196-\u0198\u019C\u019D\u019F\u01A0\u01A2\u01A4\u01A6\u01A7\u01A9\u01AC\u01AE\u01AF\u01B1-\u01B3\u01B5\u01B7\u01B8\u01BC\u01C4\u01C7\u01CA\u01CD\u01CF\u01D1\u01D3\u01D5\u01D7\u01D9\u01DB\u01DE\u01E0\u01E2\u01E4\u01E6\u01E8\u01EA\u01EC\u01EE\u01F1\u01F4\u01F6-\u01F8\u01FA\u01FC\u01FE\u0200\u0202\u0204\u0206\u0208\u020A\u020C\u020E\u0210\u0212\u0214\u0216\u0218\u021A\u021C\u021E\u0220\u0222\u0224\u0226\u0228\u022A\u022C\u022E\u0230\u0232\u023A\u023B\u023D\u023E\u0241\u0243-\u0246\u0248\u024A\u024C\u024E\u2C60\u2C62-\u2C64\u2C67\u2C69\u2C6B\u2C6D-\u2C70\u2C72\u2C75\u2C7E\u2C7F\uA722\uA724\uA726\uA728\uA72A\uA72C\uA72E\uA732\uA734\uA736\uA738\uA73A\uA73C\uA73E\uA740\uA742\uA744\uA746\uA748\uA74A\uA74C\uA74E\uA750\uA752\uA754\uA756\uA758\uA75A\uA75C\uA75E\uA760\uA762\uA764\uA766\uA768\uA76A\uA76C\uA76E\uA779\uA77B\uA77D\uA77E\uA780\uA782\uA784\uA786\uA78B\uA7
8D\uA790\uA792\uA796\uA798\uA79A\uA79C\uA79E\uA7A0\uA7A2\uA7A4\uA7A6\uA7A8\uA7AA-\uA7AE\uA7B0-\uA7B4\uA7B6\uA7B8\u1E00\u1E02\u1E04\u1E06\u1E08\u1E0A\u1E0C\u1E0E\u1E10\u1E12\u1E14\u1E16\u1E18\u1E1A\u1E1C\u1E1E\u1E20\u1E22\u1E24\u1E26\u1E28\u1E2A\u1E2C\u1E2E\u1E30\u1E32\u1E34\u1E36\u1E38\u1E3A\u1E3C\u1E3E\u1E40\u1E42\u1E44\u1E46\u1E48\u1E4A\u1E4C\u1E4E\u1E50\u1E52\u1E54\u1E56\u1E58\u1E5A\u1E5C\u1E5E\u1E60\u1E62\u1E64\u1E66\u1E68\u1E6A\u1E6C\u1E6E\u1E70\u1E72\u1E74\u1E76\u1E78\u1E7A\u1E7C\u1E7E\u1E80\u1E82\u1E84\u1E86\u1E88\u1E8A\u1E8C\u1E8E\u1E90\u1E92\u1E94\u1E9E\u1EA0\u1EA2\u1EA4\u1EA6\u1EA8\u1EAA\u1EAC\u1EAE\u1EB0\u1EB2\u1EB4\u1EB6\u1EB8\u1EBA\u1EBC\u1EBE\u1EC0\u1EC2\u1EC4\u1EC6\u1EC8\u1ECA\u1ECC\u1ECE\u1ED0\u1ED2\u1ED4\u1ED6\u1ED8\u1EDA\u1EDC\u1EDE\u1EE0\u1EE2\u1EE4\u1EE6\u1EE8\u1EEA\u1EEC\u1EEE\u1EF0\u1EF2\u1EF4\u1EF6\u1EF8\u1EFA\u1EFC\u1EFEЁА-ЯӘӨҮҖҢҺΑ-ΩΆΈΊΌΏΉΎА-ЩЮЯІЇЄҐЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉ
Ύа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])[,!?](?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00
C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])[:<>=](?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6D
F\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])--(?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80
-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F]),(?=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉ
Ύа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])|(?<=[A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])([\"”“`‘´’‚,„»«「」『』()〔〕【】《》〈〉〈〉⟦⟧\)\]\(\[])(?
=[\-A-Za-z\uFF21-\uFF3A\uFF41-\uFF5A\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0180-\u01BF\u01C4-\u024F\u2C60-\u2C7B\u2C7E\u2C7F\uA722-\uA76F\uA771-\uA787\uA78B-\uA78E\uA790-\uA7B9\uA7FA\uAB30-\uAB5A\uAB60-\uAB64\u0250-\u02AF\u1D00-\u1D25\u1D6B-\u1D77\u1D79-\u1D9A\u1E00-\u1EFFёа-яЁА-ЯәөүҗңһӘӨҮҖҢҺα-ωάέίόώήύΑ-ΩΆΈΊΌΏΉΎа-щюяіїєґА-ЩЮЯІЇЄҐѓѕјљњќѐѝЃЅЈЉЊЌЀЍ\u1200-\u137F\u0980-\u09FF\u0591-\u05F4\uFB1D-\uFB4F\u0620-\u064A\u066E-\u06D5\u06E5-\u06FF\u0750-\u077F\u08A0-\u08BD\uFB50-\uFBB1\uFBD3-\uFD3D\uFD50-\uFDC7\uFDF0-\uFDFB\uFE70-\uFEFC\U0001EE00-\U0001EEBB\u0D80-\u0DFF\u0900-\u097F\u0C80-\u0CFF\u0B80-\u0BFF\u0C00-\u0C7F\uAC00-\uD7AF\u1100-\u11FF\u3040-\u309F\u30A0-\u30FFー\u4E00-\u62FF\u6300-\u77FF\u7800-\u8CFF\u8D00-\u9FFF\u3400-\u4DBF\U00020000-\U000215FF\U00021600-\U000230FF\U00023100-\U000245FF\U00024600-\U000260FF\U00026100-\U000275FF\U00027600-\U000290FF\U00029100-\U0002A6DF\U0002A700-\U0002B73F\U0002B740-\U0002B81F\U0002B820-\U0002CEAF\U0002CEB0-\U0002EBEF\u2E80-\u2EFF\u2F00-\u2FDF\u2FF0-\u2FFF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\uF900-\uFAFF\uFE30-\uFE4F\U0001F200-\U0001F2FF\U0002F800-\U0002FA1F])�token_match�
��A�
� ��A� �'��A�'�''��A�''�(*_*)��A�(*_*)�(-8��A�(-8�(-:��A�(-:�(-;��A�(-;�(-_-)��A�(-_-)�(._.)��A�(._.)�(:��A�(:�(;��A�(;�(=��A�(=�(>_<)��A�(>_<)�(^_^)��A�(^_^)�(o:��A�(o:�(¬_¬)��A�(¬_¬)�(ಠ_ಠ)��A�(ಠ_ಠ)�(╯°□°)╯︵┻━┻��A�(╯°□°)╯︵┻━┻�)-:��A�)-:�):��A�):�-_-��A�-_-�-__-��A�-__-�-e��A�-e�._.��A�._.�0.0��A�0.0�0.o��A�0.o�0_0��A�0_0�0_o��A�0_o�8)��A�8)�8-)��A�8-)�8-D��A�8-D�8D��A�8D�:'(��A�:'(�:')��A�:')�:'-(��A�:'-(�:'-)��A�:'-)�:(��A�:(�:((��A�:((�:(((��A�:(((�:()��A�:()�:)��A�:)�:))��A�:))�:)))��A�:)))�:*��A�:*�:-(��A�:-(�:-((��A�:-((�:-(((��A�:-(((�:-)��A�:-)�:-))��A�:-))�:-)))��A�:-)))�:-*��A�:-*�:-/��A�:-/�:-0��A�:-0�:-3��A�:-3�:->��A�:->�:-D��A�:-D�:-O��A�:-O�:-P��A�:-P�:-X��A�:-X�:-]��A�:-]�:-o��A�:-o�:-p��A�:-p�:-x��A�:-x�:-|��A�:-|�:-}��A�:-}�:/��A�:/�:0��A�:0�:1��A�:1�:3��A�:3�:>��A�:>�:D��A�:D�:O��A�:O�:P��A�:P�:X��A�:X�:]��A�:]�:o��A�:o�:o)��A�:o)�:p��A�:p�:x��A�:x�:|��A�:|�:}��A�:}�:’(��A�:’(�:’)��A�:’)�:’-(��A�:’-(�:’-)��A�:’-)�;)��A�;)�;-)��A�;-)�;-D��A�;-D�;D��A�;D�;_;��A�;_;�<.<��A�<.<�</3��A�</3�<3��A�<3�<33��A�<33�<333��A�<333�<space>��A�<space>�=(��A�=(�=)��A�=)�=/��A�=/�=3��A�=3�=D��A�=D�=[��A�=[�=]��A�=]�=|��A�=|�>.<��A�>.<�>.>��A�>.>�>:(��A�>:(�>:o��A�>:o�><(((*>��A�><(((*>�@_@��A�@_@�A.��A�A.�AG.��A�AG.�AkH.��A�AkH.�Aö.��A�Aö.�B.��A�B.�B.CS.��A�B.CS.�B.S.��A�B.S.�B.Sc.��A�B.Sc.�B.ú.é.k.��A�B.ú.é.k.�BE.��A�BE.�BEK.��A�BEK.�BSC.��A�BSC.�BSc.��A�BSc.�BTK.��A�BTK.�Bat.��A�Bat.�Be.��A�Be.�Bek.��A�Bek.�Bfok.��A�Bfok.�Bk.��A�Bk.�Bp.��A�Bp.�Bros.��A�Bros.�Bt.��A�Bt.�Btk.��A�Btk.�Btke.��A�Btke.�Btét.��A�Btét.�C++��A�C++�C.��A�C.�CSC.��A�CSC.�Cal.��A�Cal.�Cg.��A�Cg.�Cgf.��A�Cgf.�Cgt.��A�Cgt.�Cia.��A�Cia.�Co.��A�Co.�Colo.��A�Colo.�Comp.��A�Comp.�Copr.��A�Copr.�Corp.��A�Corp.�Cos.��A�Cos.�Cs.��A�Cs.�Csc.��A�Csc.�Csop.��A�Csop.�Cstv.��A�Cstv.�Ctv.��A�Ctv.�Ctvr.��A�Ctvr.�D.��A�D.�DR.��A�DR.�Dipl.��A�Dipl.�Dr.��A�Dr.�Dsz.��A�Dsz.�Dzs.��A�Dzs.�E.��A�E.�EK.��A�EK.�EU.��A�EU.�F.��A�F.�Fla.��A�Fla.�Folyt.��A�Folyt.�Fpk.��A�Fpk.�Főszerk.��A�Főszerk.�G.��A�G.�GK.��A�GK.�GM
.��A�GM.�Gfv.��A�Gfv.�Gmk.��A�Gmk.�Gr.��A�Gr.�Group.��A�Group.�Gt.��A�Gt.�Gy.��A�Gy.�H.��A�H.�HKsz.��A�HKsz.�Hmvh.��A�Hmvh.�I.��A�I.�Ifj.��A�Ifj.�Inc.��A�Inc.�Inform.��A�Inform.�Int.��A�Int.�J.��A�J.�Jr.��A�Jr.�Jv.��A�Jv.�K.��A�K.�K.m.f.��A�K.m.f.�KB.��A�KB.�KER.��A�KER.�KFT.��A�KFT.�KRT.��A�KRT.�Kb.��A�Kb.�Ker.��A�Ker.�Kft.��A�Kft.�Kg.��A�Kg.�Kht.��A�Kht.�Kkt.��A�Kkt.�Kong.��A�Kong.�Korm.��A�Korm.�Kr.��A�Kr.�Kr.e.��A�Kr.e.�Kr.u.��A�Kr.u.�Krt.��A�Krt.�L.��A�L.�LB.��A�LB.�Llc.��A�Llc.�Ltd.��A�Ltd.�M.��A�M.�M.A.��A�M.A.�M.S.��A�M.S.�M.SC.��A�M.SC.�M.Sc.��A�M.Sc.�MA.��A�MA.�MH.��A�MH.�MSC.��A�MSC.�MSc.��A�MSc.�Mass.��A�Mass.�Max.��A�Max.�Mlle.��A�Mlle.�Mme.��A�Mme.�Mo.��A�Mo.�Mr.��A�Mr.�Mrs.��A�Mrs.�Ms.��A�Ms.�Mt.��A�Mt.�N.��A�N.�N.N.��A�N.N.�NB.��A�NB.�NBr.��A�NBr.�Nat.��A�Nat.�No.��A�No.�Nr.��A�Nr.�Ny.��A�Ny.�Nyh.��A�Nyh.�Nyr.��A�Nyr.�Nyrt.��A�Nyrt.�O.��A�O.�O.O��A�O.O�O.o��A�O.o�OJ.��A�OJ.�O_O��A�O_O�O_o��A�O_o�Op.��A�Op.�P.��A�P.�P.H.��A�P.H.�P.S.��A�P.S.�PH.D.��A�PH.D.�PHD.��A�PHD.�PROF.��A�PROF.�Pf.��A�Pf.�Ph.D��A�Ph.D�PhD.��A�PhD.�Pk.��A�Pk.�Pl.��A�Pl.�Plc.��A�Plc.�Pp.��A�Pp.�Proc.��A�Proc.�Prof.��A�Prof.�Ptk.��A�Ptk.�R.��A�R.�RT.��A�RT.�Rer.��A�Rer.�Rt.��A�Rt.�S.��A�S.�S.B.��A�S.B.�SZOLG.��A�SZOLG.�Salg.��A�Salg.�Sch.��A�Sch.�Spa.��A�Spa.�St.��A�St.�Sz.��A�Sz.�SzRt.��A�SzRt.�Szerk.��A�Szerk.�Szfv.��A�Szfv.�Szjt.��A�Szjt.�Szolg.��A�Szolg.�Szt.��A�Szt.�Sztv.��A�Sztv.�Szvt.��A�Szvt.�Számv.��A�Számv.�T.��A�T.�TEL.��A�TEL.�Tel.��A�Tel.�Ty.��A�Ty.�Tyr.��A�Tyr.�U.��A�U.�Ui.��A�Ui.�Ut.��A�Ut.�V.��A�V.�V.V��A�V.V�VB.��A�VB.�V_V��A�V_V�Vcs.��A�Vcs.�Vhr.��A�Vhr.�Vht.��A�Vht.�Várm.��A�Várm.�W.��A�W.�X.��A�X.�X.Y.��A�X.Y.�XD��A�XD�XDD��A�XDD�Y.��A�Y.�Z.��A�Z.�Zrt.��A�Zrt.�Zs.��A�Zs.�[-:��A�[-:�[:��A�[:�[=��A�[=�\")��A�\")�\n��A�\n�\t��A�\t�]=��A�]=�^_^��A�^_^�^__^��A�^__^�^___^��A�^___^�a.��A�a.�a.C.��A�a.C.�ac.��A�ac.�adj.��A�adj.�adm.��A�adm.�ag.��A�ag.�agit.��A�agit.�alez.��A�alez.�alk.��A�alk.�all.��A�all.�altbgy.��A�altbgy.�an.��A�an.�ang.��A�ang.�arch.��A�arch.�at.��A
�at.�atc.��A�atc.�aug.��A�aug.�b.��A�b.�b.a.��A�b.a.�b.s.��A�b.s.�b.sc.��A�b.sc.�bek.��A�bek.�belker.��A�belker.�berend.��A�berend.�biz.��A�biz.�bizt.��A�bizt.�bo.��A�bo.�bp.��A�bp.�br.��A�br.�bsc.��A�bsc.�bt.��A�bt.�btk.��A�btk.�c.��A�c.�ca.��A�ca.�cc.��A�cc.�cca.��A�cca.�cf.��A�cf.�cif.��A�cif.�co.��A�co.�corp.��A�corp.�cos.��A�cos.�cs.��A�cs.�csc.��A�csc.�csüt.��A�csüt.�cső.��A�cső.�ctv.��A�ctv.�d.��A�d.�dbj.��A�dbj.�dd.��A�dd.�ddr.��A�ddr.�de.��A�de.�dec.��A�dec.�dikt.��A�dikt.�dipl.��A�dipl.�dj.��A�dj.�dk.��A�dk.�dl.��A�dl.�dny.��A�dny.�dolg.��A�dolg.�dr.��A�dr.�du.��A�du.�dzs.��A�dzs.�e.��A�e.�ea.��A�ea.�ed.��A�ed.�eff.��A�eff.�egyh.��A�egyh.�ell.��A�ell.�elv.��A�elv.�elvt.��A�elvt.�em.��A�em.�eng.��A�eng.�eny.��A�eny.�et.��A�et.�etc.��A�etc.�ev.��A�ev.�ezr.��A�ezr.�eü.��A�eü.�f.��A�f.�f.h.��A�f.h.�f.é.��A�f.é.�fam.��A�fam.�fb.��A�fb.�febr.��A�febr.�fej.��A�fej.�felv.��A�felv.�felügy.��A�felügy.�ff.��A�ff.�ffi.��A�ffi.�fhdgy.��A�fhdgy.�fil.��A�fil.�fiz.��A�fiz.�fm.��A�fm.�foglalk.��A�foglalk.�ford.��A�ford.�fp.��A�fp.�fr.��A�fr.�frsz.��A�frsz.�fszla.��A�fszla.�fszt.��A�fszt.�ft.��A�ft.�fuv.��A�fuv.�főig.��A�főig.�főisk.��A�főisk.�főtörm.��A�főtörm.�főv.��A�főv.�g.��A�g.�gazd.��A�gazd.�gimn.��A�gimn.�gk.��A�gk.�gkv.��A�gkv.�gmk.��A�gmk.�gondn.��A�gondn.�gr.��A�gr.�grav.��A�grav.�gy.��A�gy.�gyak.��A�gyak.�gyártm.��A�gyártm.�gör.��A�gör.�h.��A�h.�hads.��A�hads.�hallg.��A�hallg.�hdm.��A�hdm.�hdp.��A�hdp.�hds.��A�hds.�hg.��A�hg.�hiv.��A�hiv.�hk.��A�hk.�hm.��A�hm.�ho.��A�ho.�honv.��A�honv.�hp.��A�hp.�hr.��A�hr.�hrsz.��A�hrsz.�hsz.��A�hsz.�ht.��A�ht.�htb.��A�htb.�hv.��A�hv.�hőm.��A�hőm.�i.��A�i.�i.e.��A�i.e.�i.sz.��A�i.sz.�id.��A�id.�ie.��A�ie.�ifj.��A�ifj.�ig.��A�ig.�igh.��A�igh.�ill.��A�ill.�imp.��A�imp.�inc.��A�inc.�ind.��A�ind.�inform.��A�inform.�inic.��A�inic.�int.��A�int.�io.��A�io.�ip.��A�ip.�ir.��A�ir.�irod.��A�irod.�isk.��A�isk.�ism.��A�ism.�izr.��A�izr.�iá.��A�iá.�j.��A�j.�jan.��A�jan.�jav.��A�jav.�jegyz.��A�jegyz.�jgmk.��A�jgmk.�jjv.��A�jjv.�jkv.��A�jkv.�j
ogh.��A�jogh.�jogt.��A�jogt.�jr.��A�jr.�jvb.��A�jvb.�júl.��A�júl.�jún.��A�jún.�k.��A�k.�karb.��A�karb.�kat.��A�kat.�kath.��A�kath.�kb.��A�kb.�kcs.��A�kcs.�kd.��A�kd.�ker.��A�ker.�kf.��A�kf.�kft.��A�kft.�kht.��A�kht.�kir.��A�kir.�kirend.��A�kirend.�kisip.��A�kisip.�kiv.��A�kiv.�kk.��A�kk.�kkt.��A�kkt.�klin.��A�klin.�km.��A�km.�korm.��A�korm.�kp.��A�kp.�krt.��A�krt.�kt.��A�kt.�ktsg.��A�ktsg.�kult.��A�kult.�kv.��A�kv.�kve.��A�kve.�képv.��A�képv.�kísérl.��A�kísérl.�kóth.��A�kóth.�könyvt.��A�könyvt.�körz.��A�körz.�köv.��A�köv.�közj.��A�közj.�közl.��A�közl.�közp.��A�közp.�közt.��A�közt.�kü.��A�kü.�l.��A�l.�lat.��A�lat.�ld.��A�ld.�legs.��A�legs.�lg.��A�lg.�lgv.��A�lgv.�loc.��A�loc.�lt.��A�lt.�ltd.��A�ltd.�ltp.��A�ltp.�luth.��A�luth.�m.��A�m.�m.a.��A�m.a.�m.s.��A�m.s.�m.sc.��A�m.sc.�ma.��A�ma.�mat.��A�mat.�max.��A�max.�mb.��A�mb.�med.��A�med.�megh.��A�megh.�met.��A�met.�mf.��A�mf.�mfszt.��A�mfszt.�min.��A�min.�miss.��A�miss.�mjr.��A�mjr.�mjv.��A�mjv.�mk.��A�mk.�mlle.��A�mlle.�mme.��A�mme.�mn.��A�mn.�mozg.��A�mozg.�mr.��A�mr.�mrs.��A�mrs.�ms.��A�ms.�msc.��A�msc.�má.��A�má.�máj.��A�máj.�márc.��A�márc.�mé.��A�mé.�mélt.��A�mélt.�mü.��A�mü.�műh.��A�műh.�műsz.��A�műsz.�műv.��A�műv.�művez.��A�művez.�n.��A�n.�nagyker.��A�nagyker.�nagys.��A�nagys.�nat.��A�nat.�nb.��A�nb.�neg.��A�neg.�nk.��A�nk.�no.��A�no.�nov.��A�nov.�nu.��A�nu.�ny.��A�ny.�nyilv.��A�nyilv.�nyrt.��A�nyrt.�nyug.��A�nyug.�o.��A�o.�o.0��A�o.0�o.O��A�o.O�o.o��A�o.o�o_0��A�o_0�o_O��A�o_O�o_o��A�o_o�obj.��A�obj.�okl.��A�okl.�okt.��A�okt.�old.��A�old.�olv.��A�olv.�orsz.��A�orsz.�ort.��A�ort.�ov.��A�ov.�ovh.��A�ovh.�p.��A�p.�pf.��A�pf.�pg.��A�pg.�ph.d��A�ph.d�ph.d.��A�ph.d.�phd.��A�phd.�phil.��A�phil.�pjt.��A�pjt.�pk.��A�pk.�pl.��A�pl.�plb.��A�plb.�plc.��A�plc.�pld.��A�pld.�plur.��A�plur.�pol.��A�pol.�polg.��A�polg.�poz.��A�poz.�pp.��A�pp.�proc.��A�proc.�prof.��A�prof.�prot.��A�prot.�pság.��A�pság.�ptk.��A�ptk.�pu.��A�pu.�pü.��A�pü.�q.��A�q.�r.��A�r.�r.k.��A�r.k.�rac.��A�rac.�rad.��A�rad.�red.��A�red.�ref.��A�ref.�reg.��A�re
g.�rer.��A�rer.�rev.��A�rev.�rf.��A�rf.�rkp.��A�rkp.�rkt.��A�rkt.�rt.��A�rt.�rtg.��A�rtg.�röv.��A�röv.�s.��A�s.�s.b.��A�s.b.�s.k.��A�s.k.�sa.��A�sa.�sb.��A�sb.�sel.��A�sel.�sgt.��A�sgt.�sm.��A�sm.�st.��A�st.�stat.��A�stat.�stb.��A�stb.�strat.��A�strat.�stud.��A�stud.�sz.��A�sz.�szakm.��A�szakm.�szaksz.��A�szaksz.�szakszerv.��A�szakszerv.�szd.��A�szd.�szds.��A�szds.�szept.��A�szept.�szerk.��A�szerk.�szf.��A�szf.�szimf.��A�szimf.�szjt.��A�szjt.�szkv.��A�szkv.�szla.��A�szla.�szn.��A�szn.�szolg.��A�szolg.�szt.��A�szt.�szubj.��A�szubj.�szöv.��A�szöv.�szül.��A�szül.�t.��A�t.�tanm.��A�tanm.�tb.��A�tb.�tbk.��A�tbk.�tc.��A�tc.�techn.��A�techn.�tek.��A�tek.�tel.��A�tel.�tf.��A�tf.�tgk.��A�tgk.�ti.��A�ti.�tip.��A�tip.�tisztv.��A�tisztv.�titks.��A�titks.�tk.��A�tk.�tkp.��A�tkp.�tny.��A�tny.�tp.��A�tp.�tszf.��A�tszf.�tszk.��A�tszk.�tszkv.��A�tszkv.�tv.��A�tv.�tvr.��A�tvr.�ty.��A�ty.�törv.��A�törv.�tü.��A�tü.�u.��A�u.�ua.��A�ua.�ui.��A�ui.�unit.��A�unit.�uo.��A�uo.�uv.��A�uv.�v.��A�v.�v.v��A�v.v�v_v��A�v_v�vas.��A�vas.�vb.��A�vb.�vegy.��A�vegy.�vh.��A�vh.�vhol.��A�vhol.�vhr.��A�vhr.�vill.��A�vill.�vizsg.��A�vizsg.�vk.��A�vk.�vkf.��A�vkf.�vkny.��A�vkny.�vm.��A�vm.�vol.��A�vol.�vs.��A�vs.�vsz.��A�vsz.�vv.��A�vv.�vál.��A�vál.�várm.��A�várm.�vízv.��A�vízv.�vö.��A�vö.�w.��A�w.�x.��A�x.�xD��A�xD�xDD��A�xDD�y.��A�y.�z.��A�z.�zrt.��A�zrt.�zs.��A�zs.� ��A� C� 
�¯\(ツ)/¯��A�¯\(ツ)/¯�°C��A�°C�°C.��A�°C�A�.�°F��A�°F�°F.��A�°F�A�.�°K��A�°K�°K.��A�°K�A�.�°c��A�°c�°c.��A�°c�A�.�°f��A�°f�°f.��A�°f�A�.�°k��A�°k�°k.��A�°k�A�.�Á.��A�Á.�Áe.��A�Áe.�Áht.��A�Áht.�É.��A�É.�Épt.��A�Épt.�Ész.��A�Ész.�Új-Z.��A�Új-Z.�ÚjZ.��A�ÚjZ.�Ún.��A�Ún.�á.��A�á.�ált.��A�ált.�ápr.��A�ápr.�ásv.��A�ásv.�ä.��A�ä.�é.��A�é.�ék.��A�ék.�ény.��A�ény.�érk.��A�érk.�évf.��A�évf.�í.��A�í.�ó.��A�ó.�ö.��A�ö.�össz.��A�össz.�ötk.��A�ötk.�özv.��A�özv.�ú.��A�ú.�ú.n.��A�ú.n.�úm.��A�úm.�ún.��A�ún.�út.��A�út.�ü.��A�ü.�üag.��A�üag.�üd.��A�üd.�üdv.��A�üdv.�üe.��A�üe.�ümk.��A�ümk.�ütk.��A�ütk.�üv.��A�üv.�őrgy.��A�őrgy.�őrpk.��A�őrpk.�őrv.��A�őrv.�ű.��A�ű.�ಠ_ಠ��A�ಠ_ಠ�ಠ︵ಠ��A�ಠ︵ಠ�—��A�—�’��A�’�’’��A�’’�faster_heuristics�
trainable_lemmatizer/model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:e619d0c826a82e4acaf0b70ff34d5e8194e8404f5413320fbed4b83917e6b8b3
 size 12356945
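Each model file in this commit is stored through Git LFS, so the diff shows only a three-line pointer (spec `version`, `oid`, `size`) rather than the binary weights. Below is a minimal sketch of reading such a pointer. It assumes the v1 layout shown above. `LfsPointer` and `parse_lfs_pointer` are illustrative names, not part of spaCy or Git LFS. A pointer with an empty digest, like the truncated old-side lines above, is rejected.

```python
from dataclasses import dataclass

HEX_DIGITS = set("0123456789abcdef")


@dataclass(frozen=True)
class LfsPointer:
    version: str  # spec URL, e.g. https://git-lfs.github.com/spec/v1
    oid: str      # sha256 hex digest, without the "sha256:" prefix
    size: int     # size in bytes of the real (non-pointer) object


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS v1 pointer file (hypothetical helper for illustration)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    if len(digest) != 64 or not set(digest) <= HEX_DIGITS:
        raise ValueError("sha256 oid must be 64 lowercase hex characters")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e619d0c826a82e4acaf0b70ff34d5e8194e8404f5413320fbed4b83917e6b8b3\n"
    "size 12356945\n"
)
print(pointer.size)  # 12356945
```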
transformer/model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:73764adc8399fc4bd9a44c463254f704899ff6e5a5588fdd8a6f661fcfc61647
+size 443344022
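The pointer's `oid` is the SHA-256 of the real object and `size` is its length in bytes. You can use these to check a downloaded `transformer/model` against the new pointer. The sketch below assumes the actual object, not the pointer text, has already been fetched to disk. `verify_lfs_object` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def verify_lfs_object(path: Path, expected_oid: str, expected_size: int,
                      chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches a pointer's size and sha256 oid."""
    # Cheap check first: a size mismatch already means a different object.
    if path.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    # Hash in chunks so multi-hundred-MB weight files are not loaded at once.
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid.lower()
```

For the transformer weights above, the call would be `verify_lfs_object(Path("transformer/model"), "73764adc8399fc4bd9a44c463254f704899ff6e5a5588fdd8a6f661fcfc61647", 443344022)`. Checking size before hashing avoids reading roughly 443 MB when the wrong file is present.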
vocab/strings.json
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:470543f06fa1bae827076c99734ecd927ec185ccec73a2ecb6ff44c8a28b3b55
+size 6399835
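A pointer is derived purely from the object's bytes. That is why the updated `trainable_lemmatizer/model` gets a new `oid` while its `size` stays at 12356945. The sketch below builds a v1 pointer. It assumes the spec's layout: `version` first, the remaining keys in alphabetical order, and each line newline-terminated. `make_lfs_pointer` is a hypothetical helper, not the `git lfs` CLI.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(data: bytes) -> str:
    """Build the v1 pointer text that Git LFS would commit in place of `data`."""
    oid = hashlib.sha256(data).hexdigest()
    # "version" comes first; "oid" and "size" follow in alphabetical order.
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


# Same length, different bytes: the size line is identical, the oid is not.
old, new = make_lfs_pointer(b"weights-v1"), make_lfs_pointer(b"weights-v2")
print(old.splitlines()[2] == new.splitlines()[2])  # True
print(old.splitlines()[1] == new.splitlines()[1])  # False
```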