ginkgogo committed
Commit 4442ca2 · verified · 1 Parent(s): efe2ccb

Add SetFit ABSA model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
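
The pooling configuration above enables only `pooling_mode_mean_tokens`, meaning the sentence embedding is the average of the token embeddings. The following is a minimal plain-Python sketch of that idea, assuming padding positions are excluded through an attention mask; `mean_pool` is a hypothetical helper written for illustration and is not the Sentence Transformers implementation.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                totals[i] += value
    # Guard against an all-padding row to avoid division by zero.
    kept = max(kept, 1)
    return [total / kept for total in totals]


# Toy example with 2-dimensional vectors; this model uses 384 dimensions.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is padding
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
```

The padded third row is ignored, so only the first two token vectors contribute to the average.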
README.md ADDED
@@ -0,0 +1,264 @@
+ ---
+ library_name: setfit
+ tags:
+ - setfit
+ - absa
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ metrics:
+ - accuracy
+ widget:
+ - text: 'Wifi:Go get your coffee on.
+
+     Hey the coffee was strong, so what else?
+
+     Easy FWY on/off access
+
+     Close the beach
+
+     Other food choice nearby
+
+     Smaller starbucks, with less seating both indoors and outdoors
+
+     Wifi was slow at the time'
+ - text: "place:Stopped by after a long day visit to Santa Barbara. There were few\
+     \ different places near by but remembering that 'smaller place with limited menu'\
+     \ has 70/30 chance of being better than a big place with big menu. It's generally\
+     \ a good ratio to keep in mind, and depending on the category of food, like sushi,\
+     \ it leans to a higher ratio like 80/20. \n\nOther places may have better seating\
+     \ and views, but I believe this place has better food. Their clam chowder was\
+     \ the best I've had so far in California (been here only for 5 months, I'm just\
+     \ getting started). Their 2 options of fresh oysters were good choices. Their\
+     \ lemonade was good and sweet. I just wished their shrimp cocktail came with more\
+     \ ...Cocktail. (Easy fix, just ask for more)"
+ - text: "evening:Three reasons why it gets three stars:\n\n1. The crab cakes were\
+     \ good and is a definitely must try!\n2. The shrimp scampi was actually amazing\
+     \ in the sauce that it comes with, so that's another must try!\n3. The real reason\
+     \ why it is getting three stars is because service is everything in ANY restaurant\
+     \ you go to. Service started off great, waitress was attentive, but once we paid\
+     \ the bill and left a 20% tip, my guests and I, which was only three of us, stayed\
+     \ at the table to finish our drinks and we're looking at funny videos from a trip\
+     \ we went to. Point is the waitress rudely told my friend to lower the volume\
+     \ on his phone, yet other guests were just as loud and we were sitting OUTSIDE...where\
+     \ it is already a loud environment! \n\nI really want to give it 4 stars, but\
+     \ if I give 4 stars it changes it to, \"Yay! I'm a fan\", but I am not. The only\
+     \ reason why it's not getting 1 star, is because the food was decent, the view\
+     \ is nice and also the manager was extremely empathetic to the situation and it\
+     \ wasn't her fault at all that her waitress was obviously having an off day. I\
+     \ have never met a manager that attentive and she was incredible at handling and\
+     \ diffusing the situation. I cannot thank her enough for salvaging the rest of\
+     \ our evening for how poor the waitress treated paying customers."
+ - text: Mediterranean:Pretty good food, just had a wrap and it was delicious pretty
+     much on Mediterranean or Greek style food around here. Petra's who had really
+     good Greek dinners closed
+ - text: sauce:The chicken made worth the waiting, my mild sauce was awesome, the honey
+     mustard my favorite
+ pipeline_tag: text-classification
+ inference: false
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ model-index:
+ - name: SetFit Aspect Model with sentence-transformers/all-MiniLM-L6-v2
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.9602649006622517
+       name: Accuracy
+ ---
+
+ # SetFit Aspect Model with sentence-transformers/all-MiniLM-L6-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Aspect Based Sentiment Analysis (ABSA). This SetFit model uses [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification. In particular, this model is in charge of filtering aspect span candidates.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
+ This model was trained within the context of a larger system for ABSA, which looks like so:
+
+ 1. Use a spaCy model to select possible aspect span candidates.
+ 2. **Use this SetFit model to filter these possible aspect span candidates.**
+ 3. Use a SetFit model to classify the filtered aspect span candidates.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **spaCy Model:** en_core_web_sm
+ - **SetFitABSA Aspect Model:** [ginkgogo/setfit-absa-bge-small-en-v1.5-restaurants-aspect](https://huggingface.co/ginkgogo/setfit-absa-bge-small-en-v1.5-restaurants-aspect)
+ - **SetFitABSA Polarity Model:** [ginkgogo/setfit-absa-bge-small-en-v1.5-restaurants-polarity](https://huggingface.co/ginkgogo/setfit-absa-bge-small-en-v1.5-restaurants-polarity)
+ - **Maximum Sequence Length:** 256 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:----------|:---------|
+ | aspect | <ul><li>'food:They made it into more American food, added burgers and ribs and got rid of the tequila selection. We were so bummed. Used to be one of our favorite places to go for good Mexican food. The owner said the new direction was to appeal to more tourists.'</li><li>"seating:Such a cute little spot for desserts! I'm so glad we had time on our short visit to Santa Barbara to grab a slice of cake from here. My husband and I each got our own to slice to share of course. He said we didn't come all this way just to get one so we chose a slice of the berry cake and chocolate decadence. The berry cake was nice and fluffy without being too sweet. The acidity from the fruits balanced the sweetest of the cake wonderfully. If you're up for something rich then the chocolate decadence will not disappoint. Service was great and seating was comfortable. Order your sweet treats at the counter then a number will be given to you. Pick a table and get ready to enjoy because your sweets will be brought out to your table when ready."</li><li>'food:One brisk Saturday morning after asking workers during a stop for tylenol from the Hotel California Boutique the best breakfast place, they recommended Goat Tree. We crossed the busy street and greeted the hostess. The very kind young lady walked us to our table on the sunny patio. We skimmed the menu and decided on the chicken and waffle and a chocolate croissant. The wait was quite short and we spent it discussing the beautiful surrounding area. Soon, our food was delivered, and let me tell you, it was beautiful. On top of that, it was scrumptious. The fried chicken was perfect and tender. The waffle had the perfect balance of crunch and fluff. And how dare I forget the exquisite honey. Now this honey was the best I have ever tasted. It was topped with chia and pumpkin seeds. My daughter asked for her croissant warmed, and once again it was marvelous. After paying, I told our waitress how amazing the honey was. Next thing we knew, she brought out two large to go cups full of it! \n\nAbsolutely loved this place and everything about it. 100% recommend! I strongly award them 5 stars!'</li></ul> |
+ | no aspect | <ul><li>'burgers:They made it into more American food, added burgers and ribs and got rid of the tequila selection. We were so bummed. Used to be one of our favorite places to go for good Mexican food. The owner said the new direction was to appeal to more tourists.'</li><li>'ribs:They made it into more American food, added burgers and ribs and got rid of the tequila selection. We were so bummed. Used to be one of our favorite places to go for good Mexican food. The owner said the new direction was to appeal to more tourists.'</li><li>'tequila selection:They made it into more American food, added burgers and ribs and got rid of the tequila selection. We were so bummed. Used to be one of our favorite places to go for good Mexican food. The owner said the new direction was to appeal to more tourists.'</li></ul> |
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 0.9603 |
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import AbsaModel
+
+ # Download from the 🤗 Hub
+ model = AbsaModel.from_pretrained(
+     "ginkgogo/setfit-absa-bge-small-en-v1.5-restaurants-aspect",
+     "ginkgogo/setfit-absa-bge-small-en-v1.5-restaurants-polarity",
+ )
+ # Run inference
+ preds = model("The food was great, but the venue is just way too busy.")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:---------|:----|
+ | Word count | 21 | 200.4733 | 491 |
+
+ | Label | Training Sample Count |
+ |:----------|:----------------------|
+ | no aspect | 411 |
+ | aspect | 20 |
+
+ ### Training Hyperparameters
+ - batch_size: (50, 50)
+ - num_epochs: (5, 5)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - body_learning_rate: (2e-05, 1e-05)
+ - head_learning_rate: 0.01
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: True
+ - warmup_proportion: 0.1
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: True
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:----------:|:-------:|:-------------:|:---------------:|
+ | 0.0003 | 1 | 0.2706 | - |
+ | 0.0147 | 50 | 0.2856 | 0.3049 |
+ | 0.0294 | 100 | 0.2817 | 0.2904 |
+ | 0.0442 | 150 | 0.2453 | 0.2837 |
+ | 0.0589 | 200 | 0.2637 | 0.2756 |
+ | 0.0736 | 250 | 0.199 | 0.2668 |
+ | 0.0883 | 300 | 0.1917 | 0.2523 |
+ | 0.1031 | 350 | 0.1071 | 0.1889 |
+ | 0.1178 | 400 | 0.049 | 0.0826 |
+ | **0.1325** | **450** | **0.022** | **0.0718** |
+ | 0.1472 | 500 | 0.0275 | 0.0767 |
+ | 0.1620 | 550 | 0.0152 | 0.0779 |
+ | 0.1767 | 600 | 0.0185 | 0.0905 |
+ | 0.1914 | 650 | 0.0044 | 0.0785 |
+ | 0.2061 | 700 | 0.008 | 0.0896 |
+
+ * The bold row denotes the saved checkpoint.
+ ### Framework Versions
+ - Python: 3.10.12
+ - SetFit: 1.0.3
+ - Sentence Transformers: 2.6.0
+ - spaCy: 3.7.4
+ - Transformers: 4.39.1
+ - PyTorch: 2.2.1+cu121
+ - Datasets: 2.18.0
+ - Tokenizers: 0.15.2
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+   doi = {10.48550/ARXIV.2209.11055},
+   url = {https://arxiv.org/abs/2209.11055},
+   author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+   keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+   title = {Efficient Few-Shot Learning Without Prompts},
+   publisher = {arXiv},
+   year = {2022},
+   copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
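
The README describes this model as the second stage of a three-stage ABSA system, and its label and widget examples show inputs of the form `candidate:review text`. The sketch below illustrates that filtering stage in plain Python. `build_filter_inputs`, `filter_candidates`, and the toy `keyword_classifier` are hypothetical names introduced here; they stand in for the spaCy candidate extraction and the trained SetFit classifier and are not part of the SetFit API.

```python
from typing import Callable, List


def build_filter_inputs(text: str, candidates: List[str]) -> List[str]:
    """Format each candidate span as 'span:text', as in the README examples."""
    return [f"{candidate}:{text}" for candidate in candidates]


def filter_candidates(
    text: str,
    candidates: List[str],
    classify: Callable[[List[str]], List[str]],
) -> List[str]:
    """Keep only the candidates that the classifier labels 'aspect'."""
    inputs = build_filter_inputs(text, candidates)
    labels = classify(inputs)
    return [c for c, label in zip(candidates, labels) if label == "aspect"]


def keyword_classifier(inputs: List[str]) -> List[str]:
    # Toy stand-in for the trained model (assumption): accept only "food".
    return [
        "aspect" if item.split(":", 1)[0] == "food" else "no aspect"
        for item in inputs
    ]


review = "They made it into more American food, added burgers and ribs."
kept = filter_candidates(review, ["food", "burgers", "ribs"], keyword_classifier)
```

With the toy classifier, only `food` survives the filter, mirroring the README's label table where `food` is an aspect and `burgers` and `ribs` are not.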
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "models/step_450",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.39.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
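
As a sanity check on the values in `config.json`, the parameter count they imply can be worked out by hand. The sketch below assumes the standard BERT layout (word, position, and token-type embeddings, then per-layer self-attention and feed-forward blocks each followed by LayerNorm, plus a pooler); `bert_param_count` is a hypothetical helper written for this illustration, not a Transformers function.

```python
def bert_param_count(
    vocab_size: int,
    hidden: int,
    layers: int,
    intermediate: int,
    max_positions: int,
    type_vocab: int,
) -> int:
    """Count weights and biases for a standard BERT encoder with pooler."""
    layer_norm = 2 * hidden  # scale and bias
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


# Values copied from config.json.
params = bert_param_count(
    vocab_size=30522,
    hidden=384,
    layers=6,
    intermediate=1536,
    max_positions=512,
    type_vocab=2,
)
head_dim = 384 // 12          # hidden_size / num_attention_heads
float32_bytes = params * 4    # torch_dtype is float32
```

Under these assumptions the config implies about 22.7 million parameters and a head dimension of 32. At 4 bytes per value that is roughly 90.85 MB, close to the 90864192 bytes listed for `model.safetensors`; the small remainder is presumably file header and metadata.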
config_sentence_transformers.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.0.0",
+     "transformers": "4.6.1",
+     "pytorch": "1.8.1"
+   },
+   "prompts": {},
+   "default_prompt_name": null
+ }
config_setfit.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "span_context": 0,
+   "spacy_model": "en_core_web_sm",
+   "normalize_embeddings": false,
+   "labels": [
+     "no aspect",
+     "aspect"
+   ]
+ }
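
The `labels` list in `config_setfit.json` names the two classes of the classification head. Below is a minimal sketch of turning integer predictions into those names, assuming that the list index is the class id (a common convention, but not confirmed by this diff); `decode_predictions` is a hypothetical helper.

```python
import json
from typing import List

# Same content as config_setfit.json above.
CONFIG_TEXT = """
{
  "span_context": 0,
  "spacy_model": "en_core_web_sm",
  "normalize_embeddings": false,
  "labels": ["no aspect", "aspect"]
}
"""
config = json.loads(CONFIG_TEXT)


def decode_predictions(class_ids: List[int], labels: List[str]) -> List[str]:
    """Map integer class ids to label names by list position (assumption)."""
    return [labels[class_id] for class_id in class_ids]


decoded = decode_predictions([1, 0, 0], config["labels"])
```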
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9d06919a73b192295c0113d0b1353a8d70beba101c5c6668307cc8ea0ee22d6
+ size 90864192
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9068d01421488717eeca7e48762fa2dcb717a2d9dd6b4be4e3851edf3c600a96
+ size 3919
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
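
`modules.json` chains three stages: the Transformer, the pooling layer, and a final `Normalize` module. A normalization stage of this kind rescales each sentence vector to unit L2 length, so that dot products between embeddings equal cosine similarities. Here is a minimal plain-Python sketch of that operation; `l2_normalize` is a hypothetical helper, not the library's implementation.

```python
import math
from typing import List


def l2_normalize(vector: List[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length; eps avoids division by zero."""
    norm = math.sqrt(sum(value * value for value in vector))
    norm = max(norm, eps)
    return [value / norm for value in vector]


unit = l2_normalize([3.0, 4.0])
unit_length = math.sqrt(sum(value * value for value in unit))
```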
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
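
A `max_seq_length` of 256 caps the number of tokens the encoder sees, so longer reviews are cut off (the README reports training texts of up to 491 words). The following is a minimal sketch of that truncation, assuming the usual BERT convention of `[CLS]` (id 101) at the start and `[SEP]` (id 102) at the end; `truncate_ids` is a hypothetical helper, and the real tokenizer handles this internally.

```python
from typing import List

CLS_ID = 101  # [CLS] id from tokenizer_config.json
SEP_ID = 102  # [SEP] id from tokenizer_config.json


def truncate_ids(content_ids: List[int], max_seq_length: int = 256) -> List[int]:
    """Trim content tokens so that content plus [CLS] and [SEP] fits the limit."""
    budget = max_seq_length - 2  # reserve room for the two special tokens
    return [CLS_ID] + list(content_ids[:budget]) + [SEP_ID]


long_input = list(range(1000, 1400))  # 400 dummy token ids
encoded = truncate_ids(long_input)
```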
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
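
`tokenizer_config.json` sets `padding_side` to `right` and maps `[PAD]` to id 0. Below is a minimal sketch of how a batch would be right-padded under those settings, together with the attention mask that marks real tokens; `pad_batch` is a hypothetical helper, the content token ids are arbitrary examples, and in practice the tokenizer does this itself.

```python
from typing import List, Tuple

PAD_ID = 0  # [PAD] id from tokenizer_config.json


def pad_batch(batch: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one and build attention masks."""
    width = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        padding = width - len(ids)
        input_ids.append(list(ids) + [PAD_ID] * padding)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(ids) + [0] * padding)
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch(
    [[101, 2833, 102], [101, 2833, 2001, 2307, 102]]
)
```

The attention mask is what lets a mean-pooling step ignore the padded positions.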
vocab.txt ADDED
The diff for this file is too large to render. See raw diff