selina09 committed · verified
Commit a55eba8 · 1 Parent(s): 99530da

Add SetFit model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,273 @@
+ ---
+ base_model: BAAI/bge-small-en-v1.5
+ library_name: setfit
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ widget:
+ - text: dont trust it
+ - text: 'works and our AV guys love it people show up with laptops and need to connect
+     plus you can have a secondary monitor as an output we use it for PowerPoint '
+ - text: 'I have used Quicken since Microsoft abandoned MSMoney On a Windows PC Sick
+     of the PC crashing freezing fluttering and otherwise giving me the finger I bought
+     a MAC No freezing crashing or security issues Even runs most PC software But not
+     Quicken Just something called Quicken Essentials made for people who don t bank
+     on line don t invest don t have options or IRAs or k accounts In other words made
+     for the folk who buy Lotus for Dummies So I make do with a PC Laptop for accounting
+     using the LAN of my MAC to download and have on it Turbotax as well all the while
+     cursing the Intuit penchant for outdated technology '
+ - text: I gave this a this year because the CD just plain flat out didn t work I tried
+     mutliple PCs all with the same resul Please insert a CD Dummy me didn t try the
+     CD until the day return policy had expired so there was no way to return it for
+     a refund I called Intuit and luckily they provided me with a downloadable copy
+     via their site Intuit seemed pretty aware of the problem as they didn t even request
+     the CD be sent to them I should get a refund for all the hassle I went through
+     ha ha
+ - text: 'I love TurboTax We use it to prepare our household taxes every year There
+     is a table on the back of every box to help you pick which version you need It
+     has been accurate in my experience When I was young I could get by with a EZ which
+     is equivalent to TurboTax s free software As my career progressed I graduated
+     to TurboTax Basic When I married our combined assets bumped us into Deluxe and
+     then Premier We don t own a business so we may never need Home Business Prior
+     to this I had never revisited Basic I was curious to experience how much I was
+     gaining from using Premier Without going into too much detail the difference is
+     night and day I think they sit too far apart in the gamut for an honest comparison
+     like comparing a Corolla to an Avalon But it is clear that our family will never
+     get by with Basic Thankfully this was provided to me free of charge under the
+     Vine program but otherwise it would have been wasted I ll stick with Premier BOTTOM
+     LINE TurboTax is wonderful but you should follow the advice on the back of the
+     box Don t skimp Buy the version that s right for you Don t be intimidated by the
+     cost You can write off the cost of the software as Tax Prep '
+ inference: true
+ ---
+ 
+ # SetFit with BAAI/bge-small-en-v1.5
+ 
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+ 
+ The model has been trained using an efficient few-shot learning technique that involves:
+ 
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 512 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+ 
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | 0 | <ul><li>'Been using this excellent product for years don t ever try and do income taxes without it '</li><li>'Use kaspersky every year best product around Will use no other product best prosit I have seen on the market'</li><li>'I ve used Norton before and various free anti virus and with a professional version you get a more comprehensive set of security options that quietly takes care of business in the back ground There is a peace of mind factor that a professional version gives you and for the less than tech savvy it s a bit more idiot proof than a bare bones free ware I have no problem with free ware as my computing needs are pretty simple but a pro version is very nice and this is pretty cheap for the year long comfort of install it and then pretty much forget about it security I got this current product via the Vine but I have bought the professional Norton for the two years running previously when it has been on sale I have multiple computers so the license is handy and I do tend to use all three For the most part Norton is comfortable and user friendly especially if you aren t overly expert with using software '</li></ul> |
+ | 1 | <ul><li>'I have use Quicken for over years and I can t believe how cumbersome and poorly conceived this version is compared to past versions The main page is useless and you now have to open multiple windows to get the information you need then you have to close all the windows you opened to get to the next account When looking at a performance page of your investment accounts you get a pie chart instead of a bar graph What good is a pie chart when you are looking at performance data over a specific time range I thought the purpose of newer versions was to improve the existing version and not regress If Microsoft still had a financial program I would be forced to migrate to another program Intuit needs to change it s company name because this program is not intuitive It is ill conceived and makes for a frustrating experience '</li><li>'Would not install activation code not accepted Returned it '</li><li>'I installed this over Norton which I have used and had no problems with My computer slowed to a crawl NAV ate all my computer s resources Activation is a problem and so is its updating proceedures I uninstalled it after it just plain was not working There are still remnents of it on my machine that will not go away I bought Zone Alarm Security Suite ZA Suite is great uses very little resources and my computer is now speedy again Norton is totally overgrown and needs to be rewritten from the source code I will never use a Norton Product again '</li></ul> |
+ 
+ ## Uses
+ 
+ ### Direct Use for Inference
+ 
+ First install the SetFit library:
+ 
+ ```bash
+ pip install setfit
+ ```
+ 
+ Then you can load this model and run inference.
+ 
+ ```python
+ from setfit import SetFitModel
+ 
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("selina09/yt_setfit2")
+ # Run inference
+ preds = model("dont trust it")
+ ```
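+ 
+ Calling the model directly returns hard label predictions. If you need class probabilities from the logistic-regression head instead, `SetFitModel` also exposes `predict_proba`; a small usage sketch:
+ 
+ ```python
+ # Class probabilities: one row per input, columns ordered by the head's classes_
+ probs = model.predict_proba(["dont trust it", "works and our AV guys love it"])
+ print(probs)
+ ```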
+ 
+ <!--
+ ### Downstream Use
+ 
+ *List how someone could finetune this model on their own dataset.*
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count | 1 | 93.9133 | 364 |
+ 
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | 0 | 75 |
+ | 1 | 75 |
+ 
+ ### Training Hyperparameters
+ - batch_size: (32, 32)
+ - num_epochs: (10, 10)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - body_learning_rate: (2e-05, 1e-05)
+ - head_learning_rate: 0.01
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+ 
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0028 | 1 | 0.2613 | - |
+ | 0.1401 | 50 | 0.239 | - |
+ | 0.2801 | 100 | 0.2175 | - |
+ | 0.4202 | 150 | 0.2015 | - |
+ | 0.5602 | 200 | 0.0628 | - |
+ | 0.7003 | 250 | 0.0534 | - |
+ | 0.8403 | 300 | 0.0163 | - |
+ | 0.9804 | 350 | 0.0105 | - |
+ | 1.1204 | 400 | 0.0259 | - |
+ | 1.2605 | 450 | 0.0024 | - |
+ | 1.4006 | 500 | 0.0013 | - |
+ | 1.5406 | 550 | 0.0196 | - |
+ | 1.6807 | 600 | 0.0157 | - |
+ | 1.8207 | 650 | 0.0184 | - |
+ | 1.9608 | 700 | 0.0159 | - |
+ | 2.1008 | 750 | 0.0062 | - |
+ | 2.2409 | 800 | 0.0179 | - |
+ | 2.3810 | 850 | 0.0165 | - |
+ | 2.5210 | 900 | 0.0092 | - |
+ | 2.6611 | 950 | 0.0299 | - |
+ | 2.8011 | 1000 | 0.0071 | - |
+ | 2.9412 | 1050 | 0.0115 | - |
+ | 3.0812 | 1100 | 0.0007 | - |
+ | 3.2213 | 1150 | 0.0248 | - |
+ | 3.3613 | 1200 | 0.0007 | - |
+ | 3.5014 | 1250 | 0.0096 | - |
+ | 3.6415 | 1300 | 0.0091 | - |
+ | 3.7815 | 1350 | 0.0007 | - |
+ | 3.9216 | 1400 | 0.0255 | - |
+ | 4.0616 | 1450 | 0.0065 | - |
+ | 4.2017 | 1500 | 0.0178 | - |
+ | 4.3417 | 1550 | 0.0168 | - |
+ | 4.4818 | 1600 | 0.0161 | - |
+ | 4.6218 | 1650 | 0.0093 | - |
+ | 4.7619 | 1700 | 0.0337 | - |
+ | 4.9020 | 1750 | 0.0148 | - |
+ | 5.0420 | 1800 | 0.0082 | - |
+ | 5.1821 | 1850 | 0.023 | - |
+ | 5.3221 | 1900 | 0.0185 | - |
+ | 5.4622 | 1950 | 0.0155 | - |
+ | 5.6022 | 2000 | 0.0176 | - |
+ | 5.7423 | 2050 | 0.0004 | - |
+ | 5.8824 | 2100 | 0.0221 | - |
+ | 6.0224 | 2150 | 0.0004 | - |
+ | 6.1625 | 2200 | 0.0045 | - |
+ | 6.3025 | 2250 | 0.0004 | - |
+ | 6.4426 | 2300 | 0.0081 | - |
+ | 6.5826 | 2350 | 0.0089 | - |
+ | 6.7227 | 2400 | 0.0091 | - |
+ | 6.8627 | 2450 | 0.0004 | - |
+ | 7.0028 | 2500 | 0.0238 | - |
+ | 7.1429 | 2550 | 0.0056 | - |
+ | 7.2829 | 2600 | 0.0175 | - |
+ | 7.4230 | 2650 | 0.0088 | - |
+ | 7.5630 | 2700 | 0.0383 | - |
+ | 7.7031 | 2750 | 0.0356 | - |
+ | 7.8431 | 2800 | 0.0004 | - |
+ | 7.9832 | 2850 | 0.0231 | - |
+ | 8.1232 | 2900 | 0.0292 | - |
+ | 8.2633 | 2950 | 0.0384 | - |
+ | 8.4034 | 3000 | 0.0004 | - |
+ | 8.5434 | 3050 | 0.0091 | - |
+ | 8.6835 | 3100 | 0.0079 | - |
+ | 8.8235 | 3150 | 0.0298 | - |
+ | 8.9636 | 3200 | 0.0083 | - |
+ | 9.1036 | 3250 | 0.0004 | - |
+ | 9.2437 | 3300 | 0.0003 | - |
+ | 9.3838 | 3350 | 0.0312 | - |
+ | 9.5238 | 3400 | 0.0157 | - |
+ | 9.6639 | 3450 | 0.0003 | - |
+ | 9.8039 | 3500 | 0.0306 | - |
+ | 9.9440 | 3550 | 0.0084 | - |
+ 
+ ### Framework Versions
+ - Python: 3.10.12
+ - SetFit: 1.0.3
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.40.2
+ - PyTorch: 2.4.0+cu121
+ - Datasets: 2.21.0
+ - Tokenizers: 0.19.1
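+ 
+ A pip command pinning these versions (a sketch; PyTorch 2.4.0 shipped here as the +cu121 build, for which `torch==2.4.0` from PyPI is the closest plain equivalent):
+ 
+ ```bash
+ pip install setfit==1.0.3 sentence-transformers==3.0.1 transformers==4.40.2 \
+     torch==2.4.0 datasets==2.21.0 tokenizers==0.19.1
+ ```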
+ 
+ ## Citation
+ 
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+   doi = {10.48550/ARXIV.2209.11055},
+   url = {https://arxiv.org/abs/2209.11055},
+   author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+   keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+   title = {Efficient Few-Shot Learning Without Prompts},
+   publisher = {arXiv},
+   year = {2022},
+   copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_name_or_path": "BAAI/bge-small-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.40.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.40.2",
+     "pytorch": "2.4.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "normalize_embeddings": false,
+   "labels": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad3e4e25e365a29a87a9bf8ca9072c0eec4e7015f05e6bcd1d4d7cf90bb2fc57
+ size 133462128
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c8dfd15cfd4622f2940839f128306c8bde66a365633aef9151119e6cc2c9ffc9
+ size 3935
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
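
Taken together with 1_Pooling/config.json, this modules.json declares the embedding pipeline as three stacked modules: a Transformer encoder, CLS-token pooling, and L2 normalization. A minimal sketch of loading just this body with sentence-transformers, assuming the repo id used in the README:

```python
from sentence_transformers import SentenceTransformer

# Loads the Transformer -> Pooling -> Normalize stack declared in modules.json
body = SentenceTransformer("selina09/yt_setfit2")
emb = body.encode(["dont trust it"])
print(emb.shape)  # (1, 384), matching word_embedding_dimension in 1_Pooling/config.json
```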
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff