stephenhib committed
Commit f1ea1c2 · verified · 1 Parent(s): b09b696

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
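This configuration enables attention-masked mean pooling over the 768-dimensional MPNet token embeddings (all other pooling modes are off). As a rough illustration of what the `Pooling` module computes — a minimal PyTorch sketch, not the library's internal code:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions.

    token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask.unsqueeze(-1).float()       # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)     # (batch, 768)
    counts = mask.sum(dim=1).clamp(min=1e-9)          # avoid dividing by zero on empty rows
    return summed / counts

# Toy check: the second sequence is padded after two tokens.
embeddings = torch.randn(2, 4, 768)
mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])
print(mean_pool(embeddings, mask).shape)  # torch.Size([2, 768])
```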
README.md ADDED
@@ -0,0 +1,536 @@
+ ---
+ base_model: sentence-transformers/all-mpnet-base-v2
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:807656
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: '<p id="pa01" num="0001">An decoding method according to an embodiment
+     includes a deriving step and an decoding step. The deriving step derives a first
+     reference value that is a reference value of a weighting factor based on fixed
+     point precision representing roughness of the weighting factor that is used for
+     making a motion-compensated prediction of a change in a pixel value by multiplying
+     a reference image by the weighting factor. The decoding step decodes a first difference
+     value that is a difference value between the weighting factor and the first reference
+     value. The weighting factor is included in a range of predetermined bit precision
+     having the first reference value at approximate center.
+
+     <img id="iaf01" file="imgaf001.tif" wi="146" he="85" img-content="drawing" img-format="tif"/></p>'
+   sentences:
+   - DECODING METHOD AND DECODING DEVICE
+   - METHOD FOR DETERMINING SEMI-SYNCHRONOUS EXPOSURE PARAMETERS AND ELECTRONIC DEVICE
+   - HOISTING ROPE MONITORING DEVICE
+ - source_sentence: <p id="pa01" num="0001">A layered sheet 10 includes a substrate
+     layer 1, and surface layers 2 and 3 configured to be layered on at least one surface
+     of the substrate layer 1. The substrate layer 1 contains a first thermoplastic
+     resin and inorganic fillers. The surface layers 2 and 3 contain a second thermoplastic
+     resin and a conductive material. A content of the inorganic fillers in the substrate
+     layer 1 is 0.3 to 28 mass% based on a total amount of the substrate layer.<img
+     id="iaf01" file="imgaf001.tif" wi="86" he="70" img-content="drawing" img-format="tif"/><img
+     id="iaf02" file="imgaf002.tif" wi="165" he="117" img-content="drawing" img-format="tif"/></p>
+   sentences:
+   - LAYERED SHEET, CONTAINER, CARRIER TAPE, AND ELECTRONIC COMPONENT PACKAGING BODY
+   - BLOCK COPOLYMERS FOR GEL COMPOSITIONS WITH IMPROVED EFFICIENCY
+   - AN INDICATOR SYSTEM FOR A PERISHABLE PRODUCT CONTAINER
+ - source_sentence: '<p id="pa01" num="0001">A method for manufacturing a gear which
+     effectively prevent a crack from occurring inside a tooth part when rolling processing
+     is performed on a teeth part of a gear raw material is achieved. A method according
+     to one embodiment for manufacturing a gear 15 by performing rolling processing
+     on a tooth part 2a of a sintered gear raw material 2. The method includes, when
+     the rolling processing is performed on the tooth part 2a of the gear raw material
+     2, pressing the gear raw material 2 toward a center of rotation of the gear raw
+     material 2 by a rolling machine 4 and, when at least the rolling processing is
+     performed on the tooth part 2a of the gear raw material 2 toward a center of a
+     thickness thereof by a pressing machine 5, pressing a region A where an internal
+     density of the tooth part 2a of the gear raw material 2 decreases.</p><p id="pa02"
+     num="0002">The invention also relates to an apparatus for manufacturing a gear.
+
+     <img id="iaf01" file="imgaf001.tif" wi="106" he="68" img-content="drawing" img-format="tif"/></p>'
+   sentences:
+   - COMMUNICATION METHOD, RELATED APPARATUS AND DEVICE AND COMPUTER-READABLE STORAGE
+     MEDIUM
+   - METHOD AND APPARATUS FOR MANUFACTURING GEAR
+   - IMPLANTABLE MEDICAL DEVICE AND METHOD OF PROVIDING WIRE CONNECTIONS FOR IT
+ - source_sentence: '<p id="pa01" num="0001">This application discloses a data reading
+     method, apparatus, and system, and a distributed system, and belongs to the field
+     of storage technologies. The method includes: receiving a data read request sent
+     by a terminal, where the data read request includes a logical address of target
+     data; locally searching, based on the logical address, a first slave node for
+     a latest version of the target data; and when it is determined that the latest
+     version of the target data has been stored in each of a plurality of slave nodes,
+     sending the latest version of the target data to the terminal. This application
+     can avoid a rollback of a version of read data, and this application applies to
+     data reading.<img id="iaf01" file="imgaf001.tif" wi="62" he="86" img-content="drawing"
+     img-format="tif"/><img id="iaf02" file="imgaf002.tif" wi="155" he="233" img-content="drawing"
+     img-format="tif"/></p>'
+   sentences:
+   - SLIDING MECHANISM AND TERMINAL DEVICE PROVIDED WITH SAME
+   - PRESSURE-APPLYING DEVICE FOR A SWITCHING MODULE AND METHOD OF CHANGING A SWITCHING
+     MODULE USING THE SAME
+   - DATA READING METHOD, DEVICE, SYSTEM, AND DISTRIBUTED SYSTEM
+ - source_sentence: '<p id="pa01" num="0001">An application apparatus (100) includes:
+     an application needle (24) that applies, to a target, an application material
+     having its viscosity changing under shear; a drive unit (90) that moves the application
+     needle (24) up and down; and a controller (80) that controls the drive unit (90)
+     to move the application needle such that shear is applied to the application material
+     at a shear speed depending on a type of the application material and depending
+     on a target application amount or a target application diameter.<img id="iaf01"
+     file="imgaf001.tif" wi="78" he="56" img-content="drawing" img-format="tif"/></p>'
+   sentences:
+   - HEAT PROCESSING DEVICE
+   - Electric motor
+   - COATING APPARATUS AND COATING METHOD
+ model-index:
+ - name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: sentence transformers/all mpnet base v2
+       type: sentence-transformers/all-mpnet-base-v2
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.592
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.711
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.751
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.814
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.592
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.237
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1502
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.0814
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.592
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.711
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.751
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.814
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.6987639783179386
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.6624964285714287
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.6665468875517868
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision f1b1b820e405bb8644f5e8d9a3b98f9c9e0a3c58 -->
+ - **Maximum Sequence Length:** 384 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - json
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
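The trailing `Normalize()` module L2-normalizes every embedding, so cosine similarity and dot product produce the same ranking. A quick check of that property — a sketch, assuming the repo id from the usage section below:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("stephenhib/all-mpnet-base-v2-patabs-1epoc-batch32-100000")
embedding = model.encode(["A short test sentence."])[0]
print(np.linalg.norm(embedding))  # ~1.0, because the last module L2-normalizes outputs
```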
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("stephenhib/all-mpnet-base-v2-patabs-1epoc-batch32-100000")
+ # Run inference
+ sentences = [
+     '<p id="pa01" num="0001">An application apparatus (100) includes: an application needle (24) that applies, to a target, an application material having its viscosity changing under shear; a drive unit (90) that moves the application needle (24) up and down; and a controller (80) that controls the drive unit (90) to move the application needle such that shear is applied to the application material at a shear speed depending on a type of the application material and depending on a target application amount or a target application diameter.<img id="iaf01" file="imgaf001.tif" wi="78" he="56" img-content="drawing" img-format="tif"/></p>',
+     'COATING APPARATUS AND COATING METHOD',
+     'Electric motor',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
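Because the model was trained on patent abstract/title pairs, a typical application is retrieving the best-matching titles for an abstract. A sketch using `sentence_transformers.util.semantic_search`; the corpus below is a made-up three-entry example, not a real index:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("stephenhib/all-mpnet-base-v2-patabs-1epoc-batch32-100000")

# Hypothetical mini-corpus of patent titles
corpus = [
    "COATING APPARATUS AND COATING METHOD",
    "Electric motor",
    "DATA READING METHOD, DEVICE, SYSTEM, AND DISTRIBUTED SYSTEM",
]
query = "An application needle applies a shear-thinning material to a target."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank the corpus by cosine similarity and keep the top 2 hits
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```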
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+ * Dataset: `sentence-transformers/all-mpnet-base-v2`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.592      |
+ | cosine_accuracy@3   | 0.711      |
+ | cosine_accuracy@5   | 0.751      |
+ | cosine_accuracy@10  | 0.814      |
+ | cosine_precision@1  | 0.592      |
+ | cosine_precision@3  | 0.237      |
+ | cosine_precision@5  | 0.1502     |
+ | cosine_precision@10 | 0.0814     |
+ | cosine_recall@1     | 0.592      |
+ | cosine_recall@3     | 0.711      |
+ | cosine_recall@5     | 0.751      |
+ | cosine_recall@10    | 0.814      |
+ | cosine_ndcg@10      | 0.6988     |
+ | cosine_mrr@10       | 0.6625     |
+ | **cosine_map@100**  | **0.6665** |
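The @1 accuracy, precision, and recall coincide (0.592) because each query has exactly one relevant document, and recall@k equals accuracy@k for the same reason. A minimal sketch of how such an evaluation is wired up with `InformationRetrievalEvaluator` — the queries and corpus here are placeholders, not the actual evaluation split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("stephenhib/all-mpnet-base-v2-patabs-1epoc-batch32-100000")

queries = {"q1": "Abstract text of a patent ..."}                  # placeholder abstract
corpus = {"d1": "DECODING METHOD AND DECODING DEVICE",             # placeholder titles
          "d2": "HOISTING ROPE MONITORING DEVICE"}
relevant_docs = {"q1": {"d1"}}                                     # one relevant title per query

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="patabs-dev")
results = evaluator(model)  # dict with accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100
print(results)
```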
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### json
+
+ * Dataset: json
+ * Size: 807,656 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive | anchor |
+   |:--------|:---------|:-------|
+   | type    | string   | string |
+   | details | <ul><li>min: 45 tokens</li><li>mean: 237.14 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 12.34 tokens</li><li>max: 101 tokens</li></ul> |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code><p id="pa01" num="0001">The invention relates to an image fusion method and device, which includes: obtaining a first short-focus image and a first long-focus image acquired by a short-focus sensor and a long-focus sensor at the same time; according to the focal lengths of a short-focus lens and a long-focus lens, calculating a reduction coefficient corresponding to the first long-focus image when the sizes of the same target in the first long-focus image and the first short-focus image are matched; performing a reduction processing on the first long-focus image according to the reduction coefficient to obtain a second long-focus image; according to a relative angle of the current long-focus lens and short-focus lens, calculating a position of the second long-focus image in the first short-focus image when the positions of the same target in the second long-focus image and the first short-focus image are matched; and according to the position of the second long-focus image in the first short-focus image, covering the first short-focus image by the second long-focus image to obtain a fused image. According to embodiments of the present application, on the premise of considering both the monitoring range and the definition, the monitoring cost is reduced, and the monitoring efficiency is improved.<img id="iaf01" file="imgaf001.tif" wi="92" he="72" img-content="drawing" img-format="tif"/></p></code> | <code>IMAGE FUSION METHOD AND DEVICE</code> |
+   | <code><p id="pa01" num="0001">The present invention discloses an <i>ex vivo</i> method for the diagnostic and/or prognostic assessment of the acute-on-chronic liver failure (ACLF) syndrome in a patient with a liver disorder characterized in that it comprises the steps of: (a) measuring a panel of metabolites related with acylcarnitines-sialic acid-acetylated amino acids and/or sugar alcohols and derivatives-tryptophan metabolism-catecholamines derivatives in a biological sample of said patient; and (b) comparing the level of said metabolites in the sample with the level of said metabolites in healthy patients; and wherein an increase of at least 1.2 times of the level of said metabolites is indicative of ACLF syndrome.</p></code> | <code>METHOD FOR THE DIAGNOSTIC AND/OR PROGNOSTIC ASSESSMENT OF ACUTE-ON-CHRONIC LIVER FAILURE SYNDROME IN PATIENTS WITH LIVER DISORDERS</code> |
+   | <code><p id="pa01" num="0001">A valve housing receives a spool 34 and the spool has a regulating chamber 52 selectively communicating a supply line to a return line. The spool 34 is biased in one direction by a spring force and there is a second force biasing the spool in an opposed direction whith the second bias force being provided by a fluid pressure within a hydraulic system associated which the pressure regulating valve. The amount of communication between the supply port 111 and the return port 99 is regulated by a position of the spool 34 as the bias force from the fluid pressure change. Damper chambers are provided on opposed sides of the spool and serve to dampen a speed of movement of the spool and a supply line for supplying fluid into the damper chambers through check valves 44, 64. The supply line serves to assist in purging air outwardly of the damper chambers.<br><img id="iaf01" file="imgaf001.tif" wi="142" he="100" img-content="drawing" img-format="tif"/></p></code> | <code>Air purging pressure regulating valve</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
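`MultipleNegativesRankingLoss` scores each anchor against its own positive and treats every other positive in the batch as a negative, so the effective number of negatives grows with the batch size. With the parameters above, the loss can be instantiated like this (sketch):

```python
from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
# scale=20.0 multiplies the cosine similarities before the softmax/cross-entropy step
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)
```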
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 2
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `batch_sampler`: no_duplicates
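A sketch of how a comparable run can be launched with these non-default values in Sentence Transformers 3.x; the one-row dataset stands in for the 807,656-pair JSON file, which is not part of this repository:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Schematic stand-in for the real training data (columns as in the card: positive, anchor)
train_dataset = Dataset.from_dict({
    "positive": ['<p id="pa01" num="0001">An abstract ...</p>'],
    "anchor": ["A PLACEHOLDER PATENT TITLE"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="all-mpnet-base-v2-patabs",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate in-batch negatives
    # eval_strategy="steps" was also set; it additionally requires an eval dataset or evaluator
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```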
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 2
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | sentence-transformers/all-mpnet-base-v2_cosine_map@100 |
+ |:-----:|:----:|:-------------:|:------------------------------------------------------:|
+ | 0.032 | 100  | 0.1433        | 0.6217                                                  |
+ | 0.064 | 200  | 0.0953        | 0.6447                                                  |
+ | 0.096 | 300  | 0.1084        | 0.6612                                                  |
+ | 0.128 | 400  | 0.0817        | 0.6546                                                  |
+ | 0.16  | 500  | 0.0768        | 0.6512                                                  |
+ | 0.192 | 600  | 0.0779        | 0.6466                                                  |
+ | 0.224 | 700  | 0.0709        | 0.6594                                                  |
+ | 0.256 | 800  | 0.0813        | 0.6441                                                  |
+ | 0.288 | 900  | 0.0597        | 0.6454                                                  |
+ | 0.32  | 1000 | 0.0744        | 0.6496                                                  |
+ | 0.352 | 1100 | 0.0669        | 0.6608                                                  |
+ | 0.384 | 1200 | 0.0657        | 0.6566                                                  |
+ | 0.416 | 1300 | 0.0489        | 0.6660                                                  |
+ | 0.448 | 1400 | 0.0643        | 0.6597                                                  |
+ | 0.48  | 1500 | 0.0593        | 0.6587                                                  |
+ | 0.512 | 1600 | 0.0598        | 0.6613                                                  |
+ | 0.544 | 1700 | 0.0737        | 0.6570                                                  |
+ | 0.576 | 1800 | 0.0661        | 0.6655                                                  |
+ | 0.608 | 1900 | 0.0499        | 0.6613                                                  |
+ | 0.64  | 2000 | 0.0641        | 0.6616                                                  |
+ | 0.672 | 2100 | 0.0679        | 0.6662                                                  |
+ | 0.704 | 2200 | 0.0521        | 0.6715                                                  |
+ | 0.736 | 2300 | 0.0569        | 0.6651                                                  |
+ | 0.768 | 2400 | 0.0507        | 0.6679                                                  |
+ | 0.8   | 2500 | 0.0405        | 0.6678                                                  |
+ | 0.832 | 2600 | 0.0548        | 0.6690                                                  |
+ | 0.864 | 2700 | 0.0403        | 0.6692                                                  |
+ | 0.896 | 2800 | 0.0613        | 0.6649                                                  |
+ | 0.928 | 2900 | 0.0485        | 0.6673                                                  |
+ | 0.96  | 3000 | 0.0495        | 0.6674                                                  |
+ | 0.992 | 3100 | 0.0546        | 0.6665                                                  |
+
+ ### Framework Versions
+ - Python: 3.11.9
+ - Sentence Transformers: 3.2.1
+ - Transformers: 4.45.2
+ - PyTorch: 2.3.1.post300
+ - Accelerate: 1.0.1
+ - Datasets: 3.0.1
+ - Tokenizers: 0.20.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.2",
+   "vocab_size": 30527
+ }
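This is the unmodified MPNet backbone configuration (12 layers, hidden size 768, 12 attention heads). If needed, it can be inspected with plain `transformers` — a sketch, assuming the repo id used above:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("stephenhib/all-mpnet-base-v2-patabs-1epoc-batch32-100000")
print(config.model_type, config.hidden_size, config.num_hidden_layers)  # mpnet 768 12
```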
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.2.1",
+     "transformers": "4.45.2",
+     "pytorch": "2.3.1.post300"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:33358ac4bf89a8052e2964afdf4e0cfc3811c1976118ac3e384dc5240d8c99be
+ size 437967672
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
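`modules.json` chains the three modules in order: transformer encoder, mean pooling, L2 normalization. Loading the repo id rebuilds this pipeline automatically; an equivalent manual assembly — a sketch of what the file encodes, not a required step — would be:

```python
from sentence_transformers import SentenceTransformer, models

transformer = models.Transformer("sentence-transformers/all-mpnet-base-v2", max_seq_length=384)
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
normalize = models.Normalize()

# Mirrors the three entries of modules.json
model = SentenceTransformer(modules=[transformer, pooling, normalize])
print(model)
```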
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 384,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,72 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 384,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
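With `model_max_length: 384` and `truncation_side: right`, inputs longer than the limit are cut from the end, so only roughly the first 384 tokens of a long abstract contribute to the embedding. A quick check — a sketch, assuming the repo id used above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stephenhib/all-mpnet-base-v2-patabs-1epoc-batch32-100000")
encoded = tokenizer("word " * 1000, truncation=True)
print(tokenizer.model_max_length, len(encoded["input_ids"]))  # 384 384
```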
vocab.txt ADDED
The diff for this file is too large to render. See raw diff