dinho1597 committed
Commit 8a78e9d · verified · 1 Parent(s): b4bf240

Uploading initial model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 384,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,488 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5600
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-small-en-v1.5
widget:
- source_sentence: What is the main factor of signal interference in MCFs?
  sentences:
  - The main factor of signal interference in MCFs is crosstalk, which is the leakage
    of a fraction of the signal power from a given core to its neighboring core.
  - An integrity group temporal key (IGTK) is a random value used to protect group
    addressed medium access control (MAC) management protocol data units (MMPDUs)
    from a broadcast/multicast source station (STA).
  - Wireless sensing through the combined use of radio wave and AI technologies aims
    to identify objects and recognize actions with high precision.
- source_sentence: What types of drones can be used to construct multi-tier drone-cell
    networks?
  sentences:
  - The coupling coefficient represents the tightness of coupling between transmit
    and receive coils in wireless charging systems.
  - A cheap, slow photodiode placed next to the rear face of the laser package is
    commonly used as the monitor detector in laser drive circuits.
  - Multi-tier drone-cell networks can be constructed by utilizing several drone types,
    similar to terrestrial HetNets with macro-, small-, femtocells, and relays.
- source_sentence: Which technology was explored for high capacity last mile and pre-aggregation
    backhaul in small cell networks?
  sentences:
  - According to Pearl's Ladder of Causation, counterfactual questions can only be
    answered if information from all other levels (associational and interventional)
    is available. Counterfactuals subsume interventional and associational questions,
    and therefore sit at the top of the hierarchy.
  - Shannon's classical source coding theorem provides the minimum distortion achievable
    in encoding a Gaussian stationary input signal.
  - The passage mentions that 60 GHz and 70-80 GHz millimeter wave communication technologies
    were explored for high capacity last mile and pre-aggregation backhaul in small
    cell networks.
- source_sentence: What is the main output of the design procedure for a passive lossless
    Huygens metasurface?
  sentences:
  - Entanglement distillation is the process of purifying imperfect entangled states
    to obtain maximally entangled states.
  - The main output of the design procedure is the transmitted fields as well as the
    surface impedance and admittance.
  - The component of IoT responsible for sensing and collecting data is the sensors.
- source_sentence: What is the formula for the relative entropy between two probability
    density functions?
  sentences:
  - The consequence of the fact that the total power radiated varies as the square
    of the frequency of the oscillation is that shorter wavelength (higher frequency)
    light is scattered much more strongly than longer wavelength (lower frequency)
    light.
  - Hybrid infrastructures are comprised of various proximate and distant computing
    nodes, either mobile or immobile.
  - The relative entropy between two probability density functions f and g is equal
    to the negative integral of f(x) multiplied by the logarithm of the ratio of f(x)
    and g(x), with respect to x.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_recall@1
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on BAAI/bge-small-en-v1.5
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: telecom ir eval
      type: telecom-ir-eval
    metrics:
    - type: cosine_accuracy@1
      value: 0.9733333333333334
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.995
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.995
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.995
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9733333333333334
      name: Cosine Precision@1
    - type: cosine_recall@1
      value: 0.9733333333333334
      name: Cosine Recall@1
    - type: cosine_ndcg@10
      value: 0.985912396714286
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9827777777777778
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9831452173557438
      name: Cosine Map@100
---

# SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) on the csv dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) <!-- at revision 5c38ec7c405ec4b44b94cc5a9bb96e735b38267a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
  - csv
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
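
In this pipeline the BERT backbone produces one hidden state per token, the Pooling module keeps only the first ([CLS]) token because `pooling_mode_cls_token` is the only mode enabled, and Normalize rescales the result to unit length. A minimal sketch of those last two steps (plain PyTorch; the function and tensor names are illustrative, not the library's internal code):

```python
import torch
import torch.nn.functional as F

def cls_pool_and_normalize(token_embeddings: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 384) hidden states from the BertModel backbone
    cls_embedding = token_embeddings[:, 0]         # Pooling: keep the [CLS] token -> (batch, 384)
    return F.normalize(cls_embedding, p=2, dim=1)  # Normalize: L2-normalize each vector
```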

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'What is the formula for the relative entropy between two probability density functions?',
    'The relative entropy between two probability density functions f and g is equal to the negative integral of f(x) multiplied by the logarithm of the ratio of f(x) and g(x), with respect to x.',
    'The consequence of the fact that the total power radiated varies as the square of the frequency of the oscillation is that shorter wavelength (higher frequency) light is scattered much more strongly than longer wavelength (lower frequency) light.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
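
Because the training pairs are question/passage pairs, a common pattern is retrieval: embed a query and a set of candidate passages, then rank the passages by cosine similarity. A short sketch along the same lines as the snippet above (the model id is the same placeholder, and the texts are illustrative):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id, as above

query = "What is the main factor of signal interference in MCFs?"
passages = [
    "The main factor of signal interference in MCFs is crosstalk between neighboring cores.",
    "Multi-tier drone-cell networks can be constructed by utilizing several drone types.",
]

# Embeddings are L2-normalized by the Normalize module, so cosine similarity is a dot product.
query_embedding = model.encode(query)
passage_embeddings = model.encode(passages)
scores = model.similarity(query_embedding, passage_embeddings)  # shape [1, 2]
print(passages[scores.argmax().item()])
```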

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `telecom-ir-eval`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9733     |
| cosine_accuracy@3   | 0.995      |
| cosine_accuracy@5   | 0.995      |
| cosine_accuracy@10  | 0.995      |
| cosine_precision@1  | 0.9733     |
| cosine_recall@1     | 0.9733     |
| **cosine_ndcg@10**  | **0.9859** |
| cosine_mrr@10       | 0.9828     |
| cosine_map@100      | 0.9831     |
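
The numbers above come from `InformationRetrievalEvaluator`. A hedged sketch of how a comparable evaluation can be run; the query/corpus/relevance dictionaries below are illustrative placeholders, not the actual telecom-ir-eval split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("sentence_transformers_model_id")

# Placeholder data: query ids -> text, document ids -> text, query ids -> relevant doc ids.
queries = {"q1": "What is the main factor of signal interference in MCFs?"}
corpus = {
    "d1": "The main factor of signal interference in MCFs is crosstalk.",
    "d2": "An integrity group temporal key (IGTK) is a random value.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="telecom-ir-eval",
)
print(evaluator(model))  # accuracy@k, precision@k, recall@k, MRR@10, NDCG@10, MAP@100
```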

## Training Details

### Training Dataset

#### csv

* Dataset: csv
* Size: 5,600 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                             | positive                                                                          |
  |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                            |
  | details | <ul><li>min: 4 tokens</li><li>mean: 18.48 tokens</li><li>max: 56 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 29.0 tokens</li><li>max: 85 tokens</li></ul> |
* Samples:
  | anchor                                                                                                                         | positive                                                                                                                                                                                                                                                                  |
  |:-------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>How can the unique decodability of a code be tested using the Sardinas and Patterson test?</code> | <code>The Sardinas and Patterson test for unique decodability involves checking if no codewords are prefixes of any other codewords.</code> |
  | <code>What is the purpose of encapsulation in the OSI (Open System Interconnection) model?</code> | <code>Encapsulation is used to add control information and transform data units into protocol data units.</code> |
  | <code>What advantages do measurements from user equipment (UE) have over drive tests in disaster small cell networks?</code> | <code>Measurements from user equipment (UE) have the advantages of reduced labor intensity, measurements obtained from additional locations, such as inside buildings, and better adaptation to specific characteristics and requirements in disaster scenarios.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
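
For context, MultipleNegativesRankingLoss uses every other positive in the batch as a negative for a given anchor: it scores each anchor against all positives with scaled cosine similarity and applies cross-entropy with the matching positive as the target. A rough conceptual sketch (not the library's implementation):

```python
import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(anchors: torch.Tensor, positives: torch.Tensor,
                                    scale: float = 20.0) -> torch.Tensor:
    # anchors, positives: (batch, dim) L2-normalized embeddings of paired texts
    scores = scale * anchors @ positives.T   # cosine similarities, scaled by 20.0
    labels = torch.arange(scores.size(0))    # the i-th positive belongs to the i-th anchor
    return F.cross_entropy(scores, labels)   # all other in-batch positives act as negatives
```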

### Evaluation Dataset

#### csv

* Dataset: csv
* Size: 1,400 evaluation samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                             | positive                                                                          |
  |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                            |
  | details | <ul><li>min: 4 tokens</li><li>mean: 18.92 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 29.0 tokens</li><li>max: 96 tokens</li></ul> |
* Samples:
  | anchor                                                                                                    | positive                                                                                                                                                                   |
  |:------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>What are the three major steps in SLAM-based techniques for THz localization?</code> | <code>SLAM-based techniques for THz localization involve imaging the environment, estimating ranges to the user, and fusing the images with the estimated ranges.</code> |
  | <code>What is the service time distribution in the M/M(X)/1 model?</code> | <code>In the M/M(X)/1 model, the service time distribution is exponential with parameter µ.</code> |
  | <code>What is the main advantage of the ensemble patch method in generating adversarial patches?</code> | <code>The main advantage of the ensemble patch method is that it achieves a higher attack success rate compared to single patches.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `weight_decay`: 0.01
- `num_train_epochs`: 5
- `lr_scheduler_type`: cosine_with_restarts
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates
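
A hedged sketch of how these values map onto `SentenceTransformerTrainingArguments` and the trainer; the output path and the tiny inline dataset are illustrative stand-ins, since the actual training script is not part of this repository:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers, SentenceTransformerTrainingArguments

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
loss = MultipleNegativesRankingLoss(model)  # scale=20.0, cos_sim by default

# Illustrative stand-in for the 5,600-pair csv dataset with "anchor" and "positive" columns.
train_dataset = Dataset.from_dict({
    "anchor": ["What is crosstalk in MCFs?"],
    "positive": ["Crosstalk is the leakage of signal power from one core to a neighboring core."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",                        # illustrative path
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    weight_decay=0.01,
    num_train_epochs=5,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    fp16=True,                                  # requires a CUDA GPU
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # stand-in; the card reports a separate 1,400-pair eval split
    loss=loss,
)
trainer.train()
```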

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.01
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: cosine_with_restarts
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step    | Training Loss | Validation Loss | telecom-ir-eval_cosine_ndcg@10 |
|:----------:|:-------:|:-------------:|:---------------:|:------------------------------:|
| 1.1364     | 50      | 0.2567        | 0.0419          | 0.9844                         |
| **2.2727** | **100** | **0.0502**    | **0.0397**      | **0.9859**                     |
| 3.4091     | 150     | 0.0277        | 0.0399          | 0.9846                         |
| 4.5455     | 200     | 0.0231        | 0.0406          | 0.9840                         |
| 5.0        | 220     | -             | -               | 0.9859                         |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.47.1
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
config.json ADDED
@@ -0,0 +1,31 @@
{
  "_name_or_path": "BAAI/bge-small-en-v1.5",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.47.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
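
These fields describe the BERT backbone behind the numbers quoted in the model card (384-dimensional hidden states, 12 layers, 12 heads, 512-position limit). A quick check against the base checkpoint, which uses the same architecture (standard `transformers` calls):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("BAAI/bge-small-en-v1.5")  # same architecture as this checkpoint
print(config.model_type, config.hidden_size, config.num_hidden_layers, config.num_attention_heads)
# bert 384 12 12
```
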
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.3.1",
    "transformers": "4.47.1",
    "pytorch": "2.5.1+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4d357419411446da249884d22a397c852f92dc65a14dfa14ad1b1972f6c66e28
size 133462128
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
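
These three entries are exactly what `SentenceTransformer(...)` instantiates, in order, when the repository is loaded. An equivalent manual assembly, shown only to make the mapping explicit (the base-model id stands in for this repository):

```python
from sentence_transformers import SentenceTransformer, models

transformer = models.Transformer("BAAI/bge-small-en-v1.5", max_seq_length=512)             # module 0, path ""
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")   # module 1, 1_Pooling
normalize = models.Normalize()                                                              # module 2, 2_Normalize

model = SentenceTransformer(modules=[transformer, pooling, normalize])
print(model)
```
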
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": true
}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
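
In practice this config means inputs are lowercased, wrapped in [CLS] ... [SEP], and limited to 512 tokens. A small sketch with the standard `transformers` tokenizer API (the base-model id stands in for this repository):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")  # BertTokenizer with do_lower_case=True
encoded = tokenizer("What is the main factor of signal interference in MCFs?",
                    truncation=True, max_length=512)
tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"])
print(tokens[0], tokens[-1])        # [CLS] [SEP] -- special tokens added automatically
print(tokenizer.model_max_length)   # 512
```
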
vocab.txt ADDED
The diff for this file is too large to render. See raw diff