angelitasr committed
Commit b89829c · verified · 1 Parent(s): 2b2ac5e

End of training
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,408 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:4370
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: '###Question###:Area Units-Convert from km² to m²-\( 2 \mathrm{~km}^{2}
    \) is the same as _____ \( m^{2} \)

    ###Correct Answer###:\( 2000000 \)

    ###Misconcepted Incorrect answer###:\( 2000 \)'
  sentences:
  - Confuses an equation with an identity
  - Does not square the conversion factor when converting squared units
  - Rounds to wrong degree of accuracy (decimal places rather than significant figures)
- source_sentence: '###Question###:Basic Angle Facts (straight line, opposite, around
    a point, etc)-Find missing angles using angles around a point-What is the size
    of angle \( x \) ? ![Angles around a point, split into 2 parts. One is labelled
    310 degrees and the other x.]()

    ###Correct Answer###:\( 50^{\circ} \)

    ###Misconcepted Incorrect answer###:\( 310^{\circ} \)'
  sentences:
  - Believes the arrows for parallel lines mean equal length
  - Rounds to the wrong degree of accuracy (rounds too little)
  - Incorrectly identifies angles as vertically opposite
- source_sentence: '###Question###:BIDMAS-Use the order of operations to carry out
    calculations involving addition, subtraction, multiplication, and/or division-\[

    10-8 \times 7+6=

    \]


    Which calculation should you do first?

    ###Correct Answer###:\( 8 \times 7 \)

    ###Misconcepted Incorrect answer###:\( 7+6 \)'
  sentences:
  - Ignores the negative sign
  - Carries out operations from right to left regardless of priority order
  - In repeated percentage change, believes the second change is only a percentage
    of the first change, without including the original
- source_sentence: '###Question###:Multiples and Lowest Common Multiple-Identify common
    multiples of three or more numbers-Which of the following numbers is a common
    multiple of \( 4,6 \) and \( 12 \) ?

    ###Correct Answer###:\( 12 \)

    ###Misconcepted Incorrect answer###:\( 2 \)'
  sentences:
  - Confuses factors and multiples
  - 'Does not know that to factorise a quadratic expression, to find two numbers that
    add to give the coefficient of the x term, and multiply to give the non variable
    term

    '
  - Does not link Pythagoras Theorem to finding distance between two points
- source_sentence: '###Question###:Combined Events-Calculate the probability of two
    independent events occurring without drawing a tree diagram-![Two spinners shown.
    The first spinner has the numbers 1-4 and the second spinner has the number 1-5.]()
    You spin the above fair spinners

    What is the probability of getting a \( 1 \) on both spinners?

    ###Correct Answer###:\( \frac{1}{20} \)

    ###Misconcepted Incorrect answer###:\( \frac{1}{9} \)'
  sentences:
  - When multiplying fractions, multiplies the numerator and adds the denominator
  - Does not follow the arrows through a function machine, changes the order of the
    operations asked.
  - Believes a curve can show a constant rate
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '###Question###:Combined Events-Calculate the probability of two independent events occurring without drawing a tree diagram-![Two spinners shown. The first spinner has the numbers 1-4 and the second spinner has the number 1-5.]() You spin the above fair spinners\nWhat is the probability of getting a \\( 1 \\) on both spinners?\n###Correct Answer###:\\( \\frac{1}{20} \\)\n###Misconcepted Incorrect answer###:\\( \\frac{1}{9} \\)',
    'When multiplying fractions, multiplies the numerator and adds the denominator',
    'Does not follow the arrows through a function machine, changes the order of the operations asked.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
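
As an illustrative sketch of the misconception-retrieval task this card's examples reflect, you can rank candidate misconceptions for a question. This is a hedged example, not part of the original card: the model id is the same placeholder as above, and the question and misconception strings are taken from the widget samples.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id, as above

# Example adapted from the widget samples in this card.
question = (
    "###Question###:Area Units-Convert from km² to m²-"
    "\\( 2 \\mathrm{~km}^{2} \\) is the same as _____ \\( m^{2} \\)\n"
    "###Correct Answer###:\\( 2000000 \\)\n"
    "###Misconcepted Incorrect answer###:\\( 2000 \\)"
)
candidate_misconceptions = [
    "Does not square the conversion factor when converting squared units",
    "Confuses an equation with an identity",
    "Rounds to wrong degree of accuracy (decimal places rather than significant figures)",
]

# The model L2-normalises its embeddings (Normalize module), so cosine similarity applies.
scores = model.similarity(
    model.encode([question]),
    model.encode(candidate_misconceptions),
)  # shape [1, 3]
print(candidate_misconceptions[scores.argmax().item()])
```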

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset


* Size: 4,370 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string   |
  | details | <ul><li>min: 60 tokens</li><li>mean: 122.91 tokens</li><li>max: 435 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.81 tokens</li><li>max: 39 tokens</li></ul> |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | <code>###Question###:Simplifying Algebraic Fractions-Simplify an algebraic fraction by factorising the numerator-Simplify the following, if possible: \( \frac{m^{2}+2 m-3}{m-3} \)<br>###Correct Answer###:Does not simplify<br>###Misconcepted Incorrect answer###:\( m+1 \)</code> | <code>Does not know that to factorise a quadratic expression, to find two numbers that add to give the coefficient of the x term, and multiply to give the non variable term<br></code> |
  | <code>###Question###:Range and Interquartile Range from a List of Data-Calculate the range from a list of data-Tom and Katie are discussing the \( 5 \) plants with these heights:<br>\( 24 \mathrm{~cm}, 17 \mathrm{~cm}, 42 \mathrm{~cm}, 26 \mathrm{~cm}, 13 \mathrm{~cm} \)<br>Tom says if all the plants were cut in half, the range wouldn't change.<br>Katie says if all the plants grew by \( 3 \mathrm{~cm} \) each, the range wouldn't change.<br>Who do you agree with?<br>###Correct Answer###:Only<br>Katie<br>###Misconcepted Incorrect answer###:Only<br>Tom</code> | <code>Believes if you changed all values by the same proportion the range would not change</code> |
  | <code>###Question###:Properties of Quadrilaterals-Recall and use the intersecting diagonals properties of a rectangle-The angles highlighted on this rectangle with different length sides can never be... ![A rectangle with the diagonals drawn in. The angle on the right hand side at the centre is highlighted in red and the angle at the bottom at the centre is highlighted in yellow.]()<br>###Correct Answer###:\( 90^{\circ} \)<br>###Misconcepted Incorrect answer###:acute</code> | <code>Does not know the properties of a rectangle</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `num_train_epochs`: 10
- `fp16`: True
- `push_to_hub`: True
- `batch_sampler`: no_duplicates
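
Taken together, the loss and the non-default values above suggest a fine-tuning run along the following lines. This is a reconstruction under stated assumptions, not the original script: the dataset is stubbed with a single placeholder pair, the `output_dir` name is hypothetical, and `push_to_hub` is omitted because the target repo id is unknown.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Placeholder for the 4,370 (anchor, positive) pairs described above.
train_dataset = Dataset.from_dict({
    "anchor": ["###Question###:...###Correct Answer###:...###Misconcepted Incorrect answer###:..."],
    "positive": ["Confuses factors and multiples"],
})

# MultipleNegativesRankingLoss defaults to scale=20.0 and cosine similarity,
# matching the parameters reported above. It treats other in-batch positives
# as negatives, which is why the no_duplicates batch sampler matters.
loss = losses.MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-en-v1.5-misconceptions",  # hypothetical name
    num_train_epochs=10,
    per_device_train_batch_size=8,  # default, listed under "All Hyperparameters"
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```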

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.9141 | 500  | 0.3742        |
| 1.8282 | 1000 | 0.1576        |
| 2.7422 | 1500 | 0.0786        |
| 3.6563 | 2000 | 0.037         |
| 4.5704 | 2500 | 0.0239        |
| 5.4845 | 3000 | 0.0153        |
| 6.3985 | 3500 | 0.0087        |
| 7.3126 | 4000 | 0.0046        |
| 8.2267 | 4500 | 0.0043        |
| 9.1408 | 5000 | 0.003         |


### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.5.1+cu121
- Accelerate: 1.1.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.1.1",
    "transformers": "4.45.2",
    "pytorch": "2.5.1+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
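
These three modules match the Transformer → Pooling → Normalize stack printed in the README's architecture section. As a minimal sketch, assuming only the settings in the committed files, the same stack could be assembled by hand with the `sentence_transformers.models` API:

```python
from sentence_transformers import SentenceTransformer, models

# Transformer backbone with max_seq_length 512 (see sentence_bert_config.json below).
transformer = models.Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)

# CLS-token pooling over 768-dim word embeddings, per 1_Pooling/config.json.
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),
    pooling_mode="cls",
)

# L2 normalisation, so cosine similarity reduces to a dot product.
normalize = models.Normalize()

model = SentenceTransformer(modules=[transformer, pooling, normalize])
```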
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": true
}