---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:4370
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: '###Question###:Area Units-Convert from km² to m²-\( 2 \mathrm{~km}^{2} \) is the same as _____ \( m^{2} \) ###Correct Answer###:\( 2000000 \) ###Misconcepted Incorrect answer###:\( 2000 \)'
  sentences:
  - Confuses an equation with an identity
  - Does not square the conversion factor when converting squared units
  - Rounds to wrong degree of accuracy (decimal places rather than significant figures)
- source_sentence: '###Question###:Basic Angle Facts (straight line, opposite, around a point, etc)-Find missing angles using angles around a point-What is the size of angle \( x \) ? ![Angles around a point, split into 2 parts. One is labelled 310 degrees and the other x.]() ###Correct Answer###:\( 50^{\circ} \) ###Misconcepted Incorrect answer###:\( 310^{\circ} \)'
  sentences:
  - Believes the arrows for parallel lines mean equal length
  - Rounds to the wrong degree of accuracy (rounds too little)
  - Incorrectly identifies angles as vertically opposite
- source_sentence: '###Question###:BIDMAS-Use the order of operations to carry out calculations involving addition, subtraction, multiplication, and/or division-\[ 10-8 \times 7+6= \] Which calculation should you do first? ###Correct Answer###:\( 8 \times 7 \) ###Misconcepted Incorrect answer###:\( 7+6 \)'
  sentences:
  - Ignores the negative sign
  - Carries out operations from right to left regardless of priority order
  - In repeated percentage change, believes the second change is only a percentage of the first change, without including the original
- source_sentence: '###Question###:Multiples and Lowest Common Multiple-Identify common multiples of three or more numbers-Which of the following numbers is a common multiple of \( 4,6 \) and \( 12 \) ? ###Correct Answer###:\( 12 \) ###Misconcepted Incorrect answer###:\( 2 \)'
  sentences:
  - Confuses factors and multiples
  - 'Does not know that to factorise a quadratic expression, to find two numbers that add to give the coefficient of the x term, and multiply to give the non variable term '
  - Does not link Pythagoras Theorem to finding distance between two points
- source_sentence: '###Question###:Combined Events-Calculate the probability of two independent events occurring without drawing a tree diagram-![Two spinners shown. The first spinner has the numbers 1-4 and the second spinner has the number 1-5.]() You spin the above fair spinners What is the probability of getting a \( 1 \) on both spinners? ###Correct Answer###:\( \frac{1}{20} \) ###Misconcepted Incorrect answer###:\( \frac{1}{9} \)'
  sentences:
  - When multiplying fractions, multiplies the numerator and adds the denominator
  - Does not follow the arrows through a function machine, changes the order of the operations asked.
  - Believes a curve can show a constant rate
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '###Question###:Combined Events-Calculate the probability of two independent events occurring without drawing a tree diagram-![Two spinners shown. The first spinner has the numbers 1-4 and the second spinner has the number 1-5.]() You spin the above fair spinners\nWhat is the probability of getting a \\( 1 \\) on both spinners?\n###Correct Answer###:\\( \\frac{1}{20} \\)\n###Misconcepted Incorrect answer###:\\( \\frac{1}{9} \\)',
    'When multiplying fractions, multiplies the numerator and adds the denominator',
    'Does not follow the arrows through a function machine, changes the order of the operations asked.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
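Since the training pairs map a formatted question (with its correct answer and a misconceived answer) to a misconception description, a typical downstream use is retrieving the closest misconception for a new question. Below is a minimal sketch, reusing the placeholder model id above; the candidate strings are taken from the widget examples:

```python
from sentence_transformers import SentenceTransformer

# Rank candidate misconception descriptions against one formatted question.
model = SentenceTransformer("sentence_transformers_model_id")

query = (
    "###Question###:Area Units-Convert from km² to m²-"
    "\\( 2 \\mathrm{~km}^{2} \\) is the same as _____ \\( m^{2} \\)\n"
    "###Correct Answer###:\\( 2000000 \\)\n"
    "###Misconcepted Incorrect answer###:\\( 2000 \\)"
)
misconceptions = [
    "Confuses an equation with an identity",
    "Does not square the conversion factor when converting squared units",
    "Rounds to wrong degree of accuracy (decimal places rather than significant figures)",
]

# The Normalize module L2-normalises embeddings, so the model's cosine
# similarity reduces to a dot product over the 768-dimensional vectors.
query_emb = model.encode([query])
cand_embs = model.encode(misconceptions)
scores = model.similarity(query_emb, cand_embs)  # shape [1, 3]

best = scores.argmax().item()
print(misconceptions[best])
# Expected to rank "Does not square the conversion factor ..." highest
```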
## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 4,370 training samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:
  |      | anchor | positive |
  |:-----|:-------|:---------|
  | type | string | string   |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | ###Question###:Simplifying Algebraic Fractions-Simplify an algebraic fraction by factorising the numerator-Simplify the following, if possible: \( \frac{m^{2}+2 m-3}{m-3} \)<br>###Correct Answer###:Does not simplify<br>###Misconcepted Incorrect answer###:\( m+1 \) | Does not know that to factorise a quadratic expression, to find two numbers that add to give the coefficient of the x term, and multiply to give the non variable term |
  | ###Question###:Range and Interquartile Range from a List of Data-Calculate the range from a list of data-Tom and Katie are discussing the \( 5 \) plants with these heights:<br>\( 24 \mathrm{~cm}, 17 \mathrm{~cm}, 42 \mathrm{~cm}, 26 \mathrm{~cm}, 13 \mathrm{~cm} \)<br>Tom says if all the plants were cut in half, the range wouldn't change.<br>Katie says if all the plants grew by \( 3 \mathrm{~cm} \) each, the range wouldn't change.<br>Who do you agree with?<br>###Correct Answer###:Only Katie<br>###Misconcepted Incorrect answer###:Only Tom | Believes if you changed all values by the same proportion the range would not change |
  | ###Question###:Properties of Quadrilaterals-Recall and use the intersecting diagonals properties of a rectangle-The angles highlighted on this rectangle with different length sides can never be... ![A rectangle with the diagonals drawn in. The angle on the right hand side at the centre is highlighted in red and the angle at the bottom at the centre is highlighted in yellow.]()<br>###Correct Answer###:\( 90^{\circ} \)<br>###Misconcepted Incorrect answer###:acute | Does not know the properties of a rectangle |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
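This loss treats every other positive in the batch as a negative for a given anchor (the `no_duplicates` batch sampler below helps ensure these in-batch negatives are not accidental duplicates of the true positive). In its standard form, with batch size \( B \), anchor embeddings \( a_i \), positive embeddings \( p_j \), and the `scale` \( s = 20 \) applied to cosine similarities:

```latex
\mathcal{L} = -\frac{1}{B} \sum_{i=1}^{B}
    \log \frac{\exp\bigl(s \cdot \cos(a_i, p_i)\bigr)}
              {\sum_{j=1}^{B} \exp\bigl(s \cdot \cos(a_i, p_j)\bigr)}
```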
### Training Hyperparameters

#### Non-Default Hyperparameters

- `num_train_epochs`: 10
- `fp16`: True
- `push_to_hub`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
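Taken together, these settings can be wired up with the `SentenceTransformerTrainer` API from the same library version. The sketch below is an assumption-laden outline rather than the original training script: the two placeholder rows and the `output_dir` are hypothetical, since the actual 4,370-pair dataset is unnamed above.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

# Placeholder rows with the same anchor/positive columns as the table above;
# the real training set is not published with this card.
train_dataset = Dataset.from_dict({
    "anchor": [
        "###Question###:...###Correct Answer###:...###Misconcepted Incorrect answer###:...",
        "###Question###:...###Correct Answer###:...###Misconcepted Incorrect answer###:...",
    ],
    "positive": [
        "Confuses factors and multiples",
        "Does not square the conversion factor when converting squared units",
    ],
})

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
# Defaults already match the loss parameters listed above: scale=20.0, cos_sim.
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-misconceptions",       # placeholder output path
    num_train_epochs=10,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no repeated texts per batch
    # push_to_hub=True was also set for the original run.
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```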
### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.9141 | 500  | 0.3742        |
| 1.8282 | 1000 | 0.1576        |
| 2.7422 | 1500 | 0.0786        |
| 3.6563 | 2000 | 0.037         |
| 4.5704 | 2500 | 0.0239        |
| 5.4845 | 3000 | 0.0153        |
| 6.3985 | 3500 | 0.0087        |
| 7.3126 | 4000 | 0.0046        |
| 8.2267 | 4500 | 0.0043        |
| 9.1408 | 5000 | 0.003         |

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.5.1+cu121
- Accelerate: 1.1.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```