---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:4370
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: '###Question###:Area Units-Convert from km² to m²-\( 2 \mathrm{~km}^{2} \) is the same as _____ \( m^{2} \) ###Correct Answer###:\( 2000000 \) ###Misconcepted Incorrect answer###:\( 2000 \)'
  sentences:
  - Confuses an equation with an identity
  - Does not square the conversion factor when converting squared units
  - Rounds to wrong degree of accuracy (decimal places rather than significant figures)
- source_sentence: '###Question###:Basic Angle Facts (straight line, opposite, around a point, etc)-Find missing angles using angles around a point-What is the size of angle \( x \) ? ![Angles around a point, split into 2 parts. One is labelled 310 degrees and the other x.]() ###Correct Answer###:\( 50^{\circ} \) ###Misconcepted Incorrect answer###:\( 310^{\circ} \)'
  sentences:
  - Believes the arrows for parallel lines mean equal length
  - Rounds to the wrong degree of accuracy (rounds too little)
  - Incorrectly identifies angles as vertically opposite
- source_sentence: '###Question###:BIDMAS-Use the order of operations to carry out calculations involving addition, subtraction, multiplication, and/or division-\[ 10-8 \times 7+6= \] Which calculation should you do first? ###Correct Answer###:\( 8 \times 7 \) ###Misconcepted Incorrect answer###:\( 7+6 \)'
  sentences:
  - Ignores the negative sign
  - Carries out operations from right to left regardless of priority order
  - In repeated percentage change, believes the second change is only a percentage of the first change, without including the original
- source_sentence: '###Question###:Multiples and Lowest Common Multiple-Identify common multiples of three or more numbers-Which of the following numbers is a common multiple of \( 4,6 \) and \( 12 \) ? ###Correct Answer###:\( 12 \) ###Misconcepted Incorrect answer###:\( 2 \)'
  sentences:
  - Confuses factors and multiples
  - 'Does not know that to factorise a quadratic expression, to find two numbers that add to give the coefficient of the x term, and multiply to give the non variable term '
  - Does not link Pythagoras Theorem to finding distance between two points
- source_sentence: '###Question###:Combined Events-Calculate the probability of two independent events occurring without drawing a tree diagram-![Two spinners shown. The first spinner has the numbers 1-4 and the second spinner has the number 1-5.]() You spin the above fair spinners What is the probability of getting a \( 1 \) on both spinners? ###Correct Answer###:\( \frac{1}{20} \) ###Misconcepted Incorrect answer###:\( \frac{1}{9} \)'
  sentences:
  - When multiplying fractions, multiplies the numerator and adds the denominator
  - Does not follow the arrows through a function machine, changes the order of the operations asked.
  - Believes a curve can show a constant rate
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '###Question###:Combined Events-Calculate the probability of two independent events occurring without drawing a tree diagram-![Two spinners shown. The first spinner has the numbers 1-4 and the second spinner has the number 1-5.]() You spin the above fair spinners\nWhat is the probability of getting a \\( 1 \\) on both spinners?\n###Correct Answer###:\\( \\frac{1}{20} \\)\n###Misconcepted Incorrect answer###:\\( \\frac{1}{9} \\)',
    'When multiplying fractions, multiplies the numerator and adds the denominator',
    'Does not follow the arrows through a function machine, changes the order of the operations asked.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
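Since the training pairs map a formatted question (with its correct answer and a misconceived answer) to a misconception description, a typical downstream use is retrieving the closest misconception for a new question. Below is a minimal sketch, reusing the placeholder model id above; the candidate strings are taken from the widget examples:

```python
from sentence_transformers import SentenceTransformer

# Rank candidate misconception descriptions against one formatted question.
model = SentenceTransformer("sentence_transformers_model_id")

query = (
    "###Question###:Area Units-Convert from km² to m²-"
    "\\( 2 \\mathrm{~km}^{2} \\) is the same as _____ \\( m^{2} \\)\n"
    "###Correct Answer###:\\( 2000000 \\)\n"
    "###Misconcepted Incorrect answer###:\\( 2000 \\)"
)
misconceptions = [
    "Confuses an equation with an identity",
    "Does not square the conversion factor when converting squared units",
    "Rounds to wrong degree of accuracy (decimal places rather than significant figures)",
]

# The Normalize module L2-normalises embeddings, so the model's cosine
# similarity reduces to a dot product over the 768-dimensional vectors.
query_emb = model.encode([query])
cand_embs = model.encode(misconceptions)
scores = model.similarity(query_emb, cand_embs)  # shape [1, 3]

best = scores.argmax().item()
print(misconceptions[best])
# Expected to rank "Does not square the conversion factor ..." highest
```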
## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 4,370 training samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:
  |      | anchor | positive |
  |:-----|:-------|:---------|
  | type | string | string   |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | ###Question###:Simplifying Algebraic Fractions-Simplify an algebraic fraction by factorising the numerator-Simplify the following, if possible: \( \frac{m^{2}+2 m-3}{m-3} \)<br>###Correct Answer###:Does not simplify<br>###Misconcepted Incorrect answer###:\( m+1 \) | Does not know that to factorise a quadratic expression, to find two numbers that add to give the coefficient of the x term, and multiply to give the non variable term |
  | ###Question###:Range and Interquartile Range from a List of Data-Calculate the range from a list of data-Tom and Katie are discussing the \( 5 \) plants with these heights:<br>\( 24 \mathrm{~cm}, 17 \mathrm{~cm}, 42 \mathrm{~cm}, 26 \mathrm{~cm}, 13 \mathrm{~cm} \)<br>Tom says if all the plants were cut in half, the range wouldn't change.<br>Katie says if all the plants grew by \( 3 \mathrm{~cm} \) each, the range wouldn't change.<br>Who do you agree with?<br>###Correct Answer###:Only Katie<br>###Misconcepted Incorrect answer###:Only Tom | Believes if you changed all values by the same proportion the range would not change |
  | ###Question###:Properties of Quadrilaterals-Recall and use the intersecting diagonals properties of a rectangle-The angles highlighted on this rectangle with different length sides can never be... ![A rectangle with the diagonals drawn in. The angle on the right hand side at the centre is highlighted in red and the angle at the bottom at the centre is highlighted in yellow.]()<br>###Correct Answer###:\( 90^{\circ} \)<br>###Misconcepted Incorrect answer###:acute | Does not know the properties of a rectangle |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
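This loss treats every other positive in the batch as a negative for a given anchor (the `no_duplicates` batch sampler below helps ensure these in-batch negatives are not accidental duplicates of the true positive). In its standard form, with batch size \( B \), anchor embeddings \( a_i \), positive embeddings \( p_j \), and the `scale` \( s = 20 \) applied to cosine similarities:

```latex
\mathcal{L} = -\frac{1}{B} \sum_{i=1}^{B}
    \log \frac{\exp\bigl(s \cdot \cos(a_i, p_i)\bigr)}
              {\sum_{j=1}^{B} \exp\bigl(s \cdot \cos(a_i, p_j)\bigr)}
```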
### Training Hyperparameters

#### Non-Default Hyperparameters

- `num_train_epochs`: 10
- `fp16`: True
- `push_to_hub`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
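Taken together, these settings can be wired up with the `SentenceTransformerTrainer` API from the same library version. The sketch below is an assumption-laden outline rather than the original training script: the two placeholder rows and the `output_dir` are hypothetical, since the actual 4,370-pair dataset is unnamed above.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

# Placeholder rows with the same anchor/positive columns as the table above;
# the real training set is not published with this card.
train_dataset = Dataset.from_dict({
    "anchor": [
        "###Question###:...###Correct Answer###:...###Misconcepted Incorrect answer###:...",
        "###Question###:...###Correct Answer###:...###Misconcepted Incorrect answer###:...",
    ],
    "positive": [
        "Confuses factors and multiples",
        "Does not square the conversion factor when converting squared units",
    ],
})

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
# Defaults already match the loss parameters listed above: scale=20.0, cos_sim.
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-misconceptions",       # placeholder output path
    num_train_epochs=10,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no repeated texts per batch
    # push_to_hub=True was also set for the original run.
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```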
### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.9141 | 500  | 0.3742        |
| 1.8282 | 1000 | 0.1576        |
| 2.7422 | 1500 | 0.0786        |
| 3.6563 | 2000 | 0.037         |
| 4.5704 | 2500 | 0.0239        |
| 5.4845 | 3000 | 0.0153        |
| 6.3985 | 3500 | 0.0087        |
| 7.3126 | 4000 | 0.0046        |
| 8.2267 | 4500 | 0.0043        |
| 9.1408 | 5000 | 0.003         |

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.5.1+cu121
- Accelerate: 1.1.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```