SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
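
For reference, the same three-module stack can be assembled by hand with the sentence_transformers.models API. This is a minimal sketch (not taken from the original training code); loading the released checkpoint directly, as shown under Usage below, is the usual route.

from sentence_transformers import SentenceTransformer, models

# (0) Transformer module: BERT backbone, 512-token limit, lowercased input
word_embedding_model = models.Transformer(
    "BAAI/bge-small-en-v1.5", max_seq_length=512, do_lower_case=True
)
# (1) Pooling module: CLS-token pooling over the 384-dimensional token embeddings
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(), pooling_mode="cls"
)
# (2) Normalize module: L2-normalize embeddings so dot product equals cosine similarity
normalize_model = models.Normalize()

model = SentenceTransformer(modules=[word_embedding_model, pooling_model, normalize_model])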

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("magnifi/bge-small-en-v1.5-ft-orc-0930-dates")
# Run inference
sentences = [
    'what is my exposure to US Equities?',
    '[{"get_portfolio(None,None)": "portfolio"}, {"factor_contribution(\'portfolio\',\'<DATES>\',\'asset_class\',\'us equity\',\'portfolio\')": "portfolio"}]',
    '[{"get_portfolio(None,None)": "portfolio"}, {"factor_contribution(\'portfolio\',\'<DATES>\',\'sector\',\'sector industrials\',\'portfolio\')": "portfolio"}]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
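
Because the training pairs map natural-language questions to function-call plans, a typical downstream use is retrieval: embed a query, embed candidate plans, and keep the highest-scoring plan. The following is a minimal sketch reusing the two candidate plans from the snippet above; the candidate pool and selection logic are illustrative, not part of the original card.

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("magnifi/bge-small-en-v1.5-ft-orc-0930-dates")

query = "what is my exposure to US Equities?"
candidate_plans = [
    '[{"get_portfolio(None,None)": "portfolio"}, {"factor_contribution(\'portfolio\',\'<DATES>\',\'asset_class\',\'us equity\',\'portfolio\')": "portfolio"}]',
    '[{"get_portfolio(None,None)": "portfolio"}, {"factor_contribution(\'portfolio\',\'<DATES>\',\'sector\',\'sector industrials\',\'portfolio\')": "portfolio"}]',
]

# Embed the query and the candidates, then rank the candidates by cosine similarity
query_emb = model.encode([query])
plan_embs = model.encode(candidate_plans)
scores = model.similarity(query_emb, plan_embs)  # tensor of shape (1, 2)

best_idx = int(torch.argmax(scores))
print(candidate_plans[best_idx])  # the plan most similar to the query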

Evaluation

Metrics

Information Retrieval

| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.6781 |
| cosine_accuracy@3   | 0.8082 |
| cosine_accuracy@5   | 0.863  |
| cosine_accuracy@10  | 0.9315 |
| cosine_precision@1  | 0.6781 |
| cosine_precision@3  | 0.2694 |
| cosine_precision@5  | 0.1726 |
| cosine_precision@10 | 0.0932 |
| cosine_recall@1     | 0.0188 |
| cosine_recall@3     | 0.0225 |
| cosine_recall@5     | 0.024  |
| cosine_recall@10    | 0.0259 |
| cosine_ndcg@10      | 0.176  |
| cosine_mrr@10       | 0.7579 |
| cosine_map@100      | 0.0211 |
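
The metric names follow sentence-transformers' InformationRetrievalEvaluator. Below is a hedged sketch of how such figures are typically computed; the queries, corpus, and relevance judgments are illustrative placeholders, not the actual evaluation split.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("magnifi/bge-small-en-v1.5-ft-orc-0930-dates")

# Placeholder evaluation data: query_id -> text, doc_id -> text, query_id -> relevant doc_ids
queries = {"q1": "what is my exposure to US Equities?"}
corpus = {"d1": '[{"get_portfolio(None,None)": "portfolio"}, {"factor_contribution(\'portfolio\',\'<DATES>\',\'asset_class\',\'us equity\',\'portfolio\')": "portfolio"}]'}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dev")
results = evaluator(model)
print(results)  # dict with keys like dev_cosine_accuracy@1, dev_cosine_ndcg@10, ...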

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,090 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:

    |         | sentence_0                                        | sentence_1                                          |
    |:--------|:--------------------------------------------------|:----------------------------------------------------|
    | type    | string                                            | string                                              |
    | details | min: 5 tokens, mean: 13.28 tokens, max: 27 tokens | min: 26 tokens, mean: 87.73 tokens, max: 196 tokens |

  • Samples:

    | sentence_0 | sentence_1 |
    |:-----------|:-----------|
    | what is my portfolio [DATES] cagr? | [{"get_portfolio(None,None)": "portfolio"}, {"get_attribute('portfolio',['gains'],'')": "portfolio"}, {"sort('portfolio','gains','desc')": "portfolio"}] |
    | what is my [DATES] rate of return | [{"get_portfolio(None,None)": "portfolio"}, {"get_attribute('portfolio',['gains'],'')": "portfolio"}, {"sort('portfolio','gains','desc')": "portfolio"}] |
    | show backtest of my performance [DATES]? | [{"get_portfolio(None,None)": "portfolio"}, {"get_attribute('portfolio',['gains'],'')": "portfolio"}, {"sort('portfolio','gains','desc')": "portfolio"}] |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • num_train_epochs: 6
  • multi_dataset_batch_sampler: round_robin
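
As a reference, the non-default settings above can be plugged into the sentence-transformers 3.x Trainer API. This is a minimal sketch, not the original training script; the two-example dataset, output directory, and evaluation split are illustrative placeholders.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import MultiDatasetBatchSamplers

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Illustrative (question, function-call plan) pairs in the format of the training dataset
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "what is my portfolio [DATES] cagr?",
        "what is my exposure to US Equities?",
    ],
    "sentence_1": [
        '[{"get_portfolio(None,None)": "portfolio"}, {"get_attribute(\'portfolio\',[\'gains\'],\'\')": "portfolio"}, {"sort(\'portfolio\',\'gains\',\'desc\')": "portfolio"}]',
        '[{"get_portfolio(None,None)": "portfolio"}, {"factor_contribution(\'portfolio\',\'<DATES>\',\'asset_class\',\'us equity\',\'portfolio\')": "portfolio"}]',
    ],
})
eval_dataset = train_dataset  # placeholder; a held-out split would be used in practice

# Loss as documented above: in-batch negatives with scale 20.0 and cosine similarity (the defaults)
loss = MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-small-en-v1.5-ft",  # illustrative output path
    eval_strategy="steps",
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    num_train_epochs=6,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()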

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step cosine_ndcg@10
0.0183 2 0.1179
0.0367 4 0.1184
0.0550 6 0.1193
0.0734 8 0.1201
0.0917 10 0.1227
0.1101 12 0.1235
0.1284 14 0.1255
0.1468 16 0.1267
0.1651 18 0.1299
0.1835 20 0.1320
0.2018 22 0.1348
0.2202 24 0.1367
0.2385 26 0.1383
0.2569 28 0.1413
0.2752 30 0.1420
0.2936 32 0.1432
0.3119 34 0.1435
0.3303 36 0.1451
0.3486 38 0.1471
0.3670 40 0.1491
0.3853 42 0.1503
0.4037 44 0.1523
0.4220 46 0.1525
0.4404 48 0.1531
0.4587 50 0.1535
0.4771 52 0.1534
0.4954 54 0.1529
0.5138 56 0.1528
0.5321 58 0.1556
0.5505 60 0.1568
0.5688 62 0.1576
0.5872 64 0.1577
0.6055 66 0.1577
0.6239 68 0.1575
0.6422 70 0.1586
0.6606 72 0.1596
0.6789 74 0.1612
0.6972 76 0.1617
0.7156 78 0.1637
0.7339 80 0.1638
0.7523 82 0.1637
0.7706 84 0.1635
0.7890 86 0.1634
0.8073 88 0.1640
0.8257 90 0.1641
0.8440 92 0.1652
0.8624 94 0.1652
0.8807 96 0.1657
0.8991 98 0.1650
0.9174 100 0.1664
0.9358 102 0.1668
0.9541 104 0.1671
0.9725 106 0.1683
0.9908 108 0.1689
1.0 109 0.1684
1.0092 110 0.1673
1.0275 112 0.1686
1.0459 114 0.1680
1.0642 116 0.1676
1.0826 118 0.1668
1.1009 120 0.1668
1.1193 122 0.1671
1.1376 124 0.1673
1.1560 126 0.1666
1.1743 128 0.1669
1.1927 130 0.1668
1.2110 132 0.1669
1.2294 134 0.1673
1.2477 136 0.1681
1.2661 138 0.1683
1.2844 140 0.1681
1.3028 142 0.1674
1.3211 144 0.1672
1.3394 146 0.1668
1.3578 148 0.1682
1.3761 150 0.1689
1.3945 152 0.1690
1.4128 154 0.1693
1.4312 156 0.1683
1.4495 158 0.1683
1.4679 160 0.1678
1.4862 162 0.1695
1.5046 164 0.1710
1.5229 166 0.1717
1.5413 168 0.1715
1.5596 170 0.1698
1.5780 172 0.1699
1.5963 174 0.1694
1.6147 176 0.1701
1.6330 178 0.1693
1.6514 180 0.1683
1.6697 182 0.1692
1.6881 184 0.1689
1.7064 186 0.1696
1.7248 188 0.1696
1.7431 190 0.1700
1.7615 192 0.1705
1.7798 194 0.1718
1.7982 196 0.1719
1.8165 198 0.1723
1.8349 200 0.1721
1.8532 202 0.1717
1.8716 204 0.1722
1.8899 206 0.1722
1.9083 208 0.1728
1.9266 210 0.1734
1.9450 212 0.1733
1.9633 214 0.1742
1.9817 216 0.1749
2.0 218 0.1750
2.0183 220 0.1760

Framework Versions

  • Python: 3.10.9
  • Sentence Transformers: 3.3.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
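
To approximate this environment, the listed versions can be pinned at install time (an illustrative command; newer compatible releases should also work):

pip install sentence-transformers==3.3.1 transformers==4.44.0 torch==2.4.0 accelerate==0.33.0 datasets==2.20.0 tokenizers==0.19.1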

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}