metadata
language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:6300
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
  - source_sentence: >-
      Chevron provides long-standing employee support programs such as Ombuds,
      an independent resource, a company hotline for reporting concerns, and the
      Employee Assistance Program, a confidential consulting service for a range
      of personal, family, and work-related concerns.
    sentences:
      - >-
        What is the effective date for the new accounting standard on equity
        securities for public entities?
      - >-
        What programs does Chevron have to support employee well-being and
        address workplace issues?
      - >-
        What type of service is provided by Walmart in Mexico to enhance digital
        connectivity?
  - source_sentence: >-
      ProConnect Tax Online is our cloud-based solution, which is designed for
      full-service, year-round practices who prepare all forms of consumer and
      small business returns and integrates with our QuickBooks Online
      offerings.
    sentences:
      - >-
        What is the significance of the Company’s trademarks to their
        businesses?
      - What are the features of Intuit's ProConnect Tax Online service?
      - >-
        Where can information regarding legal proceedings be found in the
        document?
  - source_sentence: >-
      The section titled 'Financial Statement and Supplementary Data' is labeled
      with the number 39 in the document.
    sentences:
      - >-
        What is the numerical label associated with the section on Financial
        Statements and Supplementary Data in the document?
      - Why did the effective tax rate increase in 2022 compared to 2021?
      - >-
        What role does intellectual property play in Nike's competitive
        position?
  - source_sentence: >-
      Our operating cash inflows include cash from vehicle sales and related
      servicing, customer lease and financing payments, customer deposits, cash
      from sales of regulatory credits and energy generation and storage
      products, and interest income on our cash and investments portfolio.
    sentences:
      - >-
        What was the net increase in cash and cash equivalents for the year
        ending December 30, 2023?
      - >-
        What are the requirements for health insurers and group health plans in
        providing cost estimates to consumers?
      - What are the sources of operating cash inflows?
  - source_sentence: >-
      Symtuza (darunavir/C/FTC/TAF), a fixed dose combination product that
      includes cobicistat ('C'), emtricitabine ('FTC'), and tenofovir
      alafenamide ('TAF'), is commercialized by Janssen Sciences Ireland
      Unlimited Company.
    sentences:
      - >-
        What are the primary drugs included in Symtuza and which company
        commercializes it?
      - >-
        What was reported as the percentage revenue increase for the Asia
        Pacific & Latin America segment of NIKE from fiscal 2022 to fiscal 2023?
      - >-
        What are the main factors influencing competition for the company's
        products?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: BGE base Financial Matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.67
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8071428571428572
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8485714285714285
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8985714285714286
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.67
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.26904761904761904
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16971428571428568
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08985714285714284
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.67
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8071428571428572
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8485714285714285
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8985714285714286
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7849037198632751
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7484699546485256
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7522833636034203
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.6657142857142857
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8085714285714286
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8414285714285714
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8942857142857142
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6657142857142857
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.26952380952380955
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16828571428571426
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08942857142857143
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6657142857142857
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8085714285714286
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8414285714285714
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8942857142857142
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7816751594389505
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7455107709750564
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7495566091259342
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.6528571428571428
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8042857142857143
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8357142857142857
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8957142857142857
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6528571428571428
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2680952380952381
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16714285714285712
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08957142857142857
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6528571428571428
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8042857142857143
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8357142857142857
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8957142857142857
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7751159904165151
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7365447845804987
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7402062124507567
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.6442857142857142
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7885714285714286
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.83
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8857142857142857
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6442857142857142
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.26285714285714284
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16599999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08857142857142856
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6442857142857142
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7885714285714286
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.83
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8857142857142857
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7673388064771406
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7293316326530613
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7335797814707157
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.6057142857142858
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.78
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8214285714285714
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8814285714285715
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6057142857142858
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.26
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16428571428571426
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08814285714285712
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6057142857142858
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.78
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8214285714285714
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8814285714285715
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7451487636214842
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7013752834467117
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7052270125234881
            name: Cosine Map@100

BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
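The Pooling module above sets `pooling_mode_cls_token: True`, so the sentence embedding is the hidden state of the first ([CLS]) token. `Normalize()` then rescales that vector to unit L2 norm. The following is a minimal NumPy sketch of those two steps. It assumes `hidden_states` is a `(seq_len, 768)` array of final-layer BertModel outputs; random values stand in for real model output, and the function name is illustrative only.

```python
import numpy as np


def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic Pooling(cls_token=True) followed by Normalize() for one sequence.

    token_embeddings: (seq_len, hidden_dim) hidden states; row 0 is the [CLS] token.
    """
    cls_vector = token_embeddings[0]              # CLS pooling: keep only the first token
    norm = np.linalg.norm(cls_vector)
    return cls_vector / max(norm, 1e-12)          # L2 normalization, guarding against a zero norm


# Hypothetical stand-in for BertModel output: 12 tokens x 768 hidden units
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(12, 768))
embedding = cls_pool_and_normalize(hidden_states)
print(embedding.shape)                             # (768,)
print(round(float(np.linalg.norm(embedding)), 6))  # 1.0
```

Because every output vector has unit length, cosine similarity between two embeddings is just their dot product.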

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("CarlosElArtista/bge-base-financial-matryoshka")
# Run inference
sentences = [
    "Symtuza (darunavir/C/FTC/TAF), a fixed dose combination product that includes cobicistat ('C'), emtricitabine ('FTC'), and tenofovir alafenamide ('TAF'), is commercialized by Janssen Sciences Ireland Unlimited Company.",
    'What are the primary drugs included in Symtuza and which company commercializes it?',
    'What was reported as the percentage revenue increase for the Asia Pacific & Latin America segment of NIKE from fiscal 2022 to fiscal 2023?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the cosine similarity scores between all pairs of embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
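The model was trained with MatryoshkaLoss over dims 768, 512, 256, 128 and 64, so the leading components of each embedding are intended to work on their own at lower cost. The NumPy sketch below illustrates the idea: keep the first `dim` components, re-normalize so that a dot product equals cosine similarity, and compare. It assumes you already have full 768-d embeddings. Random unit vectors stand in for `model.encode(sentences)` here, and the helper names are hypothetical.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka components and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity; rows of a and b must already be unit length."""
    return a @ b.T


# Stand-in for model.encode(sentences): 3 unit-length 768-d vectors
rng = np.random.default_rng(42)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

for dim in (768, 512, 256, 128, 64):
    emb = truncate_and_normalize(full, dim)
    sims = cosine_similarity_matrix(emb, emb)
    print(dim, emb.shape, sims.shape)  # e.g. "256 (3, 256) (3, 3)"
```

With the library itself, Sentence Transformers accepts a `truncate_dim` argument when loading, for example `SentenceTransformer("CarlosElArtista/bge-base-financial-matryoshka", truncate_dim=256)`. That call downloads the model; the Evaluation table below shows how retrieval quality changes at each dimension.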

Evaluation

Metrics

Information Retrieval

Metric dim_768 dim_512 dim_256 dim_128 dim_64
cosine_accuracy@1 0.67 0.6657 0.6529 0.6443 0.6057
cosine_accuracy@3 0.8071 0.8086 0.8043 0.7886 0.78
cosine_accuracy@5 0.8486 0.8414 0.8357 0.83 0.8214
cosine_accuracy@10 0.8986 0.8943 0.8957 0.8857 0.8814
cosine_precision@1 0.67 0.6657 0.6529 0.6443 0.6057
cosine_precision@3 0.269 0.2695 0.2681 0.2629 0.26
cosine_precision@5 0.1697 0.1683 0.1671 0.166 0.1643
cosine_precision@10 0.0899 0.0894 0.0896 0.0886 0.0881
cosine_recall@1 0.67 0.6657 0.6529 0.6443 0.6057
cosine_recall@3 0.8071 0.8086 0.8043 0.7886 0.78
cosine_recall@5 0.8486 0.8414 0.8357 0.83 0.8214
cosine_recall@10 0.8986 0.8943 0.8957 0.8857 0.8814
cosine_ndcg@10 0.7849 0.7817 0.7751 0.7673 0.7451
cosine_mrr@10 0.7485 0.7455 0.7365 0.7293 0.7014
cosine_map@100 0.7523 0.7496 0.7402 0.7336 0.7052
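The table suggests that each query has exactly one relevant passage. Recall@k equals accuracy@k throughout, and precision@k equals accuracy@k divided by k (e.g. 0.8071 / 3 ≈ 0.2690 for dim_768). Under that single-relevant-passage assumption, every metric can be derived from the rank at which the relevant passage was retrieved. The sketch below is an illustration of that reduction, not the library's `InformationRetrievalEvaluator`. A query whose relevant passage was not retrieved at all is represented by `None`.

```python
import math
from typing import Optional, Sequence


def retrieval_metrics(ranks: Sequence[Optional[int]], ks=(1, 3, 5, 10)) -> dict:
    """IR metrics when every query has exactly one relevant passage.

    ranks: 1-based position of the relevant passage for each query,
           or None if it was not retrieved at all.
    """
    n = len(ranks)
    metrics = {}
    for k in ks:
        hits = sum(1 for r in ranks if r is not None and r <= k)
        metrics[f"accuracy@{k}"] = hits / n
        metrics[f"recall@{k}"] = hits / n           # one relevant passage: recall == accuracy
        metrics[f"precision@{k}"] = hits / (n * k)  # at most one of the k results can be relevant
    # MRR@10: reciprocal rank, counted only inside the top 10
    metrics["mrr@10"] = sum(1 / r for r in ranks if r is not None and r <= 10) / n
    # nDCG@10 with binary relevance: the ideal DCG is 1, so nDCG = 1 / log2(rank + 1)
    metrics["ndcg@10"] = sum(1 / math.log2(r + 1) for r in ranks if r is not None and r <= 10) / n
    # MAP@100 reduces to reciprocal rank within the top 100
    metrics["map@100"] = sum(1 / r for r in ranks if r is not None and r <= 100) / n
    return metrics


# Toy example: four queries whose relevant passage ranked 1st, 2nd, 4th and not at all
for name, value in retrieval_metrics([1, 2, 4, None]).items():
    print(f"{name}: {value:.4f}")
```

This reduction also explains why MRR@10 and MAP@100 are close in the table: both are averages of reciprocal ranks, and they differ only for relevant passages ranked between 11 and 100.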

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: type string; min: 8 tokens, mean: 46.05 tokens, max: 512 tokens
    • anchor: type string; min: 2 tokens, mean: 20.55 tokens, max: 51 tokens
  • Samples:
    • positive: The AMPTC for microinverters decreases by 25% each year beginning in 2030 and ending after 2032.
      anchor: What is the trajectory of the AMPTC for microinverters starting in 2030?
    • positive: results. Legal and Other Contingencies The Company is subject to various legal proceedings and claims that arise in the ordinary course of business, the outcomes of which are inherently uncertain. The Company records a liability when it is probable that a loss has been incurred and the amount is reasonably estimable, the determination of which requires significant judgment. Resolution of legal matters in a manner inconsistent with management’s expectations could have a material impact on the Company’s financial condition and operating results. Apple Inc. 2023 Form 10-K
    • positive: In 2023, the company recorded other operating charges of $1,951 million.
      anchor: What was the total amount of other operating charges recorded by the company in 2023?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
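MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor and applies cross-entropy over scaled cosine similarities. MatryoshkaLoss repeats that computation on embeddings truncated to each entry of `matryoshka_dims` and sums the results weighted by `matryoshka_weights`. With `n_dims_per_step: -1`, every dimension is used at every step, as in this sketch. The NumPy code below shows only the forward computation for one batch. It assumes a similarity scale of 20 (the MultipleNegativesRankingLoss default) and uses random vectors in place of model outputs. It omits gradients and is not the library's implementation; the function names are hypothetical.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives: anchor i should score highest against positive i."""
    logits = scale * cosine_matrix(anchors, positives)       # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))               # cross-entropy with labels = arange(batch)


def matryoshka_loss(anchors, positives, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the inner loss evaluated on each truncated prefix of the embeddings."""
    return sum(w * mnrl_loss(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))


rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 768))                          # batch of 4, like per_device_train_batch_size
aligned = anchors + 0.01 * rng.normal(size=(4, 768))         # positives close to their own anchors
shuffled = aligned[[1, 2, 3, 0]]                             # positives paired with the wrong anchors
print(matryoshka_loss(anchors, aligned) < matryoshka_loss(anchors, shuffled))  # True
```

Equal weights, as configured here, make each truncated prefix count as much as the full 768-d vector in the training signal.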
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 4
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
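With `per_device_train_batch_size: 4` and `gradient_accumulation_steps: 4`, each optimizer step sees 16 pairs per device. The 6,300 training pairs therefore form 1,575 batches, or 393.75 optimizer steps, per epoch. The sketch below reproduces that arithmetic together with the linear-warmup, cosine-decay learning-rate schedule implied by `learning_rate`, `warmup_ratio` and `lr_scheduler_type: cosine`. It assumes a single device, and it assumes the total step count is floor(1575 / 4) × 4 = 1,572, which matches the last step in the training log. The schedule follows the usual cosine-with-warmup shape and is an illustration, not a copy of the Transformers code.

```python
import math

train_samples = 6300
per_device_batch = 4
grad_accum = 4
epochs = 4
peak_lr = 2e-5
warmup_ratio = 0.1

effective_batch = per_device_batch * grad_accum                  # 16 pairs per optimizer step (one device assumed)
batches_per_epoch = math.ceil(train_samples / per_device_batch)  # 1575 forward/backward batches
steps_per_epoch = batches_per_epoch // grad_accum                 # 393 full accumulation cycles
total_steps = steps_per_epoch * epochs                            # 1572, the final step in the log
warmup_steps = math.ceil(total_steps * warmup_ratio)              # 158


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, batches_per_epoch, total_steps, warmup_steps)  # 16 1575 1572 158
print(f"{lr_at(0):.1e} {lr_at(warmup_steps):.1e} {lr_at(total_steps):.1e}")
```

The training log shows 394 steps in each of the first three epochs (ending at steps 394, 788 and 1182), so the 1,572-step budget ends the fourth epoch slightly early, at epoch 3.9905.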

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
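The `batch_sampler: no_duplicates` setting matters for MultipleNegativesRankingLoss. If the same text appears twice in one batch, one copy becomes a false in-batch negative for the other. Below is a simplified, greedy sketch of the idea. It assumes each training row is a dict mapping column name to text, and the `rows` data and function name are hypothetical. Sentence Transformers ships its own `NoDuplicatesBatchSampler`, whose details may differ from this sketch.

```python
import random
from typing import Dict, List


def no_duplicate_batches(rows: List[Dict[str, str]], batch_size: int, seed: int = 42) -> List[List[int]]:
    """Group row indices into batches in which no text value occurs twice."""
    remaining = list(range(len(rows)))
    random.Random(seed).shuffle(remaining)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(rows[idx].values())
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)        # safe: none of its texts are in this batch yet
                seen |= texts
            else:
                deferred.append(idx)     # retry in a later batch
        batches.append(batch)
        remaining = deferred             # the first remaining row always fits, so this loop terminates
    return batches


# Hypothetical rows: two pairs share the same anchor question
rows = [
    {"positive": "Revenue rose 10%.", "anchor": "How did revenue change?"},
    {"positive": "Revenue grew by ten percent.", "anchor": "How did revenue change?"},
    {"positive": "Net income was $5M.", "anchor": "What was net income?"},
    {"positive": "Capex fell.", "anchor": "How did capex change?"},
]
for batch in no_duplicate_batches(rows, batch_size=2):
    print(batch)
```

The two rows sharing "How did revenue change?" always land in different batches, so neither is scored as a negative for the other.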

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0254 10 0.3873 - - - - -
0.0508 20 0.1907 - - - - -
0.0762 30 0.3031 - - - - -
0.1016 40 0.3314 - - - - -
0.1270 50 0.3452 - - - - -
0.1524 60 0.1831 - - - - -
0.1778 70 0.1286 - - - - -
0.2032 80 0.1162 - - - - -
0.2286 90 0.1464 - - - - -
0.2540 100 0.0409 - - - - -
0.2794 110 0.0886 - - - - -
0.3048 120 0.0964 - - - - -
0.3302 130 0.175 - - - - -
0.3556 140 0.1102 - - - - -
0.3810 150 0.0705 - - - - -
0.4063 160 0.0892 - - - - -
0.4317 170 0.1246 - - - - -
0.4571 180 0.0924 - - - - -
0.4825 190 0.05 - - - - -
0.5079 200 0.0676 - - - - -
0.5333 210 0.0746 - - - - -
0.5587 220 0.2014 - - - - -
0.5841 230 0.0568 - - - - -
0.6095 240 0.118 - - - - -
0.6349 250 0.0833 - - - - -
0.6603 260 0.1091 - - - - -
0.6857 270 0.1108 - - - - -
0.7111 280 0.1026 - - - - -
0.7365 290 0.1485 - - - - -
0.7619 300 0.0888 - - - - -
0.7873 310 0.0366 - - - - -
0.8127 320 0.0717 - - - - -
0.8381 330 0.0703 - - - - -
0.8635 340 0.0531 - - - - -
0.8889 350 0.0488 - - - - -
0.9143 360 0.0321 - - - - -
0.9397 370 0.1364 - - - - -
0.9651 380 0.2325 - - - - -
0.9905 390 0.0346 - - - - -
1.0 394 - 0.7833 0.7757 0.7692 0.7525 0.7314
1.0152 400 0.0742 - - - - -
1.0406 410 0.0147 - - - - -
1.0660 420 0.0777 - - - - -
1.0914 430 0.0353 - - - - -
1.1168 440 0.0093 - - - - -
1.1422 450 0.1484 - - - - -
1.1676 460 0.0167 - - - - -
1.1930 470 0.0039 - - - - -
1.2184 480 0.007 - - - - -
1.2438 490 0.0043 - - - - -
1.2692 500 0.0156 - - - - -
1.2946 510 0.0519 - - - - -
1.32 520 0.0163 - - - - -
1.3454 530 0.0214 - - - - -
1.3708 540 0.0025 - - - - -
1.3962 550 0.0129 - - - - -
1.4216 560 0.0045 - - - - -
1.4470 570 0.0025 - - - - -
1.4724 580 0.0023 - - - - -
1.4978 590 0.0114 - - - - -
1.5232 600 0.0636 - - - - -
1.5486 610 0.0066 - - - - -
1.5740 620 0.0112 - - - - -
1.5994 630 0.0087 - - - - -
1.6248 640 0.0026 - - - - -
1.6502 650 0.017 - - - - -
1.6756 660 0.0741 - - - - -
1.7010 670 0.0041 - - - - -
1.7263 680 0.0339 - - - - -
1.7517 690 0.003 - - - - -
1.7771 700 0.0052 - - - - -
1.8025 710 0.0464 - - - - -
1.8279 720 0.0015 - - - - -
1.8533 730 0.0169 - - - - -
1.8787 740 0.0178 - - - - -
1.9041 750 0.0033 - - - - -
1.9295 760 0.0165 - - - - -
1.9549 770 0.0091 - - - - -
1.9803 780 0.1162 - - - - -
2.0 788 - 0.7849 0.7820 0.7764 0.7661 0.7469
2.0051 790 0.0077 - - - - -
2.0305 800 0.0024 - - - - -
2.0559 810 0.0025 - - - - -
2.0813 820 0.0032 - - - - -
2.1067 830 0.0022 - - - - -
2.1321 840 0.0428 - - - - -
2.1575 850 0.0027 - - - - -
2.1829 860 0.0015 - - - - -
2.2083 870 0.0028 - - - - -
2.2337 880 0.0006 - - - - -
2.2590 890 0.0005 - - - - -
2.2844 900 0.0025 - - - - -
2.3098 910 0.002 - - - - -
2.3352 920 0.002 - - - - -
2.3606 930 0.0105 - - - - -
2.3860 940 0.0061 - - - - -
2.4114 950 0.0017 - - - - -
2.4368 960 0.0009 - - - - -
2.4622 970 0.0007 - - - - -
2.4876 980 0.001 - - - - -
2.5130 990 0.0008 - - - - -
2.5384 1000 0.044 - - - - -
2.5638 1010 0.0012 - - - - -
2.5892 1020 0.0103 - - - - -
2.6146 1030 0.0003 - - - - -
2.64 1040 0.0005 - - - - -
2.6654 1050 0.0972 - - - - -
2.6908 1060 0.0011 - - - - -
2.7162 1070 0.0093 - - - - -
2.7416 1080 0.0028 - - - - -
2.7670 1090 0.0004 - - - - -
2.7924 1100 0.0231 - - - - -
2.8178 1110 0.0021 - - - - -
2.8432 1120 0.0013 - - - - -
2.8686 1130 0.0012 - - - - -
2.8940 1140 0.002 - - - - -
2.9194 1150 0.001 - - - - -
2.9448 1160 0.007 - - - - -
2.9702 1170 0.018 - - - - -
2.9956 1180 0.001 - - - - -
3.0 1182 - 0.7832 0.7823 0.7754 0.7682 0.744
3.0203 1190 0.0028 - - - - -
3.0457 1200 0.0005 - - - - -
3.0711 1210 0.0007 - - - - -
3.0965 1220 0.0008 - - - - -
3.1219 1230 0.0123 - - - - -
3.1473 1240 0.0014 - - - - -
3.1727 1250 0.0005 - - - - -
3.1981 1260 0.0003 - - - - -
3.2235 1270 0.0006 - - - - -
3.2489 1280 0.0004 - - - - -
3.2743 1290 0.0007 - - - - -
3.2997 1300 0.0011 - - - - -
3.3251 1310 0.0006 - - - - -
3.3505 1320 0.0019 - - - - -
3.3759 1330 0.0006 - - - - -
3.4013 1340 0.0011 - - - - -
3.4267 1350 0.0006 - - - - -
3.4521 1360 0.0006 - - - - -
3.4775 1370 0.0004 - - - - -
3.5029 1380 0.0007 - - - - -
3.5283 1390 0.0383 - - - - -
3.5537 1400 0.0007 - - - - -
3.5790 1410 0.0019 - - - - -
3.6044 1420 0.0038 - - - - -
3.6298 1430 0.0007 - - - - -
3.6552 1440 0.0463 - - - - -
3.6806 1450 0.0373 - - - - -
3.7060 1460 0.0007 - - - - -
3.7314 1470 0.0022 - - - - -
3.7568 1480 0.0005 - - - - -
3.7822 1490 0.0007 - - - - -
3.8076 1500 0.0177 - - - - -
3.8330 1510 0.0006 - - - - -
3.8584 1520 0.0009 - - - - -
3.8838 1530 0.0012 - - - - -
3.9092 1540 0.0009 - - - - -
3.9346 1550 0.0012 - - - - -
3.96 1560 0.0004 - - - - -
3.9854 1570 0.0064 - - - - -
3.9905 1572 - 0.7849 0.7817 0.7751 0.7673 0.7451
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.8
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}