metadata
base_model: sentence-transformers/all-mpnet-base-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:9306
  - loss:CoSENTLoss
widget:
  - source_sentence: >-
      What are the name, population, and life expectancy of the largest Asian
      country by land?
    sentences:
      - >-
        Find the names and phone numbers of customers living in California
        state.
      - What is the age of the doctor named Zach?
      - What are the name and location of the cinema with the largest capacity?
  - source_sentence: What are the titles of the cartoons sorted alphabetically?
    sentences:
      - What are the names of wines, sorted in alphabetical order?
      - >-
        Find the first and last names of people who payed more than the rooms'
        base prices.
      - What is the name of the track that has had the greatest number of races?
  - source_sentence: >-
      What is the name of each continent and how many car makers are there in
      each one?
    sentences:
      - >-
        What are the allergy types and how many allergies correspond to each
        one?
      - >-
        List all people names in the order of their date of birth from old to
        young.
      - Which city has the most customers living in?
  - source_sentence: Give the flight numbers of flights arriving in Aberdeen.
    sentences:
      - >-
        Return the device carriers that do not have Android as their software
        platform.
      - >-
        What are the names of the pilots that have not won any matches in
        Australia?
      - Give the phones for departments in room 268.
  - source_sentence: How many total tours were there for each ranking date?
    sentences:
      - What is the carrier of the most expensive phone?
      - >-
        How many total pounds were purchased in the year 2018 at all London
        branches?
      - >-
        Find the number of students for the cities where have more than one
        student.

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
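
This stack is equivalent to running the MPNet encoder, mean-pooling the token embeddings over the attention mask, and L2-normalizing the result. Below is a minimal sketch of that pipeline written directly against the transformers library; it assumes the tokenizer and encoder weights in this repository load via AutoModel (as they do for standard sentence-transformers repositories) and is meant to illustrate the three modules, not replace the Usage snippet.

import torch
from torch.nn.functional import normalize
from transformers import AutoTokenizer, AutoModel

model_id = "s2593817/sft-question-embedding"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

# (0) Transformer: tokenize and encode, truncating to max_seq_length=384
batch = tokenizer(
    ["How many total tours were there for each ranking date?"],
    padding=True, truncation=True, max_length=384, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)

# (1) Pooling: mean over token embeddings, ignoring padding positions
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

# (2) Normalize: unit length, so dot product equals cosine similarity
sentence_embeddings = normalize(sentence_embeddings, p=2, dim=1)
print(sentence_embeddings.shape)  # torch.Size([1, 768])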

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("s2593817/sft-question-embedding")
# Run inference
sentences = [
    'How many total tours were there for each ranking date?',
    'How many total pounds were purchased in the year 2018 at all London branches?',
    'What is the carrier of the most expensive phone?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
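
Since the embeddings are unit-normalized, cosine similarity doubles as a retrieval score. The following is a small semantic-search sketch using util.semantic_search; the query and corpus questions are illustrative examples, not drawn from the training or evaluation data.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("s2593817/sft-question-embedding")

# Illustrative corpus of natural-language questions
corpus = [
    "What is the carrier of the most expensive phone?",
    "Give the flight numbers of flights arriving in Aberdeen.",
    "What are the titles of the cartoons sorted alphabetically?",
]
query = "Which phone carrier sells the priciest handset?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Return the two most similar corpus questions with their cosine scores
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))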

Training Details

Training Dataset

Unnamed Dataset

  • Size: 9,306 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 7 tokens, mean: 16.25 tokens, max: 36 tokens
    • sentence2: string; min: 7 tokens, mean: 15.23 tokens, max: 35 tokens
    • score: int; -1: ~25.20%, 1: ~74.80%
  • Samples:
    • sentence1: How many singers do we have? | sentence2: How many aircrafts do we have? | score: 1
    • sentence1: What is the total number of singers? | sentence2: What is the total number of students? | score: 1
    • sentence1: Show name, country, age for all singers ordered by age from the oldest to the youngest. | sentence2: List all people names in the order of their date of birth from old to young. | score: 1
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
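
As a companion to the dataset format and loss configuration above, here is a minimal sketch of how a (sentence1, sentence2, score) dataset and CoSENTLoss can be wired together in Sentence Transformers 3.x. The single example row is taken from the samples listed above; this is an illustrative reconstruction, not the original training script.

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss
from sentence_transformers.util import pairwise_cos_sim

# Start from the base checkpoint that this model was finetuned from
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Column names must match the card: sentence1, sentence2, score
train_dataset = Dataset.from_dict({
    "sentence1": ["How many singers do we have?"],
    "sentence2": ["How many aircrafts do we have?"],
    "score": [1],
})

# CoSENTLoss with the parameters listed above (scale 20.0, pairwise cosine similarity)
loss = CoSENTLoss(model, scale=20.0, similarity_fct=pairwise_cos_sim)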
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 160
  • learning_rate: 2e-05
  • num_train_epochs: 100
  • warmup_ratio: 0.2
  • fp16: True
  • dataloader_num_workers: 16
  • batch_sampler: no_duplicates
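
A hedged sketch of how these non-default values map onto SentenceTransformerTrainingArguments (the output directory is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/sft-question-embedding",  # placeholder path
    per_device_train_batch_size=160,
    learning_rate=2e-5,
    num_train_epochs=100,
    warmup_ratio=0.2,
    fp16=True,
    dataloader_num_workers=16,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

Together with the dataset and loss sketched in the Training Dataset section, these arguments would be passed to a SentenceTransformerTrainer to reproduce a comparable run.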

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 160
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 100
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 16
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
1.6949 100 9.4942
2.4407 200 8.3205
3.1864 300 6.3257
3.9322 400 4.7354
4.6780 500 3.6898
5.4237 600 3.3736
6.1695 700 3.0906
7.8644 800 3.1459
8.6102 900 3.4447
9.3559 1000 3.219
10.1017 1100 2.9808
10.8475 1200 2.505
11.5932 1300 2.0372
12.3390 1400 1.8879
13.0847 1500 1.8852
14.7797 1600 2.1867
15.5254 1700 2.0583
16.2712 1800 2.0132
17.0169 1900 1.8906
17.7627 2000 1.4556
18.5085 2100 1.2575
19.2542 2200 1.258
20.9492 2300 0.9423
21.6949 2400 1.398
22.4407 2500 1.2811
23.1864 2600 1.2602
23.9322 2700 1.2178
24.6780 2800 1.0895
25.4237 2900 0.9186
26.1695 3000 0.7916
27.8644 3100 0.7777
28.6102 3200 1.0487
29.3559 3300 0.9255
30.1017 3400 0.9655
30.8475 3500 0.897
31.5932 3600 0.7444
32.3390 3700 0.6445
33.0847 3800 0.5025
34.7797 3900 0.681
35.5254 4000 0.9227
36.2712 4100 0.8631
37.0169 4200 0.8573
37.7627 4300 0.9496
38.5085 4400 0.7243
39.2542 4500 0.7024
40.9492 4600 0.4793
41.6949 4700 0.8076
42.4407 4800 0.825
43.1864 4900 0.7553
43.9322 5000 0.6861
44.6780 5100 0.6589
45.4237 5200 0.5023
46.1695 5300 0.4013
47.8644 5400 0.4524
48.6102 5500 0.5891
49.3559 5600 0.5765
50.1017 5700 0.5708
50.8475 5800 0.479
51.5932 5900 0.4671

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
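
To approximate this environment, the versions above can be pinned at install time (a suggested command, not part of the original card; the CUDA-specific PyTorch build may require the appropriate index URL):

pip install sentence-transformers==3.0.1 transformers==4.42.4 torch==2.3.1 accelerate==0.33.0 datasets==2.20.0 tokenizers==0.19.1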

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}