base_model: ielabgroup/bert-base-uncased-fineweb100bt-matryoshka-mae
datasets:
  - sentence-transformers/all-nli
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:557850
  - loss:StarbucksLoss
widget:
  - source_sentence: A dog is in the water.
    sentences:
      - The woman is wearing green.
      - The dog is rolling around in the grass.
      - >-
        A brown dog swims through water outdoors with a tennis ball in its
        mouth.
  - source_sentence: A dog is swimming.
    sentences:
      - a black dog swimming in the water with a tennis ball in his mouth
      - A dog with yellow fur swims, neck deep, in water.
      - A brown dog running through a large orange tube.
  - source_sentence: A dog is swimming.
    sentences:
      - A dog with golden hair swims through water.
      - A golden haired dog is lying in a boat that is traveling on a lake.
      - A dog with golden hair swims through water.
  - source_sentence: A dog is swimming.
    sentences:
      - A tan dog splashes as he swims through the water.
      - A man and young boy asleep in a chair.
      - A dog in a harness chasing a red ball.
  - source_sentence: A dog is in the water.
    sentences:
      - A big brown dog jumps into a swimming pool on the backyard.
      - Wet brown dog swims towards camera.
      - The dog is rolling around in the grass.
model-index:
  - name: >-
      SentenceTransformer based on
      ielabgroup/bert-base-uncased-fineweb100bt-matryoshka-mae
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8170317205826663
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.827406310000667
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8085162876731988
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8050045835065848
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8122787407180172
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.809299222491485
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7657571947414553
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7564706925314776
            name: Spearman Dot
          - type: pearson_max
            value: 0.8170317205826663
            name: Pearson Max
          - type: spearman_max
            value: 0.827406310000667
            name: Spearman Max

SentenceTransformer based on ielabgroup/bert-base-uncased-fineweb100bt-matryoshka-mae

This is a sentence-transformers model finetuned from ielabgroup/bert-base-uncased-fineweb100bt-matryoshka-mae on the all-nli dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

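Model Description

  • Model Type: Sentence Transformer
  • Base model: ielabgroup/bert-base-uncased-fineweb100bt-matryoshka-mae
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768
  • Training Dataset: all-nli
  • Language: en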

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
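In the Pooling module above, only `pooling_mode_mean_tokens` is enabled. Each sentence embedding is therefore the average of its token embeddings, with padding positions excluded through the attention mask. The sketch below shows that masked mean in NumPy. It assumes token embeddings of shape (batch, seq_len, dim) and a 0/1 attention mask. `mean_pool` is a hypothetical helper for illustration, not the library's own code.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim) float array
    attention_mask:   (batch, seq_len) array of 1 (real token) / 0 (padding)
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                         # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                         # (batch, 1), avoids divide-by-zero
    return summed / counts

# Toy example: 2 sentences, 3 token positions, 4-dim vectors (the real model uses 768).
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])  # the second sentence has one padding position
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 4)
```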

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ielabgroup/Starbucks_STS")
# Run inference
sentences = [
    'A dog is in the water.',
    'Wet brown dog swims towards camera.',
    'The dog is rolling around in the grass.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
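`model.similarity` applies the model's configured similarity function. This card does not name that function, so the sketch below assumes cosine similarity, which is the Sentence Transformers default. The training loss lists Matryoshka dimensions from 32 to 768, so the sketch also shows the usual way to shorten such embeddings: keep the first `k` dimensions and then L2-normalize. After normalization, a plain dot product equals cosine similarity. The random array stands in for real `model.encode` output. `truncate` and `cosine_matrix` are hypothetical helpers.

```python
import numpy as np

def truncate(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Keep only the first k dimensions (a Matryoshka-style prefix)."""
    return embeddings[:, :k]

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: L2-normalize rows, then take dot products."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Stand-in for model.encode(sentences): 3 sentences x 768 dims.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((3, 768)).astype(np.float32)

full = cosine_matrix(embeddings, embeddings)             # uses all 768 dims
short = truncate(embeddings, 128)                        # keep 128 of the 768 dims
small = cosine_matrix(short, short)                      # same 3 x 3 layout, cheaper to store
print(full.shape, small.shape)
```

Sentence Transformers 3.1.1, listed under Framework Versions, also accepts a `truncate_dim` argument when loading, e.g. `SentenceTransformer("ielabgroup/Starbucks_STS", truncate_dim=128)`. Truncation shortens only the final embedding and still runs every layer. It may therefore differ from the smaller (layer, dimension) configurations that the loss trained.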

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.817
spearman_cosine 0.8274
pearson_manhattan 0.8085
spearman_manhattan 0.805
pearson_euclidean 0.8123
spearman_euclidean 0.8093
pearson_dot 0.7658
spearman_dot 0.7565
pearson_max 0.817
spearman_max 0.8274
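Each row correlates the model's similarity scores for STS test sentence pairs with the human gold scores. Pearson uses the raw values and Spearman uses their ranks. The `_max` rows take the best of the four similarity functions, which is cosine for both metrics here. The sketch below shows the computation in NumPy on made-up scores, not the actual STS data. `spearman` assigns average ranks to ties. Treating Euclidean scores as negated distances (higher means more similar) is an assumption about how such evaluators orient them. All helper names are hypothetical.

```python
import numpy as np

def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])

def average_ranks(values) -> np.ndarray:
    """Rank values 1..n; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical gold scores (0-5 scale) and model scores for five sentence pairs.
gold = [4.8, 3.2, 0.5, 2.0, 4.0]
scores = {
    "cosine":    [0.91, 0.62, 0.10, 0.45, 0.80],
    "euclidean": [-3.1, -6.3, -9.5, -6.2, -4.0],  # negated distances
}
results = {name: spearman(s, gold) for name, s in scores.items()}
results["max"] = max(results.values())  # best similarity function, like spearman_max
print(results)
```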

Training Details

Training Dataset

all-nli

  • Dataset: all-nli at d482672
  • Size: 557,850 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples (all three columns are strings):
    • anchor: min 7 tokens, mean 10.46 tokens, max 46 tokens
    • positive: min 6 tokens, mean 12.81 tokens, max 40 tokens
    • negative: min 5 tokens, mean 13.4 tokens, max 50 tokens
  • Samples:
    • anchor: A person on a horse jumps over a broken down airplane.
      positive: A person is outdoors, on a horse.
      negative: A person is at a diner, ordering an omelette.
    • anchor: Children smiling and waving at camera
      positive: There are children present
      negative: The kids are frowning
    • anchor: A boy is jumping on skateboard in the middle of a red bridge.
      positive: The boy does a skateboarding trick.
      negative: The boy skates down the sidewalk.
  • Loss: starbucks_loss.StarbucksLoss with these parameters:
    {
        "loss": "MatryoshkaLoss",
        "n_selections_per_step": -1,
        "last_layer_weight": 1.0,
        "prior_layers_weight": 1.0,
        "kl_div_weight": 1.0,
        "kl_temperature": 0.3,
        "matryoshka_layers": [
            1,
            3,
            5,
            7,
            9,
            11
        ],
        "matryoshka_dims": [
            32,
            64,
            128,
            256,
            512,
            768
        ]
    }
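The `kl_div_weight` and `kl_temperature` parameters point to a distillation term. Such a term would pull each smaller sub-model's similarity distribution toward the full model's. The `matryoshka_layers` and `matryoshka_dims` lists have the same length, which suggests that each layer index is paired with the dimension at the same position (1 with 32, and so on up to 11 with 768). This card does not give the loss formula. The sketch below is therefore only an assumption-laden illustration, loosely modeled on layer-adaptive KL terms. It applies a temperature-scaled softmax to in-batch similarity scores and computes a batch-mean KL(full ‖ sub). `softmax`, `kl_term`, and the toy score matrices are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_term(full_scores: np.ndarray, sub_scores: np.ndarray, temperature: float = 0.3) -> float:
    """Batch-mean KL(P_full || P_sub) over rows of in-batch similarity scores.

    Rows are anchors; columns are the candidate texts in the batch.
    """
    p_full = softmax(full_scores / temperature)
    p_sub = softmax(sub_scores / temperature)
    kl_per_row = (p_full * (np.log(p_full) - np.log(p_sub))).sum(axis=1)
    return float(kl_per_row.mean())

# Toy scores for 2 anchors x 3 candidates; anchor i's positive is column i.
full = np.array([[0.9, 0.2, 0.1],
                 [0.3, 0.8, 0.2]])
sub = np.array([[0.7, 0.4, 0.2],   # hypothetical smaller (layer, dim) sub-model
                [0.5, 0.6, 0.3]])

kl_div_weight = 1.0  # value listed in the loss parameters above
print(kl_div_weight * kl_term(full, sub, temperature=0.3))
```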
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • gradient_checkpointing: True
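The training has 557,850 triplets, a batch size of 128, and `dataloader_drop_last: False`. One epoch is therefore ceil(557,850 / 128) = 4,359 steps, which matches the final step in the training logs and is consistent with training on a single device. With `warmup_ratio: 0.1` and the linear scheduler, the learning rate rises to 5e-5 over the first ceil(0.1 × 4,359) = 436 steps. It then decays linearly to zero. The sketch below reproduces that arithmetic. `linear_lr` is a hypothetical helper written to follow the shape of a linear warmup-then-decay schedule; it is not imported from any library.

```python
import math

NUM_SAMPLES = 557_850
BATCH_SIZE = 128      # per_device_train_batch_size, assuming a single device
PEAK_LR = 5e-5        # learning_rate
WARMUP_RATIO = 0.1    # warmup_ratio

steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)    # drop_last=False keeps the partial batch
warmup_steps = math.ceil(steps_per_epoch * WARMUP_RATIO)

def linear_lr(step: int, total: int = steps_per_epoch,
              warmup: int = warmup_steps, peak: float = PEAK_LR) -> float:
    """Linear warmup from 0 to peak, then linear decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))

print(steps_per_epoch, warmup_steps)       # 4359 436
print(round(100 / steps_per_epoch, 4))     # 0.0229, the Epoch value logged at step 100
print(linear_lr(0), linear_lr(warmup_steps), linear_lr(steps_per_epoch))
```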

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss sts-test_spearman_cosine
0.0229 100 16.7727 -
0.0459 200 9.653 -
0.0688 300 8.3187 -
0.0918 400 7.748 -
0.1147 500 7.2587 -
0.1376 600 6.734 -
0.1606 700 6.4463 -
0.1835 800 6.299 -
0.2065 900 5.9946 -
0.2294 1000 5.9348 -
0.2524 1100 5.7723 -
0.2753 1200 5.5822 -
0.2982 1300 5.4233 -
0.3212 1400 5.3427 -
0.3441 1500 5.3132 -
0.3671 1600 5.3149 -
0.3900 1700 5.3007 -
0.4129 1800 4.9539 -
0.4359 1900 4.9308 -
0.4588 2000 4.8171 -
0.4818 2100 5.0181 -
0.5047 2200 4.9631 -
0.5276 2300 4.8125 -
0.5506 2400 4.7133 -
0.5735 2500 4.5809 -
0.5965 2600 4.6093 -
0.6194 2700 4.6723 -
0.6423 2800 4.5526 -
0.6653 2900 4.4967 -
0.6882 3000 4.4178 -
0.7112 3100 4.4333 -
0.7341 3200 4.3289 -
0.7571 3300 4.5199 -
0.7800 3400 4.3389 -
0.8029 3500 4.3394 -
0.8259 3600 4.2423 -
0.8488 3700 4.3219 -
0.8718 3800 4.3297 -
0.8947 3900 4.3132 -
0.9176 4000 4.2616 -
0.9406 4100 4.2233 -
0.9635 4200 4.1912 -
0.9865 4300 4.1838 -
1.0 4359 - 0.8274

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}