BERT base uncased trained on GooAQ triplets

This is a sentence-transformers model finetuned from google-bert/bert-base-uncased on the sentence-transformers/gooaq dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: PeftModelForFeatureExtraction 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
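
The Pooling module above has pooling_mode_mean_tokens set to True, so the sentence embedding is the average of the token embeddings. A minimal sketch of that averaging, assuming padding positions are excluded through the attention mask; the function name mean_pool and the toy 3-dimensional vectors (instead of 768) are illustrative, not part of the library:

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]


# Toy input: 3 real tokens plus 1 padding token, 3 dimensions instead of 768.
tokens = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],  # padding position, masked out below
]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

The padding vector does not influence the result because its mask entry is 0.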

Usage

Direct Usage (Sentence Transformers)

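First install the Sentence Transformers library. The architecture above wraps a PeftModelForFeatureExtraction, so the peft package is probably required as well:

pip install -U sentence-transformers peft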

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/bert-base-uncased-gooaq-peft")
# Run inference
sentences = [
    'what health services are covered by medicare?',
    'Medicare Part A hospital insurance covers inpatient hospital care, skilled nursing facility, hospice, lab tests, surgery, home health care.',
    "Elephants have the longest gestation period of all mammals. These gentle giants' pregnancies last for more than a year and a half. The average gestation period of an elephant is about 640 to 660 days, or roughly 95 weeks.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
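
For semantic search, the same similarity scores can be used to rank candidate answers against a question. The sketch below shows only the ranking step. It uses hand-made 3-dimensional vectors as stand-ins for the output of model.encode so that it runs without downloading the model; the helpers cosine and rank are hypothetical names, not library functions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Stand-in embeddings (assumption: real ones would come from model.encode).
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # unrelated to the query
    [0.9, 0.1, 0.0],  # closest to the query
    [0.5, 0.5, 0.0],  # partially related
]
order = rank(query, docs)
print(order)
```

With real embeddings, the first index in order would point at the answer the model considers most relevant to the question.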

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.576
cosine_accuracy@3 0.7295
cosine_accuracy@5 0.7824
cosine_accuracy@10 0.8462
cosine_precision@1 0.576
cosine_precision@3 0.2432
cosine_precision@5 0.1565
cosine_precision@10 0.0846
cosine_recall@1 0.576
cosine_recall@3 0.7295
cosine_recall@5 0.7824
cosine_recall@10 0.8462
cosine_ndcg@10 0.7089
cosine_mrr@10 0.6653
cosine_map@100 0.6709
dot_accuracy@1 0.5263
dot_accuracy@3 0.6922
dot_accuracy@5 0.7494
dot_accuracy@10 0.8175
dot_precision@1 0.5263
dot_precision@3 0.2307
dot_precision@5 0.1499
dot_precision@10 0.0818
dot_recall@1 0.5263
dot_recall@3 0.6922
dot_recall@5 0.7494
dot_recall@10 0.8175
dot_ndcg@10 0.6697
dot_mrr@10 0.6226
dot_map@100 0.6291
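
In the table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.7295 / 3 ≈ 0.2432). That pattern is what one would expect when every question has exactly one relevant answer; this is inferred from the numbers, not stated in the card. Under that assumption, the metrics can be sketched from the rank of the correct answer per query. The function ir_metrics is a hypothetical helper, not the evaluator's implementation.

```python
import math


def ir_metrics(ranks, k):
    """ranks[i] is the 1-based rank of the single relevant answer for query i,
    or None if that answer was not retrieved at all."""
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        # At most one of the k returned results can be relevant.
        "precision": accuracy / k,
        # One relevant answer per query, so recall@k equals accuracy@k.
        "recall": accuracy,
        # Reciprocal rank, counted only when the answer is within the top k.
        "mrr": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        # With a single relevant answer the ideal DCG is 1, so NDCG equals DCG.
        "ndcg": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }


# Four toy queries: correct answer at rank 1, 2, 4, and not retrieved.
metrics = ir_metrics([1, 2, 4, None], k=3)
print(metrics)
```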

Training Details

Training Dataset

sentence-transformers/gooaq

  • Dataset: sentence-transformers/gooaq at b089f72
  • Size: 3,002,496 training samples
  • Columns: question and answer
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 8 tokens, mean 11.84 tokens, max 31 tokens
    • answer (string): min 13 tokens, mean 60.69 tokens, max 149 tokens
  • Samples:
    • question: can dogs get pregnant when on their period?
      answer: 2. Female dogs can only get pregnant when they're in heat. Some females will show physical signs of readiness – their discharge will lighten in color, and they will “flag,” or lift their tail up and to the side.
    • question: are there different forms of als?
      answer: ['Sporadic ALS is the most common form. It affects up to 95% of people with the disease. Sporadic means it happens sometimes without a clear cause.', 'Familial ALS (FALS) runs in families. About 5% to 10% of people with ALS have this type. FALS is caused by changes to a gene.']
    • question: what is the difference between stayman and jacoby transfer?
      answer: 1. The Stayman Convention is used only with a 4-Card Major suit looking for a 4-Card Major suit fit. Jacoby Transfer bids are used with a 5-Card suit looking for a 3-Card fit.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
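
A minimal sketch of what a loss of this kind computes, assuming in-batch negatives: for each question, its own answer is the positive and every other answer in the batch is a negative, with cosine similarities multiplied by the scale (20.0) before a softmax cross-entropy. The function names cos_sim and mnr_loss and the 2-dimensional toy vectors are illustrative only; this is not the library's implementation.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(questions, answers, scale=20.0):
    """Mean cross-entropy where answers[i] is the positive for questions[i]
    and all other answers in the batch serve as negatives."""
    losses = []
    for i, q in enumerate(questions):
        logits = [scale * cos_sim(q, a) for a in answers]
        # Numerically stable log-sum-exp over the scaled similarities.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


questions = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(questions, [[1.0, 0.0], [0.0, 1.0]])  # each answer matches its question
swapped = mnr_loss(questions, [[0.0, 1.0], [1.0, 0.0]])  # answers paired with the wrong question
print(aligned, swapped)
```

The loss is near zero when each question is most similar to its own answer and grows when another answer in the batch scores higher. This is also why the no_duplicates batch sampler matters: a duplicate answer in the batch would be treated as a negative for its own question.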
    

Evaluation Dataset

sentence-transformers/gooaq

  • Dataset: sentence-transformers/gooaq at b089f72
  • Size: 10,000 evaluation samples
  • Columns: question and answer
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 8 tokens, mean 12.01 tokens, max 28 tokens
    • answer (string): min 19 tokens, mean 61.37 tokens, max 138 tokens
  • Samples:
    • question: is there a season 5 animal kingdom?
      answer: the good news for the fans is that the season five was confirmed by TNT in July, 2019. The season five of Animal Kingdom was expected to release in May, 2020.
    • question: what are cmos voltage levels?
      answer: CMOS gate circuits have input and output signal specifications that are quite different from TTL. For a CMOS gate operating at a power supply voltage of 5 volts, the acceptable input signal voltages range from 0 volts to 1.5 volts for a “low” logic state, and 3.5 volts to 5 volts for a “high” logic state.
    • question: dangers of drinking coke when pregnant?
      answer: Drinking it during pregnancy was linked to poorer fine motor, visual, spatial and visual motor abilities in early childhood (around age 3). By mid-childhood (age 7), kids whose moms drank diet sodas while pregnant had poorer verbal abilities, the study findings reported.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
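
As a worked example of how these settings interact, the sketch below derives the number of optimizer steps and warmup steps from the dataset size, batch size, and warmup_ratio, then evaluates a linear warmup and decay schedule (lr_scheduler_type is linear in the full list). The rounding with math.ceil and the helper lr_at are assumptions made for illustration; the trainer's exact rounding may differ.

```python
import math

num_samples = 3_002_496   # training samples reported in the card
batch_size = 128          # per_device_train_batch_size, single GPU
warmup_ratio = 0.1
peak_lr = 2e-05

# One epoch, no gradient accumulation: one optimizer step per batch.
total_steps = math.ceil(num_samples / batch_size)
# Assumption: warmup steps are the ratio of total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

The computed total of 23457 steps matches the final step reported in the training logs.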

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss gooaq-dev_cosine_map@100
0 0 - - 0.2017
0.0000 1 2.584 - -
0.0213 500 2.4164 - -
0.0426 1000 1.1421 - -
0.0639 1500 0.5215 - -
0.0853 2000 0.3645 0.2763 0.6087
0.1066 2500 0.3046 - -
0.1279 3000 0.2782 - -
0.1492 3500 0.2601 - -
0.1705 4000 0.2457 0.2013 0.6396
0.1918 4500 0.2363 - -
0.2132 5000 0.2291 - -
0.2345 5500 0.2217 - -
0.2558 6000 0.2137 0.1770 0.6521
0.2771 6500 0.215 - -
0.2984 7000 0.2057 - -
0.3197 7500 0.198 - -
0.3410 8000 0.196 0.1626 0.6594
0.3624 8500 0.1938 - -
0.3837 9000 0.195 - -
0.4050 9500 0.1895 - -
0.4263 10000 0.186 0.1542 0.6628
0.4476 10500 0.1886 - -
0.4689 11000 0.1835 - -
0.4903 11500 0.1825 - -
0.5116 12000 0.1804 0.1484 0.6638
0.5329 12500 0.176 - -
0.5542 13000 0.1825 - -
0.5755 13500 0.1785 - -
0.5968 14000 0.1766 0.1436 0.6672
0.6182 14500 0.1718 - -
0.6395 15000 0.1717 - -
0.6608 15500 0.1674 - -
0.6821 16000 0.1691 0.1406 0.6704
0.7034 16500 0.1705 - -
0.7247 17000 0.1693 - -
0.7460 17500 0.166 - -
0.7674 18000 0.1676 0.1385 0.6721
0.7887 18500 0.1666 - -
0.8100 19000 0.1658 - -
0.8313 19500 0.1682 - -
0.8526 20000 0.1639 0.1370 0.6705
0.8739 20500 0.1711 - -
0.8953 21000 0.1667 - -
0.9166 21500 0.165 - -
0.9379 22000 0.1658 0.1356 0.6711
0.9592 22500 0.1665 - -
0.9805 23000 0.1636 - -
1.0 23457 - - 0.6709

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 1.051 kWh
  • Carbon Emitted: 0.409 kg of CO2
  • Hours Used: 2.832 hours

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.1.0.dev0
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}