SentenceTransformer based on dbourget/pb-small-10e-tsdae6e-philsim-cosine-3e-pt1

This is a sentence-transformers model finetuned from dbourget/pb-small-10e-tsdae6e-philsim-cosine-3e-pt1. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
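
Only pooling_mode_cls_token is enabled in the Pooling module above, so the sentence embedding is the output vector of the first ([CLS]) token rather than an average over tokens. The sketch below illustrates that choice in plain Python and contrasts it with mean pooling. The function names and the 4-dimensional toy vectors are assumptions made for illustration (the real model emits 1024-dimensional vectors), and this is not the library's own implementation.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Return the first token's vector ([CLS] for BERT-style models)."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the vectors of non-padding tokens (shown only for contrast)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy transformer output for one sentence: [CLS], two word pieces, one padding slot.
tokens = [
    [1.0, 0.0, 2.0, 0.0],  # [CLS]
    [3.0, 2.0, 0.0, 0.0],
    [5.0, 4.0, 4.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding, masked out below
]
mask = [1, 1, 1, 0]

cls_vec = cls_pooling(tokens)          # what this model's Pooling config selects
mean_vec = mean_pooling(tokens, mask)  # what mean pooling would give instead
print(cls_vec, mean_vec)
```

With CLS pooling the other token vectors affect the embedding only through the transformer's attention layers, not through the pooling step itself.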

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dbourget/pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-cosine-50e")
# Run inference
sentences = [
    'scientific revolutions',
    'paradigm shifts',
    'scientific realism',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
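
The similarity call above returns one score per pair of inputs. Assuming the model scores pairs with cosine similarity (the library default unless the saved model configuration says otherwise), the following sketch computes the same kind of matrix by hand. The 3-dimensional toy vectors and the helper names are illustrative assumptions, not part of the library.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares item i with item j."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Toy stand-ins for the three sentence embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
scores = similarity_matrix(toy_embeddings)
for row in scores:
    print([round(value, 4) for value in row])
```

The matrix is symmetric with ones on the diagonal, so for semantic search only the off-diagonal entries of a query row need to be ranked.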

Evaluation

Metrics

Triplet

Metric               Value
cosine_accuracy      0.7929
dot_accuracy         0.2542
manhattan_accuracy   0.8022
euclidean_accuracy   0.8013
max_accuracy         0.8022
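
Triplet accuracy is the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative under a given measure, and max_accuracy is the best of the four. The sketch below shows one way to compute these numbers. It is an illustration under stated assumptions, not the evaluator's actual code: the toy 2-dimensional triplets and all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (anchor, positive, negative)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def negative_dot(a: Vector, b: Vector) -> float:
    # A larger dot product means "closer", so negate it to get a distance-like score.
    return -dot(a, b)


def manhattan_distance(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_accuracy(triplets: List[Triplet], distance: Callable[[Vector, Vector], float]) -> float:
    """Share of triplets where the positive is strictly closer than the negative."""
    correct = sum(1 for a, p, n in triplets if distance(a, p) < distance(a, n))
    return correct / len(triplets)


# Hypothetical triplets; the second negative is a long vector pointing elsewhere.
toy_triplets: List[Triplet] = [
    ([1.0, 0.0], [2.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [1.0, 0.2], [5.0, 5.0]),
]

measures: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine_accuracy": cosine_distance,
    "dot_accuracy": negative_dot,
    "manhattan_accuracy": manhattan_distance,
    "euclidean_accuracy": euclidean_distance,
}
accuracies = {name: triplet_accuracy(toy_triplets, fn) for name, fn in measures.items()}
accuracies["max_accuracy"] = max(accuracies.values())
print(accuracies)
```

In the toy data the long negative vector wins on dot product despite pointing in a different direction. Unnormalized embedding lengths may similarly explain why dot_accuracy in the table is far below the other three measures, though the card does not say so.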

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • learning_rate: 5e-07
  • weight_decay: 0.01
  • num_train_epochs: 50
  • lr_scheduler_type: constant
  • bf16: True
  • dataloader_drop_last: True
  • resume_from_checkpoint: True
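
For anyone reproducing the run, the non-default values above can be collected in a dictionary and passed to the training-arguments class. This is a minimal sketch: the helper build_training_args and its output_dir parameter are hypothetical, it assumes sentence-transformers 3.x is installed, and bf16 additionally assumes hardware with bfloat16 support.

```python
# Non-default values copied from the list above.
non_default_args = {
    "eval_strategy": "steps",
    "per_device_train_batch_size": 138,
    "per_device_eval_batch_size": 138,
    "learning_rate": 5e-07,
    "weight_decay": 0.01,
    "num_train_epochs": 50,
    "lr_scheduler_type": "constant",
    "bf16": True,  # assumes bfloat16-capable hardware
    "dataloader_drop_last": True,
    "resume_from_checkpoint": True,
}


def build_training_args(output_dir: str):
    """Hypothetical helper; imports lazily so the dictionary is usable on its own."""
    from sentence_transformers import SentenceTransformerTrainingArguments

    return SentenceTransformerTrainingArguments(output_dir=output_dir, **non_default_args)
```

Every argument not named here keeps its default value, as recorded in the full hyperparameter list.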

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-07
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 50
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: True
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss beatai-dev_cosine_accuracy
0 0 - - 0.4764
0.1471 10 0.2061 - -
0.2941 20 0.2048 - -
0.4412 30 0.204 - -
0.5882 40 0.202 - -
0.7353 50 0.2019 0.2010 0.5219
0.8824 60 0.2017 - -
1.0294 70 0.1954 - -
1.1765 80 0.1959 - -
1.3235 90 0.1941 - -
1.4706 100 0.1937 0.1929 0.5598
1.6176 110 0.1923 - -
1.7647 120 0.1893 - -
1.9118 130 0.1861 - -
2.0588 140 0.1842 - -
2.2059 150 0.1818 0.1814 0.5985
2.3529 160 0.1834 - -
2.5 170 0.1729 - -
2.6471 180 0.1726 - -
2.7941 190 0.1668 - -
2.9412 200 0.1622 0.1653 0.6330
3.0882 210 0.1604 - -
3.2353 220 0.1572 - -
3.3824 230 0.159 - -
3.5294 240 0.1567 - -
3.6765 250 0.1481 0.1562 0.6532
3.8235 260 0.148 - -
3.9706 270 0.1492 - -
4.1176 280 0.1528 - -
4.2647 290 0.1437 - -
4.4118 300 0.1481 0.1490 0.6658
4.5588 310 0.1386 - -
4.7059 320 0.1413 - -
4.8529 330 0.1407 - -
5.0 340 0.1387 - -
5.1471 350 0.1423 0.1438 0.6717
5.2941 360 0.1376 - -
5.4412 370 0.1314 - -
5.5882 380 0.1416 - -
5.7353 390 0.1284 - -
5.8824 400 0.1375 0.1394 0.6801
6.0294 410 0.1308 - -
6.1765 420 0.1286 - -
6.3235 430 0.1326 - -
6.4706 440 0.1356 - -
6.6176 450 0.1298 0.1361 0.6877
6.7647 460 0.1242 - -
6.9118 470 0.1299 - -
7.0588 480 0.1279 - -
7.2059 490 0.1234 - -
7.3529 500 0.1298 0.1333 0.7045
7.5 510 0.1252 - -
7.6471 520 0.1248 - -
7.7941 530 0.1241 - -
7.9412 540 0.126 - -
8.0882 550 0.1252 0.1316 0.7071
8.2353 560 0.1237 - -
8.3824 570 0.1205 - -
8.5294 580 0.1195 - -
8.6765 590 0.1187 - -
8.8235 600 0.1187 0.1293 0.7138
8.9706 610 0.1269 - -
9.1176 620 0.1261 - -
9.2647 630 0.1182 - -
9.4118 640 0.1219 - -
9.5588 650 0.1173 0.1276 0.7172
9.7059 660 0.1182 - -
9.8529 670 0.122 - -
10.0 680 0.1179 - -
10.1471 690 0.1137 - -
10.2941 700 0.1248 0.1261 0.7247
10.4412 710 0.1162 - -
10.5882 720 0.1166 - -
10.7353 730 0.1111 - -
10.8824 740 0.115 - -
11.0294 750 0.1175 0.1247 0.7298
11.1765 760 0.1136 - -
11.3235 770 0.1172 - -
11.4706 780 0.1158 - -
11.6176 790 0.1142 - -
11.7647 800 0.1097 0.1236 0.7332
11.9118 810 0.1161 - -
12.0588 820 0.1153 - -
12.2059 830 0.1114 - -
12.3529 840 0.1133 - -
12.5 850 0.1104 0.1226 0.7332
12.6471 860 0.1093 - -
12.7941 870 0.1157 - -
12.9412 880 0.1127 - -
13.0882 890 0.1115 - -
13.2353 900 0.1109 0.1214 0.7323
13.3824 910 0.1125 - -
13.5294 920 0.1097 - -
13.6765 930 0.1124 - -
13.8235 940 0.114 - -
13.9706 950 0.11 0.1204 0.7382
14.1176 960 0.1049 - -
14.2647 970 0.1128 - -
14.4118 980 0.1109 - -
14.5588 990 0.1087 - -
14.7059 1000 0.1079 0.1196 0.7382
14.8529 1010 0.1077 - -
15.0 1020 0.1061 - -
15.1471 1030 0.1101 - -
15.2941 1040 0.1087 - -
15.4412 1050 0.106 0.1186 0.7399
15.5882 1060 0.1047 - -
15.7353 1070 0.1048 - -
15.8824 1080 0.103 - -
16.0294 1090 0.1064 - -
16.1765 1100 0.1029 0.1179 0.7433
16.3235 1110 0.1033 - -
16.4706 1120 0.1066 - -
16.6176 1130 0.1095 - -
16.7647 1140 0.1031 - -
16.9118 1150 0.1 0.1172 0.7466
17.0588 1160 0.1056 - -
17.2059 1170 0.1033 - -
17.3529 1180 0.102 - -
17.5 1190 0.1083 - -
17.6471 1200 0.0971 0.1164 0.7458
17.7941 1210 0.1016 - -
17.9412 1220 0.1033 - -
18.0882 1230 0.0987 - -
18.2353 1240 0.1062 - -
18.3824 1250 0.0925 0.1157 0.7475
18.5294 1260 0.1028 - -
18.6765 1270 0.1012 - -
18.8235 1280 0.1027 - -
18.9706 1290 0.1026 - -
19.1176 1300 0.1023 0.1148 0.7508
19.2647 1310 0.1053 - -
19.4118 1320 0.0981 - -
19.5588 1330 0.0975 - -
19.7059 1340 0.1006 - -
19.8529 1350 0.0991 0.1141 0.7508
20.0 1360 0.0994 - -
20.1471 1370 0.0998 - -
20.2941 1380 0.1014 - -
20.4412 1390 0.0986 - -
20.5882 1400 0.098 0.1133 0.7525
20.7353 1410 0.101 - -
20.8824 1420 0.098 - -
21.0294 1430 0.1041 - -
21.1765 1440 0.0979 - -
21.3235 1450 0.1006 0.1126 0.7559
21.4706 1460 0.097 - -
21.6176 1470 0.0985 - -
21.7647 1480 0.0956 - -
21.9118 1490 0.0993 - -
22.0588 1500 0.0943 0.1120 0.7551
22.2059 1510 0.0977 - -
22.3529 1520 0.0998 - -
22.5 1530 0.0977 - -
22.6471 1540 0.099 - -
22.7941 1550 0.0925 0.1113 0.7576
22.9412 1560 0.0929 - -
23.0882 1570 0.0965 - -
23.2353 1580 0.0896 - -
23.3824 1590 0.0993 - -
23.5294 1600 0.0941 0.1109 0.7576
23.6765 1610 0.0927 - -
23.8235 1620 0.0994 - -
23.9706 1630 0.0956 - -
24.1176 1640 0.0947 - -
24.2647 1650 0.0927 0.1103 0.7576
24.4118 1660 0.0935 - -
24.5588 1670 0.0996 - -
24.7059 1680 0.0903 - -
24.8529 1690 0.0916 - -
25.0 1700 0.0951 0.1096 0.7584
25.1471 1710 0.0924 - -
25.2941 1720 0.0952 - -
25.4412 1730 0.0954 - -
25.5882 1740 0.0968 - -
25.7353 1750 0.0942 0.1090 0.7593
25.8824 1760 0.0913 - -
26.0294 1770 0.0931 - -
26.1765 1780 0.0872 - -
26.3235 1790 0.0915 - -
26.4706 1800 0.0937 0.1085 0.7601
26.6176 1810 0.0971 - -
26.7647 1820 0.0944 - -
26.9118 1830 0.0908 - -
27.0588 1840 0.089 - -
27.2059 1850 0.0944 0.1082 0.7626
27.3529 1860 0.0926 - -
27.5 1870 0.087 - -
27.6471 1880 0.0904 - -
27.7941 1890 0.0886 - -
27.9412 1900 0.0942 0.1077 0.7635
28.0882 1910 0.0947 - -
28.2353 1920 0.0857 - -
28.3824 1930 0.0908 - -
28.5294 1940 0.0943 - -
28.6765 1950 0.0902 0.1071 0.7668
28.8235 1960 0.0909 - -
28.9706 1970 0.0897 - -
29.1176 1980 0.0924 - -
29.2647 1990 0.0909 - -
29.4118 2000 0.0895 0.1066 0.7652
29.5588 2010 0.0832 - -
29.7059 2020 0.0883 - -
29.8529 2030 0.0935 - -
30.0 2040 0.09 - -
30.1471 2050 0.0891 0.1060 0.7677
30.2941 2060 0.0978 - -
30.4412 2070 0.0894 - -
30.5882 2080 0.0893 - -
30.7353 2090 0.0815 - -
30.8824 2100 0.0889 0.1058 0.7660
31.0294 2110 0.0801 - -
31.1765 2120 0.0922 - -
31.3235 2130 0.0868 - -
31.4706 2140 0.0858 - -
31.6176 2150 0.0862 0.1055 0.7685
31.7647 2160 0.0861 - -
31.9118 2170 0.0896 - -
32.0588 2180 0.0877 - -
32.2059 2190 0.0864 - -
32.3529 2200 0.0921 0.1050 0.7694
32.5 2210 0.082 - -
32.6471 2220 0.0902 - -
32.7941 2230 0.0825 - -
32.9412 2240 0.0829 - -
33.0882 2250 0.0859 0.1046 0.7694
33.2353 2260 0.0847 - -
33.3824 2270 0.0829 - -
33.5294 2280 0.0841 - -
33.6765 2290 0.0833 - -
33.8235 2300 0.0899 0.1042 0.7710
33.9706 2310 0.0789 - -
34.1176 2320 0.0809 - -
34.2647 2330 0.0835 - -
34.4118 2340 0.0816 - -
34.5588 2350 0.0803 0.1038 0.7744
34.7059 2360 0.0808 - -
34.8529 2370 0.0867 - -
35.0 2380 0.0878 - -
35.1471 2390 0.0869 - -
35.2941 2400 0.0785 0.1034 0.7753
35.4412 2410 0.0849 - -
35.5882 2420 0.0832 - -
35.7353 2430 0.0799 - -
35.8824 2440 0.0813 - -
36.0294 2450 0.0801 0.1029 0.7753
36.1765 2460 0.0771 - -
36.3235 2470 0.0828 - -
36.4706 2480 0.0837 - -
36.6176 2490 0.0774 - -
36.7647 2500 0.0822 0.1026 0.7769
36.9118 2510 0.0845 - -
37.0588 2520 0.0882 - -
37.2059 2530 0.0802 - -
37.3529 2540 0.0806 - -
37.5 2550 0.0809 0.1022 0.7795
37.6471 2560 0.0806 - -
37.7941 2570 0.0788 - -
37.9412 2580 0.0858 - -
38.0882 2590 0.0791 - -
38.2353 2600 0.0842 0.1018 0.7795
38.3824 2610 0.0799 - -
38.5294 2620 0.0769 - -
38.6765 2630 0.0823 - -
38.8235 2640 0.0784 - -
38.9706 2650 0.0863 0.1016 0.7795
39.1176 2660 0.0751 - -
39.2647 2670 0.0847 - -
39.4118 2680 0.0784 - -
39.5588 2690 0.0799 - -
39.7059 2700 0.0771 0.1013 0.7811
39.8529 2710 0.0763 - -
40.0 2720 0.0783 - -
40.1471 2730 0.0784 - -
40.2941 2740 0.0761 - -
40.4412 2750 0.0797 0.1011 0.7837
40.5882 2760 0.0809 - -
40.7353 2770 0.0758 - -
40.8824 2780 0.0777 - -
41.0294 2790 0.0777 - -
41.1765 2800 0.0806 0.1006 0.7786
41.3235 2810 0.0852 - -
41.4706 2820 0.079 - -
41.6176 2830 0.0749 - -
41.7647 2840 0.0805 - -
41.9118 2850 0.0779 0.1003 0.7854
42.0588 2860 0.0759 - -
42.2059 2870 0.0794 - -
42.3529 2880 0.0811 - -
42.5 2890 0.0772 - -
42.6471 2900 0.0757 0.1001 0.7828
42.7941 2910 0.0781 - -
42.9412 2920 0.0751 - -
43.0882 2930 0.0752 - -
43.2353 2940 0.079 - -
43.3824 2950 0.076 0.0997 0.7811
43.5294 2960 0.0783 - -
43.6765 2970 0.0774 - -
43.8235 2980 0.07 - -
43.9706 2990 0.073 - -
44.1176 3000 0.0762 0.0993 0.7854
44.2647 3010 0.0749 - -
44.4118 3020 0.0782 - -
44.5588 3030 0.0764 - -
44.7059 3040 0.0759 - -
44.8529 3050 0.0769 0.0991 0.7887
45.0 3060 0.0754 - -
45.1471 3070 0.0744 - -
45.2941 3080 0.0767 - -
45.4412 3090 0.0724 - -
45.5882 3100 0.0742 0.0989 0.7870
45.7353 3110 0.0745 - -
45.8824 3120 0.076 - -
46.0294 3130 0.0666 - -
46.1765 3140 0.0801 - -
46.3235 3150 0.0734 0.0985 0.7887
46.4706 3160 0.0703 - -
46.6176 3170 0.0772 - -
46.7647 3180 0.0763 - -
46.9118 3190 0.0718 - -
47.0588 3200 0.0724 0.0981 0.7904
47.2059 3210 0.0755 - -
47.3529 3220 0.0719 - -
47.5 3230 0.0742 - -
47.6471 3240 0.074 - -
47.7941 3250 0.0758 0.0980 0.7921
47.9412 3260 0.0727 - -
48.0882 3270 0.0676 - -
48.2353 3280 0.0791 - -
48.3824 3290 0.0751 - -
48.5294 3300 0.075 0.0977 0.7887
48.6765 3310 0.0738 - -
48.8235 3320 0.0689 - -
48.9706 3330 0.0706 - -
49.1176 3340 0.0671 - -
49.2647 3350 0.0744 0.0974 0.7971
49.4118 3360 0.0739 - -
49.5588 3370 0.0721 - -
49.7059 3380 0.073 - -
49.8529 3390 0.0707 - -
50.0 3400 0.0689 0.0972 0.7929
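
The Epoch and Step columns above are tied together by the number of optimizer steps per epoch, which also bounds the training-set size. The arithmetic below is a sketch that assumes a single training device and a sampler that yields every example once per epoch. The card states neither, so the example-count range is an estimate rather than a reported figure.

```python
batch_size = 138     # per_device_train_batch_size from the card
total_steps = 3400   # last logged step
num_epochs = 50      # num_train_epochs from the card

steps_per_epoch = total_steps // num_epochs

# With dataloader_drop_last=True and gradient_accumulation_steps=1, an epoch has
# floor(N / batch_size) steps, so N lies in the range below (single-device assumption).
min_examples = steps_per_epoch * batch_size
max_examples = (steps_per_epoch + 1) * batch_size - 1


def epoch_at(step: int) -> float:
    """Fractional epoch for a step, rounded like the log's Epoch column."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, min_examples, max_examples, epoch_at(10), epoch_at(50))
```

The computed values reproduce the logged epochs for steps 10 and 50 (0.1471 and 0.7353), which is consistent with 68 steps per epoch.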

Framework Versions

  • Python: 3.8.18
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.1
  • PyTorch: 1.13.1+cu117
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
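
For orientation, the triplet loss cited above penalizes a triplet whenever the anchor is not closer to the positive than to the negative by at least a margin. The sketch below uses cosine distance and a margin of 0.5. Both are assumptions: the card does not state the distance metric or the margin used in training (the model name only hints at cosine), and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.5,  # hypothetical value, not taken from the card
) -> float:
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(gap + margin, 0.0)


# Well-separated triplet: the negative is orthogonal to the anchor.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Hard triplet: the negative coincides with the anchor, so only the margin remains.
hard = triplet_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
print(easy, hard)
```

A triplet contributes no gradient once its loss reaches zero, so training progress comes from the triplets that still violate the margin.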

Model size: 109M params (Safetensors, tensor type F32)