SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
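
The pooling and normalization stages listed above can be illustrated without loading the model. The following is a minimal sketch in plain Python, assuming the transformer output is available as nested lists of token vectors. The function names, the `eps` guard, and the 4-dimensional toy vectors are hypothetical (the real model produces 1024-dimensional vectors), and this is not the library's own implementation.

```python
import math

def cls_pool_and_normalize(token_embeddings, eps=1e-12):
    """Keep the first token vector of each sequence and scale it to unit length.

    token_embeddings: list of sequences, each a list of token vectors.
    eps: hypothetical guard against division by zero for all-zero vectors.
    """
    sentence_vectors = []
    for sequence in token_embeddings:
        cls_vector = sequence[0]  # pooling_mode_cls_token: only the first token is used
        norm = max(math.sqrt(sum(x * x for x in cls_vector)), eps)
        sentence_vectors.append([x / norm for x in cls_vector])
    return sentence_vectors

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy stand-in for transformer output: 2 sequences, 3 tokens each, 4 dimensions.
toy_token_embeddings = [
    [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 6.0, 8.0], [2.0, 2.0, 2.0, 2.0], [0.0, 1.0, 0.0, 0.0]],
]
vectors = cls_pool_and_normalize(toy_token_embeddings)
print(vectors[0])                    # unit-length version of the first token vector
print(dot(vectors[0], vectors[1]))   # equals cosine similarity for unit vectors
```

Because the final stage produces unit-length vectors, the dot product of two embeddings equals their cosine similarity, which matches the similarity function listed in the model description.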

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("seongil-dn/bge-m3-kor-retrieval-bs16-checkpoint-283")
# Run inference
sentences = [
    '전남지역의 석유와 화학제품은 왜 수출이 늘어나는 경향을 보였어',
    '(2) 전남지역\n2013년중 전남지역 수출은 전년대비 1.2% 감소로 전환하였다. 품목별로는 석유(+9.3% → +3.8%) 및 화학제품(+1.2% → +7.1%)이 중국 등 해외수요확대로 증가세를 지속하였으나 철강금속(+1.8% → -8.6%)은 글로벌 공급과잉 및 중국의 저가 철강수출 확대로, 선박(+7.6% → -49.2%)은 수주물량이 급격히 줄어들면서 감소로 전환하였다. 전남지역 수입은 원유, 화학제품, 철강금속 등의 수입이 줄면서 전년대비 7.4% 감소로 전환하였다.',
    '수출 증가세 지속\n1/4분기 중 수출은 전년동기대비 증가흐름을 지속하였다. 품목별로 보면 석유제품, 석유화학, 철강, 선박, 반도체, 자동차 등 대다수 품목에서 증가하였다. 석유제품은 글로벌 경기회복에 따른 에너지 수요 증가와 국제유가 급등으로 수출단가가 높은 상승세를 지속하면서 증가하였다. 석유화학도 중국, 아세안을 중심으로 합성수지, 고무 등의 수출이 큰 폭 증가한 데다 고유가로 인한 수출가격도 동반 상승하면서 증가세를 이어갔다. 철강은 건설, 조선 등 글로벌 전방산업의 수요 증대, 원자재가격 상승 및 중국 감산 등에 따른 수출단가 상승 등에 힘입어 증가세를 이어갔다. 선박은 1/4분기 중 인도물량이 확대됨에 따라 증가하였다. 반도체는 자동차 등 전방산업의 견조한 수요가 이어지는 가운데 전년동기대비로 높은 단가가 지속되면서 증가하였다. 자동차는 차량용 반도체 수급차질이 지속되었음에도 불구하고 글로벌 경기회복 흐름에 따라 수요가 늘어나면서 전년동기대비 소폭 증가하였다. 모니터링 결과 향후 수출은 증가세가 지속될 것으로 전망되었다. 석유화학 및 석유정제는 수출단가 상승과 전방산업의 수요확대 기조가 이어지면서 증가할 전망이다. 철강은 주요국 경기회복과 중국, 인도 등의 인프라 투자 확대 등으로 양호한 흐름을 이어갈 전망이다. 반도체는 글로벌 스마트폰 수요 회복, 디지털 전환 기조 등으로 견조한 증가세를 지속할 것으로 보인다. 자동차는 차량용 반도체 공급차질이 점차 완화되고 미국, 신흥시장을 중심으로 수요회복이 본격화됨에 따라 소폭 증가할 전망이다. 선박은 친환경 선박수요 지속, 글로별 교역 신장 등에도 불구하고 2021년 2/4분기 집중되었던 인도물량의 기저효과로 인해 감소할 것으로 보인다.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
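
In the example above, the first sentence is a query and the other two are candidate passages, so the first row of the similarity matrix can be used to rank the passages. The sketch below shows that ranking step on its own. The helper `rank_passages` and the numbers in `toy_similarities` are hypothetical placeholders, not scores produced by this model; with real output, the row could be obtained with `similarities[0, 1:].tolist()`.

```python
def rank_passages(query_scores, passage_ids):
    """Return (passage_id, score) pairs sorted from most to least similar."""
    pairs = zip(passage_ids, query_scores)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3x3 similarity matrix: row 0 is the query, rows 1-2 are passages.
# These values are made up for illustration only.
toy_similarities = [
    [1.00, 0.62, 0.41],
    [0.62, 1.00, 0.55],
    [0.41, 0.55, 1.00],
]

# Skip column 0, which is the query compared with itself.
query_scores = toy_similarities[0][1:]
ranking = rank_passages(query_scores, passage_ids=[1, 2])
for passage_id, score in ranking:
    print(f"sentences[{passage_id}] score={score:.2f}")
```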

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • gradient_accumulation_steps: 4
  • learning_rate: 3e-05
  • warmup_ratio: 0.05
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0011 1 3.7042
0.0021 2 4.4098
0.0032 3 4.5599
0.0042 4 4.5564
0.0053 5 5.3164
0.0064 6 4.9723
0.0074 7 5.2419
0.0085 8 3.6708
0.0095 9 3.4174
0.0106 10 3.7081
0.0117 11 3.5893
0.0127 12 2.8265
0.0138 13 1.8535
0.0149 14 2.2631
0.0159 15 1.6212
0.0170 16 1.3256
0.0180 17 3.1196
0.0191 18 2.6933
0.0202 19 2.7525
0.0212 20 1.8354
0.0223 21 1.5399
0.0233 22 1.2657
0.0244 23 1.5086
0.0255 24 1.4753
0.0265 25 1.4019
0.0276 26 1.0282
0.0286 27 1.1981
0.0297 28 1.1639
0.0308 29 1.064
0.0318 30 1.1106
0.0329 31 0.8862
0.0339 32 0.9067
0.0350 33 1.0234
0.0361 34 1.0057
0.0371 35 0.7404
0.0382 36 0.5796
0.0392 37 0.6
0.0403 38 0.6473
0.0414 39 0.7274
0.0424 40 0.5312
0.0435 41 0.6884
0.0446 42 0.4993
0.0456 43 0.5445
0.0467 44 0.2793
0.0477 45 0.4398
0.0488 46 0.4882
0.0499 47 0.3142
0.0509 48 0.253
0.0520 49 0.1723
0.0530 50 0.4482
0.0541 51 0.3704
0.0552 52 0.3844
0.0562 53 0.3141
0.0573 54 0.2717
0.0583 55 0.0936
0.0594 56 0.0795
0.0605 57 0.0754
0.0615 58 0.0839
0.0626 59 0.0739
0.0636 60 0.0622
0.0647 61 0.0541
0.0658 62 0.4835
0.0668 63 0.4849
0.0679 64 0.5093
0.0689 65 0.4725
0.0700 66 0.4658
0.0711 67 0.4257
0.0721 68 0.4656
0.0732 69 0.5188
0.0743 70 0.465
0.0753 71 0.5166
0.0764 72 0.4152
0.0774 73 0.4874
0.0785 74 0.435
0.0796 75 0.4698
0.0806 76 0.4075
0.0817 77 0.2881
0.0827 78 0.3375
0.0838 79 0.3183
0.0849 80 0.3046
0.0859 81 0.5192
0.0870 82 0.4832
0.0880 83 0.4467
0.0891 84 0.3109
0.0902 85 0.4108
0.0912 86 0.3034
0.0923 87 0.2636
0.0933 88 0.2169
0.0944 89 0.2991
0.0955 90 0.2901
0.0965 91 0.335
0.0976 92 0.3621
0.0986 93 0.2661
0.0997 94 0.3448
0.1008 95 0.1964
0.1018 96 0.2323
0.1029 97 0.2856
0.1040 98 0.2986
0.1050 99 0.2628
0.1061 100 0.2865
0.1071 101 0.2288
0.1082 102 0.208
0.1093 103 0.2074
0.1103 104 0.1906
0.1114 105 0.1639
0.1124 106 0.1597
0.1135 107 0.1896
0.1146 108 0.1387
0.1156 109 0.1281
0.1167 110 0.2742
0.1177 111 0.1787
0.1188 112 0.1449
0.1199 113 0.1114
0.1209 114 0.1889
0.1220 115 0.1044
0.1230 116 0.2556
0.1241 117 0.2081
0.1252 118 0.2649
0.1262 119 0.3898
0.1273 120 0.6489
0.1283 121 0.6267
0.1294 122 0.6013
0.1305 123 0.5391
0.1315 124 0.5176
0.1326 125 0.4483
0.1337 126 0.4734
0.1347 127 0.6635
0.1358 128 0.3238
0.1368 129 0.1651
0.1379 130 0.4351
0.1390 131 0.2721
0.1400 132 0.2922
0.1411 133 0.3631
0.1421 134 0.4333
0.1432 135 0.2805
0.1443 136 0.0546
0.1453 137 0.0316
0.1464 138 0.0278
0.1474 139 0.0151
0.1485 140 0.0177
0.1496 141 0.0247
0.1506 142 0.0168
0.1517 143 0.0278
0.1527 144 0.0422
0.1538 145 0.0363
0.1549 146 0.0484
0.1559 147 0.0326
0.1570 148 0.009
0.1580 149 0.0216
0.1591 150 0.005
0.1602 151 0.0514
0.1612 152 0.0131
0.1623 153 0.0145
0.1634 154 0.0246
0.1644 155 0.0111
0.1655 156 0.0184
0.1665 157 0.0168
0.1676 158 0.0055
0.1687 159 0.0091
0.1697 160 0.0363
0.1708 161 0.0039
0.1718 162 0.0119
0.1729 163 0.0284
0.1740 164 0.0055
0.1750 165 0.0193
0.1761 166 0.0138
0.1771 167 0.0099
0.1782 168 0.026
0.1793 169 0.025
0.1803 170 0.0318
0.1814 171 0.0088
0.1824 172 0.0137
0.1835 173 0.0158
0.1846 174 0.0271
0.1856 175 0.0181
0.1867 176 0.026
0.1877 177 0.0207
0.1888 178 0.009
0.1899 179 0.0117
0.1909 180 0.0265
0.1920 181 0.0151
0.1931 182 0.0254
0.1941 183 0.0101
0.1952 184 0.0096
0.1962 185 0.0225
0.1973 186 0.0122
0.1984 187 0.0184
0.1994 188 0.0326
0.2005 189 0.0163
0.2015 190 0.0257
0.2026 191 0.0126
0.2037 192 0.0121
0.2047 193 0.0251
0.2058 194 0.0145
0.2068 195 0.0244
0.2079 196 0.0196
0.2090 197 0.0121
0.2100 198 0.0145
0.2111 199 0.0084
0.2121 200 0.013
0.2132 201 0.0123
0.2143 202 0.009
0.2153 203 0.0248
0.2164 204 0.0236
0.2174 205 0.0195
0.2185 206 0.0206
0.2196 207 0.0201
0.2206 208 0.0185
0.2217 209 0.0206
0.2228 210 0.0233
0.2238 211 0.0429
0.2249 212 0.0161
0.2259 213 0.0334
0.2270 214 0.0128
0.2281 215 0.0273
0.2291 216 0.0228
0.2302 217 0.0199
0.2312 218 0.0154
0.2323 219 0.0051
0.2334 220 0.018
0.2344 221 0.0194
0.2355 222 0.0095
0.2365 223 0.0058
0.2376 224 0.0285
0.2387 225 0.0107
0.2397 226 0.0196
0.2408 227 0.0311
0.2418 228 0.0198
0.2429 229 0.0126
0.2440 230 0.0168
0.2450 231 0.0069
0.2461 232 0.0112
0.2471 233 0.0133
0.2482 234 0.0234
0.2493 235 0.0174
0.2503 236 0.0133
0.2514 237 0.0068
0.2525 238 0.0213
0.2535 239 0.0197
0.2546 240 0.011
0.2556 241 0.0226
0.2567 242 0.0305
0.2578 243 0.0198
0.2588 244 0.0318
0.2599 245 0.024
0.2609 246 0.0349
0.2620 247 0.1405
0.2631 248 0.1075
0.2641 249 0.1303
0.2652 250 0.1108
0.2662 251 0.0913
0.2673 252 0.081
0.2684 253 0.0516
0.2694 254 0.082
0.2705 255 0.0558
0.2715 256 0.05
0.2726 257 0.0829
0.2737 258 0.1127
0.2747 259 0.0559
0.2758 260 0.1117
0.2768 261 0.06
0.2779 262 0.0525
0.2790 263 0.0488
0.2800 264 0.0403
0.2811 265 0.0978
0.2822 266 0.0404
0.2832 267 0.0481
0.2843 268 0.0357
0.2853 269 0.0327
0.2864 270 0.0615
0.2875 271 0.0662
0.2885 272 0.0546
0.2896 273 0.0523
0.2906 274 0.0436
0.2917 275 0.0509
0.2928 276 0.0279
0.2938 277 0.0405
0.2949 278 0.0608
0.2959 279 0.0223
0.2970 280 0.0103
0.2981 281 0.0432
0.2991 282 0.0491
0.3002 283 0.0237
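
The log above is whitespace-separated, so it can be parsed with a few lines of Python for plotting or summarizing. The sketch below is one possible approach; `parse_training_log` is a hypothetical helper, and `sample_log` contains only four rows copied from the table, so the summary values describe the sample rather than the full run.

```python
def parse_training_log(text):
    """Parse 'epoch step loss' rows into (float, int, float) tuples.

    Lines that do not have exactly three numeric fields, such as the
    header, are skipped.
    """
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            rows.append((float(parts[0]), int(parts[1]), float(parts[2])))
        except ValueError:
            continue
    return rows

# Four rows copied from the table above, plus its header.
sample_log = """
Epoch Step Training Loss
0.0011 1 3.7042
0.0021 2 4.4098
0.1591 150 0.005
0.3002 283 0.0237
"""

rows = parse_training_log(sample_log)
lowest = min(rows, key=lambda row: row[2])  # lowest loss within the sample only

# Rough estimate of optimizer steps per epoch from the last logged row.
last_epoch, last_step, _ = rows[-1]
approx_steps_per_epoch = last_step / last_epoch

print(len(rows))
print(lowest)
print(round(approx_steps_per_epoch))
```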

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.3.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model size: 568M params (Safetensors, tensor type F32)