SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model fine-tuned from BAAI/bge-m3 on the json dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
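
Read top to bottom, the three modules run the transformer, keep the hidden state of the first ([CLS]) token as the sentence vector, and L2-normalize it. The following is a minimal NumPy sketch of the last two steps only, using a random array as a stand-in for the transformer output. The helper name `cls_pool_and_normalize` is hypothetical and not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token states to (batch, hidden) unit vectors."""
    # pooling_mode_cls_token=True: keep only the first token of each sequence
    cls = token_embeddings[:, 0, :]
    # Normalize(): rescale every row to an L2 norm of 1
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
# Assumption: random values standing in for the XLMRobertaModel output
fake_hidden_states = rng.normal(size=(3, 12, 1024))
sentence_vectors = cls_pool_and_normalize(fake_hidden_states)
print(sentence_vectors.shape)
```

Because the output rows have unit length, the cosine similarity of two sentence vectors equals their dot product.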

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/ST-tramits-sitges-003-10ep")
# Run inference
sentences = [
    "Els comerços locals obtenen un benefici principal de la implementació del projecte d'implantació i ús de la targeta de fidelització del comerç local de Sitges, que és la possibilitat d'augmentar les vendes i la fidelització dels clients.",
    "Quin és el benefici que els comerços locals obtenen de la implementació del projecte d'implantació i ús de la targeta de fidelització?",
    'Quin és el propòsit de la deixalleria municipal per a l’ambient?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
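
Since the model was trained with MatryoshkaLoss at dimensions 1024, 768, 512, 256, 128 and 64, embeddings can in principle be shortened to one of those sizes by keeping the leading components and re-normalizing. The sketch below shows that step with NumPy on random vectors standing in for the output of `model.encode`. The helper `truncate_and_normalize` is a hypothetical name used only for illustration. Recent sentence-transformers releases also expose a `truncate_dim` constructor argument that should serve the same purpose, but check the documentation for your installed version.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
# Assumption: random values standing in for model.encode(sentences)
embeddings = rng.normal(size=(3, 1024))

small = truncate_and_normalize(embeddings, 256)
# Rows are unit length, so the dot product is the cosine similarity
similarities = small @ small.T
print(small.shape, similarities.shape)
```

The evaluation tables in this card give a sense of how much retrieval quality changes as the dimension shrinks.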

Evaluation

Metrics

Information Retrieval (dataset: dim_1024)

Metric Value
cosine_accuracy@1 0.1331
cosine_accuracy@3 0.2624
cosine_accuracy@5 0.3536
cosine_accuracy@10 0.5243
cosine_precision@1 0.1331
cosine_precision@3 0.0875
cosine_precision@5 0.0707
cosine_precision@10 0.0524
cosine_recall@1 0.1331
cosine_recall@3 0.2624
cosine_recall@5 0.3536
cosine_recall@10 0.5243
cosine_ndcg@10 0.2986
cosine_mrr@10 0.2301
cosine_map@100 0.2513

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.1322
cosine_accuracy@3 0.263
cosine_accuracy@5 0.3541
cosine_accuracy@10 0.5286
cosine_precision@1 0.1322
cosine_precision@3 0.0877
cosine_precision@5 0.0708
cosine_precision@10 0.0529
cosine_recall@1 0.1322
cosine_recall@3 0.263
cosine_recall@5 0.3541
cosine_recall@10 0.5286
cosine_ndcg@10 0.3011
cosine_mrr@10 0.2322
cosine_map@100 0.253

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.1342
cosine_accuracy@3 0.2655
cosine_accuracy@5 0.3589
cosine_accuracy@10 0.5257
cosine_precision@1 0.1342
cosine_precision@3 0.0885
cosine_precision@5 0.0718
cosine_precision@10 0.0526
cosine_recall@1 0.1342
cosine_recall@3 0.2655
cosine_recall@5 0.3589
cosine_recall@10 0.5257
cosine_ndcg@10 0.3011
cosine_mrr@10 0.2329
cosine_map@100 0.2538

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.1266
cosine_accuracy@3 0.2633
cosine_accuracy@5 0.3564
cosine_accuracy@10 0.5229
cosine_precision@1 0.1266
cosine_precision@3 0.0878
cosine_precision@5 0.0713
cosine_precision@10 0.0523
cosine_recall@1 0.1266
cosine_recall@3 0.2633
cosine_recall@5 0.3564
cosine_recall@10 0.5229
cosine_ndcg@10 0.2972
cosine_mrr@10 0.2285
cosine_map@100 0.2496

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.1274
cosine_accuracy@3 0.2684
cosine_accuracy@5 0.3553
cosine_accuracy@10 0.521
cosine_precision@1 0.1274
cosine_precision@3 0.0895
cosine_precision@5 0.0711
cosine_precision@10 0.0521
cosine_recall@1 0.1274
cosine_recall@3 0.2684
cosine_recall@5 0.3553
cosine_recall@10 0.521
cosine_ndcg@10 0.2973
cosine_mrr@10 0.2293
cosine_map@100 0.2507

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.1224
cosine_accuracy@3 0.2546
cosine_accuracy@5 0.344
cosine_accuracy@10 0.5165
cosine_precision@1 0.1224
cosine_precision@3 0.0849
cosine_precision@5 0.0688
cosine_precision@10 0.0516
cosine_recall@1 0.1224
cosine_recall@3 0.2546
cosine_recall@5 0.344
cosine_recall@10 0.5165
cosine_ndcg@10 0.2909
cosine_mrr@10 0.2225
cosine_map@100 0.2429
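
In every table above, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k (for example 0.2624 / 3 ≈ 0.0875). That pattern is what one would expect when each query has exactly one relevant document, which is an inference from the numbers rather than something the card states. Under that assumption, the metrics can be sketched from the rank of the relevant document alone. The function `ir_metrics` and the toy ranks are hypothetical.

```python
import math

def ir_metrics(ranks, k):
    """ranks: 1-based rank of the single relevant document per query (None if absent)."""
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        # one relevant document among k retrieved results
        "precision": accuracy / k,
        # one relevant document in total, so recall matches accuracy
        "recall": accuracy,
        "mrr": sum(1 / r for r, h in zip(ranks, hits) if h) / n,
        # ideal DCG is 1 when there is a single relevant document
        "ndcg": sum(1 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }

toy_ranks = [1, 3, 7, None, 2]  # made-up ranks for five queries
metrics = ir_metrics(toy_ranks, k=5)
print(metrics)
```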

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 6,399 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
            positive        anchor
    type    string          string
    min     9 tokens        9 tokens
    mean    49.44 tokens    21.17 tokens
    max     178 tokens      48 tokens
  • Samples:
    • positive: L'Ajuntament de Sitges atorga subvencions per a projectes i activitats d'interès públic o social que tinguin per finalitat les activitats esportives federades, escolars o populars desenvolupades per les entitats esportives i esportistes del municipi de Sitges.
      anchor: Quin és el benefici de les subvencions per a les entitats esportives?
    • positive: L'Ajuntament de Sitges atorga subvencions per a projectes i activitats d'interès públic o social que tinguin per finalitat les activitats esportives federades, escolars o populars desenvolupades per les entitats esportives i esportistes del municipi de Sitges al llarg de l'exercici per la qual es sol·licita la subvenció, i reuneixin les condicions assenyalades a les bases.
      anchor: Quin és el període d'execució dels projectes i activitats esportives?
    • positive: Certificat on s'indica el nombre d'habitatges que configuren el padró de l'Impost sobre Béns Immobles del municipi o bé d'una part d'aquest.
      anchor: Quin és el contingut del certificat del nombre d'habitatges?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
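
To make the parameters above concrete, here is a simplified NumPy sketch of the idea: the inner loss (an in-batch softmax cross-entropy in which anchor i should match positive i) is evaluated on embeddings truncated to each listed dimension, and the results are combined using the listed weights. This is an illustration on random toy data, not the library's implementation. The function names and the similarity scale of 20 are assumptions.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch ranking loss: the other positives in the batch act as negatives."""
    logits = scale * normalize(anchors) @ normalize(positives).T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over truncated embeddings."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(16, 1024))                     # toy anchor embeddings
positives = anchors + 0.1 * rng.normal(size=(16, 1024))   # toy matching pairs
dims = [1024, 768, 512, 256, 128, 64]
loss = matryoshka_loss(anchors, positives, dims, weights=[1] * 6)
print(float(loss))
```

Training every prefix at once is what lets the shorter embeddings remain usable on their own.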
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
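
As a worked example of what these settings imply, the sketch below derives the number of optimizer steps and the learning-rate curve. It assumes a single device and a linear warmup followed by a half-cosine decay to zero. The resulting 25 steps per epoch agree with the evaluation steps (25, 50, ..., 250) in the training logs, but the exact batch count under the no_duplicates sampler may differ slightly.

```python
import math

train_samples = 6399
per_device_batch = 16
grad_accum = 16
epochs = 10
warmup_ratio = 0.2
base_lr = 2e-5

batches_per_epoch = math.ceil(train_samples / per_device_batch)  # 400
steps_per_epoch = batches_per_epoch // grad_accum                # 25
total_steps = steps_per_epoch * epochs                           # 250
warmup_steps = math.ceil(total_steps * warmup_ratio)             # 50

def lr_at(step: int) -> float:
    """Linear warmup, then half-cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(25), lr_at(50), lr_at(250))
```

With gradient accumulation, each optimizer step covers 16 × 16 = 256 training pairs, so the peak learning rate is reached after the first two epochs.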

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.4 10 3.5464 - - - - - -
0.8 20 2.3861 - - - - - -
1.0 25 - 0.2327 0.2144 0.2252 0.2286 0.1938 0.2329
1.1975 30 1.8712 - - - - - -
1.5975 40 1.3322 - - - - - -
1.9975 50 0.9412 0.2410 0.2310 0.2383 0.2415 0.2236 0.2436
2.395 60 0.806 - - - - - -
2.795 70 0.5024 - - - - - -
2.995 75 - 0.2451 0.2384 0.2455 0.2487 0.2323 0.2423
3.1925 80 0.4259 - - - - - -
3.5925 90 0.3556 - - - - - -
3.9925 100 0.2555 0.2477 0.2443 0.2417 0.2485 0.2369 0.2470
4.39 110 0.2611 - - - - - -
4.79 120 0.1939 - - - - - -
4.99 125 - 0.2490 0.2425 0.2479 0.2485 0.2386 0.2495
5.1875 130 0.2021 - - - - - -
5.5875 140 0.1537 - - - - - -
5.9875 150 0.1277 0.2535 0.2491 0.2491 0.2534 0.2403 0.2541
6.385 160 0.1213 - - - - - -
6.785 170 0.1035 - - - - - -
6.985 175 - 0.2513 0.2493 0.2435 0.2515 0.2380 0.2528
7.1825 180 0.0965 - - - - - -
7.5825 190 0.0861 - - - - - -
7.9825 200 0.0794 0.2529 0.2536 0.2526 0.2545 0.2438 0.2570
8.38 210 0.0734 - - - - - -
8.78 220 0.066 - - - - - -
8.98 225 - 0.2538 0.2523 0.2519 0.2542 0.2457 0.2572
9.1775 230 0.0731 - - - - - -
9.5775 240 0.0726 - - - - - -
9.9775 250 0.0632 0.2513 0.2507 0.2496 0.2538 0.2429 0.2530
  • The bold row denotes the saved checkpoint.
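
One way to read the table is to look for the evaluated step with the highest cosine_map@100 in a given column. The sketch below does this for the dim_1024 and dim_64 columns, with the values copied from the evaluation rows above. It only inspects the logged numbers and makes no claim about which checkpoint was actually saved.

```python
# (step, dim_1024_cosine_map@100, dim_64_cosine_map@100), copied from the table
eval_rows = [
    (25, 0.2327, 0.1938),
    (50, 0.2410, 0.2236),
    (75, 0.2451, 0.2323),
    (100, 0.2477, 0.2369),
    (125, 0.2490, 0.2386),
    (150, 0.2535, 0.2403),
    (175, 0.2513, 0.2380),
    (200, 0.2529, 0.2438),
    (225, 0.2538, 0.2457),
    (250, 0.2513, 0.2429),
]

best_1024 = max(eval_rows, key=lambda row: row[1])
best_64 = max(eval_rows, key=lambda row: row[2])
# Difference between full and smallest dimension at the final step
gap_at_end = eval_rows[-1][1] - eval_rows[-1][2]
print(best_1024[0], best_64[0], round(gap_at_end, 4))
```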

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.35.0.dev0
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 568M params (Safetensors, tensor type F32)