SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model fine-tuned from BAAI/bge-m3 on the json dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
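
The Pooling and Normalize stages listed above can be illustrated with a minimal pure-Python sketch. It assumes the Transformer stage has already produced one vector per token, with the [CLS] token first. The function name and the 4-dimensional toy vectors are hypothetical; the real model works on 1024-dimensional vectors.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Take the first ([CLS]) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]           # mirrors pooling_mode_cls_token=True
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)                # avoid dividing by zero
    return [x / norm for x in cls_vector]      # mirrors the Normalize() stage

# Toy input: 3 tokens with 4-dimensional vectors (hypothetical values).
toy_tokens = [
    [3.0, 4.0, 0.0, 0.0],   # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
]
sentence_embedding = cls_pool_and_normalize(toy_tokens)
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.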

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/ST-tramits-sitges-006-5ep")
# Run inference
sentences = [
    'El Decret 97/2002, de 5 de març, regula la concessió de la targeta d’aparcament per a persones amb disminució i altres mesures adreçades a facilitar el desplaçament de les persones amb mobilitat reduïda.',
    "Quin és el benefici de la targeta d'aparcament per a les persones amb disminució?",
    'Quin és el paper de la Junta de Govern Local?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
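
Since the model was trained with MatryoshkaLoss, shorter prefixes of each embedding (768, 512, 256, 128, or 64 dimensions) should remain usable on their own. The sketch below shows the idea on plain Python lists: keep the first components, re-normalize, then compare. The helper names and the 8-dimensional toy vectors are hypothetical. In practice the library's `truncate_dim` argument to `SentenceTransformer` is probably the more convenient route.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional vectors standing in for 1024-dimensional embeddings.
doc = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
query = [0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0]

full_score = cosine_similarity(doc, query)
short_score = cosine_similarity(
    truncate_and_normalize(doc, 4), truncate_and_normalize(query, 4)
)
print(full_score, short_score)
```

Scores from truncated vectors generally differ from full-size scores, so a similarity threshold tuned at one dimensionality should be re-checked at another.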

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.1181
cosine_accuracy@3 0.2328
cosine_accuracy@5 0.3129
cosine_accuracy@10 0.4644
cosine_precision@1 0.1181
cosine_precision@3 0.0776
cosine_precision@5 0.0626
cosine_precision@10 0.0464
cosine_recall@1 0.1181
cosine_recall@3 0.2328
cosine_recall@5 0.3129
cosine_recall@10 0.4644
cosine_ndcg@10 0.2655
cosine_mrr@10 0.2053
cosine_map@100 0.226
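
In the tables of this section, accuracy@k equals recall@k, and precision@k equals recall@k divided by k (for example, 0.2328 / 3 ≈ 0.0776). That pattern is consistent with each query having exactly one relevant document, which is an inference from the numbers rather than something the card states. Under that assumption, the metrics can be sketched as follows; the function names and the toy ranks are hypothetical.

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k for single-answer queries.

    `ranks` holds the 1-based rank of the one relevant document per query,
    or None when it was not retrieved at all.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    n = len(ranks)
    accuracy = hits / n          # fraction of queries with a hit in the top k
    recall = hits / n            # one relevant document, so recall == accuracy
    precision = hits / (n * k)   # at most 1 of the k retrieved items is relevant
    return accuracy, precision, recall

def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)

# Toy ranks for five queries (hypothetical, not taken from the evaluation).
toy_ranks = [1, 3, None, 7, 2]
acc, prec, rec = retrieval_metrics(toy_ranks, 3)
mrr = mrr_at_k(toy_ranks, 10)
print(acc, prec, rec, mrr)
```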

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.1158
cosine_accuracy@3 0.229
cosine_accuracy@5 0.3113
cosine_accuracy@10 0.4657
cosine_precision@1 0.1158
cosine_precision@3 0.0763
cosine_precision@5 0.0623
cosine_precision@10 0.0466
cosine_recall@1 0.1158
cosine_recall@3 0.229
cosine_recall@5 0.3113
cosine_recall@10 0.4657
cosine_ndcg@10 0.2641
cosine_mrr@10 0.2031
cosine_map@100 0.2236

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.1191
cosine_accuracy@3 0.2328
cosine_accuracy@5 0.3176
cosine_accuracy@10 0.4658
cosine_precision@1 0.1191
cosine_precision@3 0.0776
cosine_precision@5 0.0635
cosine_precision@10 0.0466
cosine_recall@1 0.1191
cosine_recall@3 0.2328
cosine_recall@5 0.3176
cosine_recall@10 0.4658
cosine_ndcg@10 0.2667
cosine_mrr@10 0.2064
cosine_map@100 0.2267

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.1153
cosine_accuracy@3 0.2266
cosine_accuracy@5 0.3086
cosine_accuracy@10 0.4567
cosine_precision@1 0.1153
cosine_precision@3 0.0755
cosine_precision@5 0.0617
cosine_precision@10 0.0457
cosine_recall@1 0.1153
cosine_recall@3 0.2266
cosine_recall@5 0.3086
cosine_recall@10 0.4567
cosine_ndcg@10 0.2604
cosine_mrr@10 0.201
cosine_map@100 0.2217

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.1118
cosine_accuracy@3 0.2233
cosine_accuracy@5 0.3025
cosine_accuracy@10 0.4529
cosine_precision@1 0.1118
cosine_precision@3 0.0744
cosine_precision@5 0.0605
cosine_precision@10 0.0453
cosine_recall@1 0.1118
cosine_recall@3 0.2233
cosine_recall@5 0.3025
cosine_recall@10 0.4529
cosine_ndcg@10 0.2566
cosine_mrr@10 0.1972
cosine_map@100 0.2178

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.1069
cosine_accuracy@3 0.2125
cosine_accuracy@5 0.2885
cosine_accuracy@10 0.4297
cosine_precision@1 0.1069
cosine_precision@3 0.0708
cosine_precision@5 0.0577
cosine_precision@10 0.043
cosine_recall@1 0.1069
cosine_recall@3 0.2125
cosine_recall@5 0.2885
cosine_recall@10 0.4297
cosine_ndcg@10 0.2438
cosine_mrr@10 0.1876
cosine_map@100 0.2081

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 2,844 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    column    type    min        mean          max
    positive  string  3 tokens   49.45 tokens  148 tokens
    anchor    string  10 tokens  20.94 tokens  45 tokens
  • Samples:
    • positive: L'Ajuntament de Sitges atorga subvencions per a projectes i activitats d'interès públic o social que tinguin per finalitat les activitats esportives federades, escolars o populars desenvolupades per les entitats esportives i esportistes del municipi de Sitges.
      anchor: Quin és el benefici de les subvencions per a les entitats esportives?
    • positive: Per a poder ser beneficiari d'una subvenció per a un projecte o activitat cultural, les entitats o associacions culturals de Sitges han de tenir una seu social a la ciutat de Sitges i estar inscrites en el Registre d'Entitats de la Generalitat de Catalunya.
      anchor: Quin és el requisit per a poder ser beneficiari d'una subvenció per a un projecte o activitat cultural?
    • positive: La cessió entre tercers, només es contempla en el cas de sepultures de construcció particular que hagin estat donades d'alta amb una anterioritat de 10 anys a la data de sol·licitud de la cessió.
      anchor: Quin és el paper de la persona que, legalment hi tingui dret, en la cessió entre tercers?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
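
The loss configuration above can be read as: apply the base ranking loss to each truncated embedding size and add up the weighted results. The pure-Python sketch below illustrates that reading and is not the library's implementation. It assumes cosine similarity with a scale of 20.0 for the in-batch ranking loss (believed to be the MultipleNegativesRankingLoss defaults) and that every listed dimension is used at each step. All function names and the toy 4-dimensional batch are hypothetical.

```python
import math

def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities; the positive at the same
    index is the target and the other positives in the batch act as negatives."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated embedding prefixes."""
    return sum(
        w * in_batch_ranking_loss([v[:d] for v in anchors],
                                  [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of 2 pairs with 4-dimensional embeddings (the card uses 1024 down to 64).
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.0, 0.2], [0.1, 0.8, 0.3, 0.0]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

With equal weights, as in the configuration above, each dimensionality contributes equally to the gradient, so the leading components are pushed to carry most of the ranking signal.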
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
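
These settings determine the effective batch size and the number of optimizer steps. The short calculation below works them out from the values reported in this card, assuming training ran on a single device (`num_devices` is an assumption, not stated in the card).

```python
import math

train_samples = 2844          # "Size" reported for the training dataset
per_device_batch = 16         # per_device_train_batch_size
grad_accum = 16               # gradient_accumulation_steps
epochs = 5                    # num_train_epochs
num_devices = 1               # assumption: single-device training

batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
effective_batch = per_device_batch * num_devices * grad_accum
steps_per_epoch = batches_per_epoch // grad_accum
total_steps = steps_per_epoch * epochs

def epoch_at(step):
    """Fraction of epochs completed after `step` optimizer updates."""
    return step * grad_accum / batches_per_epoch

print(effective_batch, batches_per_epoch, steps_per_epoch, total_steps)
print(round(epoch_at(11), 4), round(epoch_at(55), 4))
```

Under this assumption the effective batch size is 256, with 11 optimizer steps per epoch and 55 in total. The epoch fractions 0.9888 and 4.9438 agree with the evaluation rows in the Training Logs section.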

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.8989 10 3.2114 - - - - - -
0.9888 11 - 0.2144 0.2008 0.2070 0.2126 0.1842 0.2126
1.7978 20 1.5622 - - - - - -
1.9775 22 - 0.2179 0.2101 0.2169 0.2180 0.2012 0.2193
2.6966 30 0.7882 - - - - - -
2.9663 33 - 0.2239 0.2162 0.2220 0.2238 0.2070 0.2222
3.5955 40 0.4956 - - - - - -
3.9551 44 - 0.2270 0.2177 0.2231 0.2278 0.2084 0.2255
4.4944 50 0.392 - - - - - -
4.9438 55 - 0.226 0.2178 0.2217 0.2267 0.2081 0.2236
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.35.0.dev0
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 568M params (Safetensors, tensor type F32)