SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model fine-tuned from BAAI/bge-m3 on the json dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
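
The architecture above takes the CLS token vector as the sentence embedding (pooling_mode_cls_token is True) and then L2-normalizes it. The following is a minimal sketch of those two steps on toy data, not the library's implementation; the helper names cls_pool and l2_normalize and the 4-dimensional vectors are illustrative assumptions.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector (the CLS position) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy stand-in for the Transformer output of one sentence:
# 3 tokens, each a 4-dimensional vector (the real model uses 1024 dimensions).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # CLS token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.2, 0.1, 0.9],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the final vectors have unit length, the dot product of two embeddings equals their cosine similarity.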

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/ST-tramits-sitges-003-5ep")
# Run inference
sentences = [
    'A la nostra vila hi ha veïns i veïnes que els agradaria tornar a fer de pagès o provar-ho per primera vegada.',
    "Quin és l'objectiu principal de l'activitat del Viver dels Avis de Sitges?",
    'Quin és el paper de les persones en relació amb les indemnitzacions?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
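
Since the model was trained with MatryoshkaLoss at dimensions 1024, 768, 512, 256, 128, and 64 (see Training Details), the leading components of an embedding can plausibly be used on their own. The sketch below shows the truncate-then-renormalize step on toy 8-dimensional vectors; the helper names and the data are assumptions made for illustration, not model output.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional vectors standing in for 1024-dimensional embeddings.
query = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
doc = [0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0]

full_score = cosine_similarity(query, doc)

# Truncate both vectors to the first 4 components before comparing.
small_query = truncate_and_normalize(query, 4)
small_doc = truncate_and_normalize(doc, 4)
small_score = cosine_similarity(small_query, small_doc)

print(len(small_query), round(full_score, 4), round(small_score, 4))
```

Scores from truncated vectors generally differ from full-dimension scores, so a similarity threshold tuned at one dimension should not be reused at another. Recent releases of Sentence Transformers also accept a truncate_dim argument when loading a model, which may be more convenient than truncating by hand.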

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.1105
cosine_accuracy@3 0.227
cosine_accuracy@5 0.3055
cosine_accuracy@10 0.4532
cosine_precision@1 0.1105
cosine_precision@3 0.0757
cosine_precision@5 0.0611
cosine_precision@10 0.0453
cosine_recall@1 0.1105
cosine_recall@3 0.227
cosine_recall@5 0.3055
cosine_recall@10 0.4532
cosine_ndcg@10 0.2562
cosine_mrr@10 0.1965
cosine_map@100 0.2186

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.1156
cosine_accuracy@3 0.2321
cosine_accuracy@5 0.3114
cosine_accuracy@10 0.4456
cosine_precision@1 0.1156
cosine_precision@3 0.0774
cosine_precision@5 0.0623
cosine_precision@10 0.0446
cosine_recall@1 0.1156
cosine_recall@3 0.2321
cosine_recall@5 0.3114
cosine_recall@10 0.4456
cosine_ndcg@10 0.258
cosine_mrr@10 0.2009
cosine_map@100 0.2234

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.1038
cosine_accuracy@3 0.2211
cosine_accuracy@5 0.297
cosine_accuracy@10 0.4397
cosine_precision@1 0.1038
cosine_precision@3 0.0737
cosine_precision@5 0.0594
cosine_precision@10 0.044
cosine_recall@1 0.1038
cosine_recall@3 0.2211
cosine_recall@5 0.297
cosine_recall@10 0.4397
cosine_ndcg@10 0.2474
cosine_mrr@10 0.1889
cosine_map@100 0.2118

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.1004
cosine_accuracy@3 0.2152
cosine_accuracy@5 0.2979
cosine_accuracy@10 0.4439
cosine_precision@1 0.1004
cosine_precision@3 0.0717
cosine_precision@5 0.0596
cosine_precision@10 0.0444
cosine_recall@1 0.1004
cosine_recall@3 0.2152
cosine_recall@5 0.2979
cosine_recall@10 0.4439
cosine_ndcg@10 0.248
cosine_mrr@10 0.1883
cosine_map@100 0.2113

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.1089
cosine_accuracy@3 0.2262
cosine_accuracy@5 0.303
cosine_accuracy@10 0.4414
cosine_precision@1 0.1089
cosine_precision@3 0.0754
cosine_precision@5 0.0606
cosine_precision@10 0.0441
cosine_recall@1 0.1089
cosine_recall@3 0.2262
cosine_recall@5 0.303
cosine_recall@10 0.4414
cosine_ndcg@10 0.2537
cosine_mrr@10 0.1964
cosine_map@100 0.2188

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.0937
cosine_accuracy@3 0.2
cosine_accuracy@5 0.2743
cosine_accuracy@10 0.4177
cosine_precision@1 0.0937
cosine_precision@3 0.0667
cosine_precision@5 0.0549
cosine_precision@10 0.0418
cosine_recall@1 0.0937
cosine_recall@3 0.2
cosine_recall@5 0.2743
cosine_recall@10 0.4177
cosine_ndcg@10 0.2305
cosine_mrr@10 0.1738
cosine_map@100 0.1978
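
In every table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.4532 / 10 ≈ 0.0453). That pattern is what one would expect if each query has exactly one relevant document, although the card does not state this. Under that assumption, the metrics can be sketched from the rank of the relevant document per query; the function name ir_metrics and the ranks below are hypothetical.

```python
def ir_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k, and MRR@k.

    `ranks` holds, for each query, the 1-based rank of its single relevant
    document, or None if that document was not retrieved at all.
    Assumes exactly one relevant document per query.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        # At most one of the k retrieved documents can be relevant.
        "precision": accuracy / k,
        # With one relevant document, recall@k is 1 on a hit and 0 otherwise.
        "recall": accuracy,
        "mrr": sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n,
    }

# Hypothetical ranks for five queries (not taken from the evaluation above).
example_ranks = [1, 3, 12, 2, None]
metrics = ir_metrics(example_ranks, k=3)
for name, value in metrics.items():
    print(f"{name}@3: {value:.4f}")
```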

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 8,769 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive (string): min 5 tokens, mean 49.22 tokens, max 178 tokens
    • anchor (string): min 10 tokens, mean 20.94 tokens, max 48 tokens
  • Samples:
    • positive: L'Ajuntament de Sitges atorga subvencions per a projectes i activitats d'interès públic o social que tinguin per finalitat les activitats esportives federades, escolars o populars desenvolupades per les entitats esportives i esportistes del municipi de Sitges.
      anchor: Quin és el benefici de les subvencions per a les entitats esportives?
    • positive: L'Ajuntament de Sitges atorga subvencions per a projectes i activitats d'interès públic o social que tinguin per finalitat les activitats esportives federades, escolars o populars desenvolupades per les entitats esportives i esportistes del municipi de Sitges al llarg de l'exercici per la qual es sol·licita la subvenció, i reuneixin les condicions assenyalades a les bases.
      anchor: Quin és el període d'execució dels projectes i activitats esportives?
    • positive: Certificat on s'indica el nombre d'habitatges que configuren el padró de l'Impost sobre Béns Immobles del municipi o bé d'una part d'aquest.
      anchor: Quin és el contingut del certificat del nombre d'habitatges?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
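
The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss, applying the inner loss to the embeddings truncated at each listed dimension and summing the weighted results. The sketch below illustrates that idea in plain Python on toy 4-dimensional vectors. It is a simplified illustration rather than the library's code; the scale of 20.0 and cosine scoring are assumed to mirror the library defaults, and all function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        top = max(scores)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over embeddings truncated to each dim."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        short_anchors = [v[:dim] for v in anchors]
        short_positives = [v[:dim] for v in positives]
        total += weight * in_batch_ranking_loss(short_anchors, short_positives)
    return total

# Toy batch of two (anchor, positive) pairs; the real setup uses dims 1024 ... 64.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
dims, weights = [4, 2], [1, 1]

aligned = matryoshka_loss(anchors, positives, dims, weights)
swapped = matryoshka_loss(anchors, positives[::-1], dims, weights)
print(aligned < swapped)  # correctly paired batches should score a lower loss
```

Because other positives in the batch serve as negatives, duplicate texts within a batch would act as false negatives, which is presumably why the no_duplicates batch sampler is used here.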
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
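
These settings determine the effective batch size and the number of optimizer steps. The short calculation below works them out, assuming training ran on a single device (the card does not say); the resulting step counts and epoch fractions agree with the values in the Training Logs table.

```python
import math

# Values taken from this model card.
train_samples = 8769
per_device_train_batch_size = 16
gradient_accumulation_steps = 16
num_train_epochs = 5
warmup_ratio = 0.2

# Assumption: a single training device.
num_devices = 1

# Samples contributing to each optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Mini-batches per epoch, and optimizer updates per epoch (fractional).
batches_per_epoch = math.ceil(train_samples / (per_device_train_batch_size * num_devices))
updates_per_epoch = batches_per_epoch / gradient_accumulation_steps

# Whole optimizer steps over the full run, and the approximate warmup length.
total_steps = num_train_epochs * (batches_per_epoch // gradient_accumulation_steps)
warmup_steps = round(total_steps * warmup_ratio)

print("effective batch size:", effective_batch)
print("batches per epoch:", batches_per_epoch)
print("total optimizer steps:", total_steps)
print("approx. warmup steps:", warmup_steps)
print("epoch at step 10:", round(10 / updates_per_epoch, 4))
```

With an effective batch of 256 samples, each epoch contains only about 34 optimizer updates, so the 20% warmup covers roughly the first epoch.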

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.2914 10 3.6318 - - - - - -
0.5829 20 2.329 - - - - - -
0.8743 30 1.5614 - - - - - -
0.9909 34 - 0.2055 0.1998 0.2020 0.2001 0.1903 0.2019
1.1658 40 1.2383 - - - - - -
1.4572 50 0.9323 - - - - - -
1.7486 60 0.6616 - - - - - -
1.9818 68 - 0.2244 0.2063 0.2223 0.2166 0.2011 0.2235
2.0401 70 0.5545 - - - - - -
2.3315 80 0.5043 - - - - - -
2.6230 90 0.3542 - - - - - -
2.9144 100 0.3095 - - - - - -
2.9727 102 - 0.2224 0.2046 0.2170 0.2100 0.1986 0.2144
3.2058 110 0.2863 - - - - - -
3.4973 120 0.2329 - - - - - -
3.7887 130 0.2353 - - - - - -
3.9927 137 - 0.2197 0.2112 0.2098 0.2154 0.1949 0.2178
4.0801 140 0.1759 - - - - - -
4.3716 150 0.2308 - - - - - -
4.6630 160 0.1656 - - - - - -
4.9545 170 0.1812 0.2186 0.2188 0.2113 0.2118 0.1978 0.2234
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.35.0.dev0
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 568M params (Safetensors, tensor type F32)