SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model fine-tuned from BAAI/bge-m3 on the json dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
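The architecture above takes the first-token ([CLS]-style) vector as the sentence embedding and then L2-normalizes it. A minimal sketch of those two steps on toy data follows; the function name and the 4-dimensional toy vectors are illustrative assumptions, not part of the library, and the real model works on 1024-dimensional vectors.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Return the L2-normalized vector of the first token.

    token_embeddings: list of per-token vectors for one input text.
    Hypothetical helper that mirrors Pooling(cls) followed by Normalize().
    """
    cls_vector = token_embeddings[0]              # pooling_mode_cls_token=True
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)                   # avoid division by zero
    return [x / norm for x in cls_vector]         # unit-length output

# Toy input: 3 tokens with 4 dimensions each.
tokens = [
    [3.0, 4.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]
sentence_embedding = cls_pool_and_normalize(tokens)
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity of two embeddings reduces to their dot product.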

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/sqv-v5-10ep")
# Run inference
sentences = [
    'Aquest tipus de transmissió entre cedent i cessionari només podrà ser de caràcter gratuït i no condicionada.',
    'Quin és el caràcter de la transmissió de drets funeraris entre cedent i cessionari?',
    'Quin és el propòsit de la Deixalleria municipal?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
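The training section lists Matryoshka dimensions of 1024, 768, 512, 256, 128, and 64, so shorter embeddings can in principle be obtained by keeping only the leading components. The sketch below shows that idea in plain Python on toy vectors; the helper names and the 8-dimensional toy data are assumptions for illustration. Recent releases of Sentence Transformers also appear to accept a `truncate_dim` argument on `SentenceTransformer`, but check the documentation for the installed version before relying on it.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional "embeddings" standing in for the 1024-dimensional output.
query = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1]
document = [0.5, 0.4, 0.5, 0.6, -0.1, 0.2, 0.0, 0.1]

query_small = truncate_and_normalize(query, 4)
document_small = truncate_and_normalize(document, 4)

print("full similarity:     ", cosine_similarity(query, document))
print("truncated similarity:", cosine_similarity(query_small, document_small))
```

Truncated vectors are re-normalized here because the leading components of a unit-length vector are generally not unit length on their own.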

Evaluation

Metrics

Information Retrieval

  • Dataset: dim_1024

Metric Value
cosine_accuracy@1 0.0478
cosine_accuracy@3 0.2087
cosine_accuracy@5 0.3087
cosine_accuracy@10 0.5565
cosine_precision@1 0.0478
cosine_precision@3 0.0696
cosine_precision@5 0.0617
cosine_precision@10 0.0557
cosine_recall@1 0.0478
cosine_recall@3 0.2087
cosine_recall@5 0.3087
cosine_recall@10 0.5565
cosine_ndcg@10 0.2589
cosine_mrr@10 0.1696
cosine_map@100 0.1876

Information Retrieval

  • Dataset: dim_768

Metric Value
cosine_accuracy@1 0.0609
cosine_accuracy@3 0.213
cosine_accuracy@5 0.3043
cosine_accuracy@10 0.5565
cosine_precision@1 0.0609
cosine_precision@3 0.071
cosine_precision@5 0.0609
cosine_precision@10 0.0557
cosine_recall@1 0.0609
cosine_recall@3 0.213
cosine_recall@5 0.3043
cosine_recall@10 0.5565
cosine_ndcg@10 0.2638
cosine_mrr@10 0.176
cosine_map@100 0.1934

Information Retrieval

  • Dataset: dim_512

Metric Value
cosine_accuracy@1 0.0783
cosine_accuracy@3 0.2174
cosine_accuracy@5 0.3435
cosine_accuracy@10 0.5696
cosine_precision@1 0.0783
cosine_precision@3 0.0725
cosine_precision@5 0.0687
cosine_precision@10 0.057
cosine_recall@1 0.0783
cosine_recall@3 0.2174
cosine_recall@5 0.3435
cosine_recall@10 0.5696
cosine_ndcg@10 0.2812
cosine_mrr@10 0.1947
cosine_map@100 0.2122

Information Retrieval

  • Dataset: dim_256

Metric Value
cosine_accuracy@1 0.0522
cosine_accuracy@3 0.2087
cosine_accuracy@5 0.3174
cosine_accuracy@10 0.513
cosine_precision@1 0.0522
cosine_precision@3 0.0696
cosine_precision@5 0.0635
cosine_precision@10 0.0513
cosine_recall@1 0.0522
cosine_recall@3 0.2087
cosine_recall@5 0.3174
cosine_recall@10 0.513
cosine_ndcg@10 0.2483
cosine_mrr@10 0.1679
cosine_map@100 0.1893

Information Retrieval

  • Dataset: dim_128

Metric Value
cosine_accuracy@1 0.0565
cosine_accuracy@3 0.2261
cosine_accuracy@5 0.3261
cosine_accuracy@10 0.5435
cosine_precision@1 0.0565
cosine_precision@3 0.0754
cosine_precision@5 0.0652
cosine_precision@10 0.0543
cosine_recall@1 0.0565
cosine_recall@3 0.2261
cosine_recall@5 0.3261
cosine_recall@10 0.5435
cosine_ndcg@10 0.2661
cosine_mrr@10 0.182
cosine_map@100 0.2004

Information Retrieval

  • Dataset: dim_64

Metric Value
cosine_accuracy@1 0.0565
cosine_accuracy@3 0.2174
cosine_accuracy@5 0.3174
cosine_accuracy@10 0.5435
cosine_precision@1 0.0565
cosine_precision@3 0.0725
cosine_precision@5 0.0635
cosine_precision@10 0.0543
cosine_recall@1 0.0565
cosine_recall@3 0.2174
cosine_recall@5 0.3174
cosine_recall@10 0.5435
cosine_ndcg@10 0.2641
cosine_mrr@10 0.1797
cosine_map@100 0.1971
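In every table above, recall@k equals accuracy@k and precision@k is approximately accuracy@k divided by k, which is what one would expect if each query has exactly one relevant document. Under that assumption, the metrics can be sketched as follows; the function name, document ids, and toy rankings are hypothetical and are not taken from the evaluation data.

```python
def retrieval_metrics(ranked_ids_per_query, relevant_id_per_query, k):
    """Compute accuracy@k, recall@k, precision@k, and MRR@k.

    Assumes exactly one relevant document per query.
    """
    num_queries = len(ranked_ids_per_query)
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_id_per_query):
        top_k = ranked[:k]
        if relevant in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(relevant) + 1)
    accuracy = hits / num_queries
    return {
        "accuracy": accuracy,
        "recall": accuracy,          # single relevant document per query
        "precision": accuracy / k,   # at most one hit among k results
        "mrr": reciprocal_rank_sum / num_queries,
    }

# Toy rankings: the relevant document is at rank 1, 2, 3, and missing.
rankings = [
    ["d1", "d2", "d3"],
    ["d4", "d2", "d5"],
    ["d7", "d8", "d3"],
    ["d1", "d2", "d4"],
]
relevant = ["d1", "d2", "d3", "d9"]

print(retrieval_metrics(rankings, relevant, k=1))
print(retrieval_metrics(rankings, relevant, k=3))
```

With these toy rankings, accuracy@1 is 0.25 and accuracy@3 is 0.75, so precision@3 comes out at 0.25.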

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 5,520 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    column    type    min       mean          max
    positive  string  5 tokens  43.78 tokens  117 tokens
    anchor    string  9 tokens  20.5 tokens   51 tokens
  • Samples:
    positive: L’Ajuntament vol crear un banc de recursos on recollir tots els oferiments de la població i que servirà per atendre les necessitats de les famílies refugiades acollides al poble.
    anchor: Quin és el paper de l’Ajuntament en la integració de les persones refugiades acollides?

    positive: Aquest tipus d'actuació requereix la intervenció d'una persona tècnica competent que subscrigui el projecte o la documentació tècnica corresponent i que assumeixi la direcció facultativa de l'execució de les obres.
    anchor: Quin és el requisit per a la intervenció d'una persona tècnica competent en les obres d'intervenció parcial interior en edificis amb elements catalogats?

    positive: Aquest títol, adreçat a persones empadronades a Sant Quirze del Vallès, es concedirà segons el nivell d’ingressos, la condició d’edat o de discapacitat, en base als criteris específics que recull l’ordenança reguladora del sistema de tarifació social del transport públic municipal en autobús a Sant Quirze del Vallès.
    anchor: Quin és el benefici de la TBUS GRATUÏTA per a les persones majors?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
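The parameters above describe a MultipleNegativesRankingLoss applied at each of six truncated dimensions with equal weights. The following is a simplified plain-Python sketch of that combination, not the library's implementation: it assumes cosine similarity with a scale factor of 20 (believed to be the default), treats the other positives in the batch as negatives, and uses tiny 4-dimensional toy vectors with dims [4, 2] in place of the real ones.

```python
import math

def _unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where anchor i should rank positive i above the rest."""
    a = [_unit(v) for v in anchors]
    p = [_unit(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated embeddings."""
    return sum(
        w * multiple_negatives_ranking_loss(
            [v[:d] for v in anchors], [v[:d] for v in positives]
        )
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

matched = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print("matched pairs loss:", matched)
print("swapped pairs loss:", swapped)
```

The matched batch yields a much smaller loss than the swapped one, since each anchor is already closest to its own positive at both truncation levels.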
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
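These settings imply an effective batch size and a step budget that can be worked out by hand. The sketch below does the arithmetic and evaluates a cosine schedule with linear warmup. It assumes a single training device and that the batch sampler does not change the number of batches per epoch; the schedule formula is the commonly used warmup-then-cosine form, written out here rather than called from the library. The resulting 210 optimizer steps agree with the last step in the training logs.

```python
import math

TRAIN_SAMPLES = 5520
PER_DEVICE_BATCH = 16
GRAD_ACCUM = 16
EPOCHS = 10
BASE_LR = 2e-5
WARMUP_RATIO = 0.2
NUM_DEVICES = 1  # assumption: the card does not state the device count

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
batches_per_epoch = math.ceil(TRAIN_SAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))
updates_per_epoch = batches_per_epoch // GRAD_ACCUM
total_steps = updates_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def learning_rate(step):
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print("effective batch size:", effective_batch)   # 256
print("optimizer steps:", total_steps)            # 210
print("warmup steps:", warmup_steps)              # 42
for step in (0, 21, 42, 126, 210):
    print(step, learning_rate(step))
```

Under these assumptions the learning rate peaks at step 42 and decays to zero by step 210.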

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.4638 10 4.0375 - - - - - -
0.9275 20 3.2095 - - - - - -
0.9739 21 - 0.1772 0.1818 0.1967 0.1911 0.1417 0.1750
1.3913 30 2.1843 - - - - - -
1.8551 40 1.6095 - - - - - -
1.9942 43 - 0.1889 0.1676 0.1961 0.1969 0.1834 0.1899
2.3188 50 1.2099 - - - - - -
2.7826 60 0.909 - - - - - -
2.9681 64 - 0.1998 0.1977 0.2164 0.2030 0.1972 0.2156
3.2464 70 0.7534 - - - - - -
3.7101 80 0.6339 - - - - - -
3.9884 86 - 0.2049 0.2024 0.1989 0.1935 0.2046 0.1949
4.1739 90 0.5423 - - - - - -
4.6377 100 0.5135 - - - - - -
4.9623 107 - 0.1967 0.2199 0.1892 0.2113 0.1957 0.2037
5.1014 110 0.4563 - - - - - -
5.5652 120 0.3837 - - - - - -
5.9826 129 - 0.2026 0.1898 0.1903 0.2035 0.2034 0.2187
6.0290 130 0.3991 - - - - - -
6.4928 140 0.3996 - - - - - -
6.9565 150 0.3225 0.2053 0.1866 0.2046 0.2083 0.1822 0.2086
7.4203 160 0.3407 - - - - - -
7.8841 170 0.2982 - - - - - -
7.9768 172 - 0.2092 0.2197 0.2005 0.2178 0.2063 0.2042
8.3478 180 0.3169 - - - - - -
8.8116 190 0.2799 - - - - - -
8.9971 194 - 0.2053 0.2215 0.1929 0.2191 0.2106 0.2170
9.2754 200 0.312 - - - - - -
9.7391 210 0.2684 0.1876 0.2004 0.1893 0.2122 0.1971 0.1934
  • The bold row denotes the saved checkpoint.
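The map@100 values fluctuate across epochs and do not peak at the same step for every dimension. As a small aid for reading the table, the sketch below copies the evaluation rows from the log and reports the best step per dimension; the helper name is hypothetical, and the result says nothing about which checkpoint was actually saved.

```python
# Evaluation rows copied from the training log: step -> map@100 per column.
COLUMNS = ["dim_1024", "dim_128", "dim_256", "dim_512", "dim_64", "dim_768"]
EVAL_ROWS = {
    21:  [0.1772, 0.1818, 0.1967, 0.1911, 0.1417, 0.1750],
    43:  [0.1889, 0.1676, 0.1961, 0.1969, 0.1834, 0.1899],
    64:  [0.1998, 0.1977, 0.2164, 0.2030, 0.1972, 0.2156],
    86:  [0.2049, 0.2024, 0.1989, 0.1935, 0.2046, 0.1949],
    107: [0.1967, 0.2199, 0.1892, 0.2113, 0.1957, 0.2037],
    129: [0.2026, 0.1898, 0.1903, 0.2035, 0.2034, 0.2187],
    150: [0.2053, 0.1866, 0.2046, 0.2083, 0.1822, 0.2086],
    172: [0.2092, 0.2197, 0.2005, 0.2178, 0.2063, 0.2042],
    194: [0.2053, 0.2215, 0.1929, 0.2191, 0.2106, 0.2170],
    210: [0.1876, 0.2004, 0.1893, 0.2122, 0.1971, 0.1934],
}

def best_step_per_dimension(rows, columns):
    """Return {column: (step, score)} for the highest score in each column."""
    best = {}
    for index, name in enumerate(columns):
        step = max(rows, key=lambda s: rows[s][index])
        best[name] = (step, rows[step][index])
    return best

best = best_step_per_dimension(EVAL_ROWS, COLUMNS)
for name, (step, score) in best.items():
    print(f"{name}: best map@100 {score:.4f} at step {step}")
```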

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.35.0.dev0
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 568M params (Safetensors, tensor type F32)