SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3 on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
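
For illustration only, here is a minimal sketch of what the three modules above compute, calling the Hugging Face transformers API directly on the base BAAI/bge-m3 checkpoint. This is an assumption-laden walkthrough of the architecture, not the recommended loading path; in practice, load the fine-tuned model with SentenceTransformer as shown in the Usage section.

# Hedged illustration of the module stack: XLM-RoBERTa encoder, CLS-token pooling,
# then L2 normalization.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
encoder = AutoModel.from_pretrained("BAAI/bge-m3")  # XLMRobertaModel

batch = tokenizer(
    ["Permet tramitar la baixa de les activitats esportives municipals."],
    padding=True, truncation=True, max_length=8192, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state            # (0) Transformer
cls_embedding = token_embeddings[:, 0]                                # (1) Pooling: CLS token
sentence_embedding = torch.nn.functional.normalize(cls_embedding, p=2, dim=1)  # (2) Normalize
print(sentence_embedding.shape)  # torch.Size([1, 1024])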

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/sqv-v5-5ep")
# Run inference
sentences = [
    'Permet tramitar la baixa de les activitats esportives municipals.',
    'Quin és el procés per a donar de baixa una activitat esportiva?',
    'Quin és el benefici fiscal que es pot obtenir?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
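
Because the model was trained with MatryoshkaLoss (see Training Details), its embeddings can also be truncated to smaller dimensions. A minimal sketch, assuming the truncate_dim option available in recent sentence-transformers releases; 256 is just one of the trained dimensions, chosen for illustration.

from sentence_transformers import SentenceTransformer

# Truncate embeddings to 256 dimensions at load time (any of the trained
# Matryoshka dimensions 1024/768/512/256/128/64 is a reasonable choice).
model = SentenceTransformer("adriansanz/sqv-v5-5ep", truncate_dim=256)
embeddings = model.encode([
    "Permet tramitar la baixa de les activitats esportives municipals.",
    "Quin és el procés per a donar de baixa una activitat esportiva?",
])
print(embeddings.shape)                                 # (2, 256)
print(model.similarity(embeddings, embeddings).shape)   # torch.Size([2, 2])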

Evaluation

Metrics

The six tables below report the same information-retrieval metrics computed at each Matryoshka embedding dimension (1024, 768, 512, 256, 128 and 64), matching the dim_*_cosine_map@100 columns in the Training Logs below.

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.1
cosine_accuracy@3 0.2261
cosine_accuracy@5 0.3043
cosine_accuracy@10 0.4957
cosine_precision@1 0.1
cosine_precision@3 0.0754
cosine_precision@5 0.0609
cosine_precision@10 0.0496
cosine_recall@1 0.1
cosine_recall@3 0.2261
cosine_recall@5 0.3043
cosine_recall@10 0.4957
cosine_ndcg@10 0.2645
cosine_mrr@10 0.1949
cosine_map@100 0.2142

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.1
cosine_accuracy@3 0.213
cosine_accuracy@5 0.3
cosine_accuracy@10 0.4913
cosine_precision@1 0.1
cosine_precision@3 0.071
cosine_precision@5 0.06
cosine_precision@10 0.0491
cosine_recall@1 0.1
cosine_recall@3 0.213
cosine_recall@5 0.3
cosine_recall@10 0.4913
cosine_ndcg@10 0.2612
cosine_mrr@10 0.1922
cosine_map@100 0.2117

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.0957
cosine_accuracy@3 0.2522
cosine_accuracy@5 0.3217
cosine_accuracy@10 0.5043
cosine_precision@1 0.0957
cosine_precision@3 0.0841
cosine_precision@5 0.0643
cosine_precision@10 0.0504
cosine_recall@1 0.0957
cosine_recall@3 0.2522
cosine_recall@5 0.3217
cosine_recall@10 0.5043
cosine_ndcg@10 0.2737
cosine_mrr@10 0.2033
cosine_map@100 0.2225

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.0913
cosine_accuracy@3 0.2435
cosine_accuracy@5 0.3261
cosine_accuracy@10 0.4783
cosine_precision@1 0.0913
cosine_precision@3 0.0812
cosine_precision@5 0.0652
cosine_precision@10 0.0478
cosine_recall@1 0.0913
cosine_recall@3 0.2435
cosine_recall@5 0.3261
cosine_recall@10 0.4783
cosine_ndcg@10 0.2584
cosine_mrr@10 0.1911
cosine_map@100 0.2126

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.0957
cosine_accuracy@3 0.2217
cosine_accuracy@5 0.3261
cosine_accuracy@10 0.513
cosine_precision@1 0.0957
cosine_precision@3 0.0739
cosine_precision@5 0.0652
cosine_precision@10 0.0513
cosine_recall@1 0.0957
cosine_recall@3 0.2217
cosine_recall@5 0.3261
cosine_recall@10 0.513
cosine_ndcg@10 0.2704
cosine_mrr@10 0.1969
cosine_map@100 0.2158

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.1043
cosine_accuracy@3 0.2348
cosine_accuracy@5 0.3217
cosine_accuracy@10 0.4913
cosine_precision@1 0.1043
cosine_precision@3 0.0783
cosine_precision@5 0.0643
cosine_precision@10 0.0491
cosine_recall@1 0.1043
cosine_recall@3 0.2348
cosine_recall@5 0.3217
cosine_recall@10 0.4913
cosine_ndcg@10 0.2687
cosine_mrr@10 0.201
cosine_map@100 0.2206
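
Metrics of this form are typically produced with sentence-transformers' InformationRetrievalEvaluator. Below is a minimal sketch of how the same evaluator can be applied to your own data; the queries, corpus and relevance judgments here are toy placeholders, not the held-out split behind the tables above.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("adriansanz/sqv-v5-5ep")

# Toy placeholders: ids mapped to query text, corpus text and relevant corpus ids.
queries = {"q1": "Quin és el procés per a donar de baixa una activitat esportiva?"}
corpus = {
    "d1": "Permet tramitar la baixa de les activitats esportives municipals.",
    "d2": "Aquest títol es concedirà segons el nivell d'ingressos.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="toy")
results = evaluator(model)
print(results)  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100, ...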

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 5,520 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string, min 5 / mean 43.7 / max 117 tokens
    • anchor: string, min 9 / mean 20.51 / max 51 tokens
  • Samples (positive/anchor pairs, in Catalan):
    • positive: L’Ajuntament vol crear un banc de recursos on recollir tots els oferiments de la població i que servirà per atendre les necessitats de les famílies refugiades acollides al poble.
      anchor: Quin és el paper de l’Ajuntament en la integració de les persones refugiades acollides?
    • positive: Aquest tipus d'actuació requereix la intervenció d'una persona tècnica competent que subscrigui el projecte o la documentació tècnica corresponent i que assumeixi la direcció facultativa de l'execució de les obres.
      anchor: Quin és el requisit per a la intervenció d'una persona tècnica competent en les obres d'intervenció parcial interior en edificis amb elements catalogats?
    • positive: Aquest títol, adreçat a persones empadronades a Sant Quirze del Vallès, es concedirà segons el nivell d’ingressos, la condició d’edat o de discapacitat, en base als criteris específics que recull l’ordenança reguladora del sistema de tarifació social del transport públic municipal en autobús a Sant Quirze del Vallès.
      anchor: Quin és el benefici de la TBUS GRATUÏTA per a les persones majors?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates

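For reference, a minimal, hedged sketch of how these non-default settings could be combined with the MatryoshkaLoss configuration from the Training Dataset section using the sentence-transformers v3 Trainer API. The two-row dataset, the output_dir path and the omission of evaluation/checkpoint-selection arguments are simplifications for illustration, not the exact training script.

# Hedged sketch only. The toy dataset stands in for the real 5,520-pair json dataset;
# bf16/tf32 and the fused AdamW optimizer assume an Ampere-or-newer GPU; eval_strategy,
# load_best_model_at_end and the per-dimension IR evaluators are omitted for brevity.
from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-m3")

train_dataset = Dataset.from_dict({
    "positive": ["Permet tramitar la baixa de les activitats esportives municipals.",
                 "Aquest títol es concedirà segons el nivell d'ingressos."],
    "anchor": ["Quin és el procés per a donar de baixa una activitat esportiva?",
               "Quin és el benefici de la TBUS GRATUÏTA per a les persones majors?"],
})

# MatryoshkaLoss wrapping MultipleNegativesRankingLoss, as listed under Training Dataset.
loss = MatryoshkaLoss(model, MultipleNegativesRankingLoss(model),
                      matryoshka_dims=[1024, 768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="output/sqv-v5-5ep",          # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.2,
    bf16=True,
    tf32=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset, loss=loss)
trainer.train()
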
All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.4638 10 4.122 - - - - - -
0.9275 20 2.7131 - - - - - -
0.9739 21 - 0.2085 0.1973 0.1884 0.2087 0.1886 0.2177
1.3913 30 1.6964 - - - - - -
1.8551 40 1.2311 - - - - - -
1.9942 43 - 0.2148 0.2135 0.2170 0.2351 0.2091 0.2386
2.3188 50 0.9216 - - - - - -
2.7826 60 0.737 - - - - - -
2.9681 64 - 0.2145 0.2058 0.2072 0.2277 0.2127 0.2085
3.2464 70 0.6678 - - - - - -
3.7101 80 0.555 - - - - - -
3.9884 86 - 0.2028 0.2154 0.2117 0.2331 0.2113 0.2028
4.1739 90 0.5542 - - - - - -
4.6377 100 0.5058 - - - - - -
4.8696 105 - 0.2142 0.2158 0.2126 0.2225 0.2206 0.2117
  • The row at epoch 4.8696 (step 105) corresponds to the saved checkpoint; its map@100 values match the Evaluation section above.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.35.0.dev0
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}