SentenceTransformer based on intfloat/multilingual-e5-small

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-small. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
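
The Pooling stage above uses mean pooling over tokens, and the Normalize stage scales the result to unit length. The following is a minimal pure-Python sketch of those two steps on toy data, not the library's implementation; the function names `mean_pool` and `l2_normalize` and the 4-dimensional toy vectors are illustrative assumptions (the model itself works with 384-dimensional vectors).

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average only the token vectors whose mask value is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, as a normalization stage would."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: 3 token positions with 4-dimensional vectors (hypothetical values).
tokens = [[1.0, 2.0, 0.0, 1.0], [3.0, 0.0, 2.0, 1.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]  # the last position is padding and must not affect the result

pooled = mean_pool(tokens, mask)
sentence_vector = l2_normalize(pooled)
print(pooled)
print(sentence_vector)
```

Because the output has unit length, cosine similarity between two such vectors reduces to a plain dot product.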

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("srikarvar/multilingual-e5-small-cogcache-contrastive")
# Run inference
sentences = [
    'What is the capital of Italy?',
    "Italy's capital city",
    'I need help with my homework',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 1.0
cosine_accuracy_threshold 0.8237
cosine_f1 1.0
cosine_f1_threshold 0.8237
cosine_precision 1.0
cosine_recall 1.0
cosine_ap 1.0
dot_accuracy 1.0
dot_accuracy_threshold 0.8237
dot_f1 1.0
dot_f1_threshold 0.8237
dot_precision 1.0
dot_recall 1.0
dot_ap 1.0
manhattan_accuracy 0.973
manhattan_accuracy_threshold 7.9234
manhattan_f1 0.9796
manhattan_f1_threshold 9.903
manhattan_precision 0.96
manhattan_recall 1.0
manhattan_ap 0.9983
euclidean_accuracy 1.0
euclidean_accuracy_threshold 0.5938
euclidean_f1 1.0
euclidean_f1_threshold 0.5938
euclidean_precision 1.0
euclidean_recall 1.0
euclidean_ap 1.0
max_accuracy 1.0
max_accuracy_threshold 7.9234
max_f1 1.0
max_f1_threshold 9.903
max_precision 1.0
max_recall 1.0
max_ap 1.0
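
The cosine and dot rows above are identical, which is what one would expect when embeddings are normalized to unit length. Under the same assumption, Euclidean distance is a fixed function of cosine similarity, d = sqrt(2 - 2·cos). The sketch below checks both relations on hypothetical toy vectors and then applies the conversion to the cosine threshold 0.8237 from the table; the helper names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cosine_to_euclidean(cos_sim):
    """Euclidean distance between two unit vectors with the given cosine similarity."""
    return math.sqrt(2 - 2 * cos_sim)

# Hypothetical toy vectors, normalized to unit length.
a = unit([0.2, 0.9, 0.4])
b = unit([0.1, 0.8, 0.6])

cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(math.isclose(cosine, dot(a, b)))                      # True
print(math.isclose(euclidean, cosine_to_euclidean(cosine))) # True
print(round(cosine_to_euclidean(0.8237), 4))                # 0.5938
```

The last value agrees with the euclidean_accuracy_threshold reported in the table.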

Binary Classification

Metric Value
cosine_accuracy 1.0
cosine_accuracy_threshold 0.8053
cosine_f1 1.0
cosine_f1_threshold 0.8053
cosine_precision 1.0
cosine_recall 1.0
cosine_ap 1.0
dot_accuracy 1.0
dot_accuracy_threshold 0.8053
dot_f1 1.0
dot_f1_threshold 0.8053
dot_precision 1.0
dot_recall 1.0
dot_ap 1.0
manhattan_accuracy 1.0
manhattan_accuracy_threshold 9.7795
manhattan_f1 1.0
manhattan_f1_threshold 9.7795
manhattan_precision 1.0
manhattan_recall 1.0
manhattan_ap 1.0
euclidean_accuracy 1.0
euclidean_accuracy_threshold 0.6236
euclidean_f1 1.0
euclidean_f1_threshold 0.6236
euclidean_precision 1.0
euclidean_recall 1.0
euclidean_ap 1.0
max_accuracy 1.0
max_accuracy_threshold 9.7795
max_f1 1.0
max_f1_threshold 9.7795
max_precision 1.0
max_recall 1.0
max_ap 1.0
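
A reported threshold can serve as a decision boundary for pair classification. The sketch below assumes that a pair is labeled 1 when its cosine similarity reaches the threshold; the scores and labels are hypothetical values chosen for illustration, not results from this model.

```python
def classify_pairs(scores, threshold):
    """Label a pair 1 when its cosine similarity reaches the threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]

def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical similarity scores and gold labels, for illustration only.
scores = [0.93, 0.41, 0.86, 0.78]
labels = [1, 0, 1, 0]

THRESHOLD = 0.8053  # cosine_accuracy_threshold from the table above
predictions = classify_pairs(scores, THRESHOLD)
print(predictions, accuracy(predictions, labels))  # [1, 0, 1, 0] 1.0
```

With only 37 evaluation samples, a threshold tuned on that set may not transfer to other data, so it is worth re-estimating on a held-out sample from the target domain.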

Training Details

Training Dataset

Unnamed Dataset

  • Size: 333 training samples
  • Columns: sentence1, label, and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 10.25 tokens, max 20 tokens
    • label (int): 0: ~51.65%, 1: ~48.35%
    • sentence2 (string): min 4 tokens, mean 9.42 tokens, max 22 tokens
  • Samples:
    sentence1 label sentence2
    How to improve my credit score? 1 Improving my credit score tips
    How does photosynthesis work? 0 What are the steps of photosynthesis?
    What is the population of Germany? 0 How many people live in Berlin?
  • Loss: ContrastiveLoss with these parameters:
    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
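
As a rough illustration of the parameters above, the following pure-Python sketch computes a contrastive loss with cosine distance, a margin of 0.5, and averaging over the batch. It follows the standard formulation from Hadsell et al. (2006) with a 0.5 scaling factor; the function names and toy vectors are assumptions, and the library's implementation operates on tensors rather than lists.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity, so identical directions give distance 0."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return 1.0 - dot / (norm_u * norm_v)

def contrastive_loss(pairs, labels, margin=0.5, size_average=True):
    losses = []
    for (u, v), y in zip(pairs, labels):
        d = cosine_distance(u, v)
        positive_term = y * d ** 2                          # pull label-1 pairs together
        negative_term = (1 - y) * max(0.0, margin - d) ** 2 # push label-0 pairs past the margin
        losses.append(0.5 * (positive_term + negative_term))
    return sum(losses) / len(losses) if size_average else sum(losses)

# Hypothetical 2-dimensional embeddings.
same = ([1.0, 0.0], [1.0, 0.0])        # distance 0
orthogonal = ([1.0, 0.0], [0.0, 1.0])  # distance 1

print(contrastive_loss([same], [1]))        # 0.0: a matching pair at distance 0
print(contrastive_loss([orthogonal], [0]))  # 0.0: a non-matching pair beyond the margin
print(contrastive_loss([same], [0]))        # 0.125: a non-matching pair at distance 0
```

Label-0 pairs stop contributing once their cosine distance exceeds the margin, so only hard negatives continue to produce gradients.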
    

Evaluation Dataset

Unnamed Dataset

  • Size: 37 evaluation samples
  • Columns: sentence1, label, and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 7 tokens, mean 10.0 tokens, max 13 tokens
    • label (int): 0: ~35.14%, 1: ~64.86%
    • sentence2 (string): min 6 tokens, mean 8.68 tokens, max 12 tokens
  • Samples:
    sentence1 label sentence2
    What is the price of Bitcoin? 1 Bitcoin's current value
    Who discovered gravity? 1 Who found out about gravity?
    What is the most spoken language in the world? 1 Language spoken by the most people
  • Loss: ContrastiveLoss with these parameters:
    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 2
  • learning_rate: 3e-05
  • weight_decay: 0.01
  • num_train_epochs: 5
  • lr_scheduler_type: reduce_lr_on_plateau
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
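
The batch size and gradient accumulation settings above determine how many optimizer steps make up one epoch. The arithmetic below is a sketch that assumes a single training device and that the final partial batch is kept; `num_devices` is an assumed value that the card does not state.

```python
import math

train_samples = 333                # size of the training dataset
per_device_train_batch_size = 16
gradient_accumulation_steps = 2
num_devices = 1                    # assumption: single-device training

# Samples consumed per optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Forward/backward passes per epoch, keeping the last partial batch.
batches_per_epoch = math.ceil(train_samples / (per_device_train_batch_size * num_devices))

# Optimizer steps per epoch.
optimizer_steps_per_epoch = batches_per_epoch / gradient_accumulation_steps

print(effective_batch_size)                      # 32
print(batches_per_epoch)                         # 21
print(optimizer_steps_per_epoch)                 # 10.5
print(round(10 / optimizer_steps_per_epoch, 4))  # 0.9524
```

The last value is consistent with the Training Logs section, which reports step 10 at epoch 0.9524.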

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: reduce_lr_on_plateau
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step  Training Loss  loss    pair-class-dev_max_ap  pair-class-test_max_ap
0       0     -              -       0.8544                 -
0.9524  10    0.0318         0.0106  0.9935                 -
1.9048  20    0.0126         -       -                      -
2.0     21    -              0.0043  1.0                    -
2.8571  30    0.008          -       -                      -
2.9524  31    -              0.004   1.0                    -
3.8095  40    0.0056         -       -                      -
4.0     42    -              0.0040  1.0                    -
4.7619  50    0.0039         0.0045  1.0                    1.0
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)}, 
    title={Dimensionality Reduction by Learning an Invariant Mapping}, 
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}