SentenceTransformer based on intfloat/multilingual-e5-base

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-base on the rozetka_positive_pairs dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Dot Product
  • Training Dataset:
    • rozetka_positive_pairs

Full Model Architecture

RZTKSentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
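
As a rough illustration of what modules (1) and (2) above do, the sketch below mean-pools toy token vectors under an attention mask and then L2-normalizes the result. The helper names (mean_pool, l2_normalize, dot) and the 4-dimensional toy data are assumptions made for illustration; this is not the library's internal code. Because the output has unit length, the dot product listed as the similarity function gives the same value as cosine similarity.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in summed]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy data (hypothetical): three token vectors, the last one is padding.
tokens = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_vector = l2_normalize(mean_pool(tokens, mask))
print(sentence_vector)
print(dot(sentence_vector, sentence_vector))  # close to 1.0 for a unit vector
```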

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rztk/multilingual-e5-base-matryoshka2d-mnr-3")
# Run inference
sentences = [
    'query: мебель для кухни',
    'passage: Кухня Эко модуль Вытяжка 600 Эверест Ясень Шимо Светлый 60х30х28 см',
    'passage: Ключниці кишенькові Karya Гарантія 14 днів Для кого Для жінок Колір Червоний Матеріал Шкіра Країна реєстрації бренда Туреччина Країна-виробник товару Туреччина',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
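
The loss configuration under Training Details lists matryoshka_dims of 768, 512, 256, and 128, which suggests the leading dimensions of each embedding are meant to remain useful on their own. A minimal sketch of that idea, assuming truncation followed by re-normalization is appropriate for this model (the card does not confirm it): the function name truncate_and_normalize and the toy vector are hypothetical.

```python
import math

# Dimensions taken from the "matryoshka_dims" entry of the loss parameters.
MATRYOSHKA_DIMS = (768, 512, 256, 128)


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values of an embedding and rescale to unit length."""
    if dim > len(embedding):
        raise ValueError("dim must not exceed the embedding length")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    if norm == 0.0:
        return head
    return [v / norm for v in head]


# Toy 4-dimensional vector standing in for a 768-dimensional embedding.
example = [3.0, 4.0, 12.0, 0.0]
short = truncate_and_normalize(example, 2)
print(short)
```

With real embeddings, the same function could be applied to each row returned by model.encode before computing dot products. Smaller dimensions trade some retrieval quality for lower storage and faster search; how much quality is lost for this model is not reported in the card.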

Training Details

Training Dataset

rozetka_positive_pairs

  • Dataset: rozetka_positive_pairs
  • Size: 58,620,066 training samples
  • Columns: query and text
  • Approximate statistics based on the first 1000 samples:
                query                 text
    type        string                string
    details     min: 6 tokens         min: 11 tokens
                mean: 11.27 tokens    mean: 59.47 tokens
                max: 30 tokens        max: 512 tokens
  • Samples:
    query | text
    query: xsiomi 9c скло | passage: Защитные стекла Назначение Для мобильных телефонов Цвет Черный Теги Теги Наличие рамки C рамкой Форм-фактор Плоское Клеевой слой По всей поверхности
    query: xsiomi 9c скло | passage: Захисне скло Призначення Для мобільних телефонів Колір Чорний Теги Теги Наявність рамки З рамкою Форм-фактор Плоске Клейовий шар По всій поверхні
    query: xsiomi 9c скло | passage: Захисне скло Glass Full Glue для Xiaomi Redmi 9A/9C/10A (Чорний)
  • Loss: sentence_transformers_training.model.matryoshka2d_loss.RZTKMatryoshka2dLoss with these parameters:
    {
        "loss": "RZTKMultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1.0,
        "prior_layers_weight": 1.0,
        "kl_div_weight": 1.0,
        "kl_temperature": 0.3,
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": 1
    }
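
The loss classes named above are custom (RZTKMatryoshka2dLoss wrapping RZTKMultipleNegativesRankingLoss) and their code is not part of this card. The sketch below shows only the general idea behind the standard techniques they appear to be named after: an in-batch multiple-negatives ranking loss evaluated on truncated, re-normalized embeddings and combined with per-dimension weights. Everything here is an assumption for illustration, including the function names and scale=20.0. The layer-wise terms (last_layer_weight, prior_layers_weight, kl_div_weight, kl_temperature) are left out, and with n_dims_per_step set to 1 the real loss presumably samples a single dimension per step rather than summing over all of them.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0.0 else list(vector)


def multiple_negatives_ranking_loss(query_vecs, passage_vecs, scale=20.0):
    """Mean cross-entropy where passage i is the positive for query i and
    every other passage in the batch serves as a negative."""
    losses = []
    for i, query in enumerate(query_vecs):
        logits = [scale * sum(a * b for a, b in zip(query, passage))
                  for passage in passage_vecs]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


def matryoshka_loss(query_vecs, passage_vecs, dims, weights, scale=20.0):
    """Weighted sum of the ranking loss over truncated embedding sizes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        queries = [_normalize(v[:dim]) for v in query_vecs]
        passages = [_normalize(v[:dim]) for v in passage_vecs]
        total += weight * multiple_negatives_ranking_loss(queries, passages, scale)
    return total


# Toy batch (hypothetical): each query matches the passage at the same index.
toy_queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
toy_passages = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
print(matryoshka_loss(toy_queries, toy_passages, dims=[4, 2], weights=[1, 1]))
```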
    

Evaluation Dataset

rozetka_positive_pairs

  • Dataset: rozetka_positive_pairs
  • Size: 1,903,728 evaluation samples
  • Columns: query and text
  • Approximate statistics based on the first 1000 samples:
                query                 text
    type        string                string
    details     min: 6 tokens         min: 8 tokens
                mean: 8.36 tokens     mean: 45.68 tokens
                max: 16 tokens        max: 365 tokens
  • Samples:
    query | text
    query: создаем нейронную сеть | passage: Створюємо нейронну мережу
    query: создаем нейронную сеть | passage: Создаем нейронную сеть (1666498)
    query: создаем нейронную сеть | passage: Научная и техническая литература Переплет Мягкий
  • Loss: sentence_transformers_training.model.matryoshka2d_loss.RZTKMatryoshka2dLoss with these parameters:
    {
        "loss": "RZTKMultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1.0,
        "prior_layers_weight": 1.0,
        "kl_div_weight": 1.0,
        "kl_temperature": 0.3,
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": 1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 88
  • per_device_eval_batch_size: 88
  • learning_rate: 2e-05
  • num_train_epochs: 1.0
  • warmup_ratio: 0.1
  • bf16: True
  • bf16_full_eval: True
  • tf32: True
  • dataloader_num_workers: 8
  • load_best_model_at_end: True
  • optim: adafactor
  • push_to_hub: True
  • hub_model_id: rztk/multilingual-e5-base-matryoshka2d-mnr-3
  • hub_private_repo: True
  • prompts: {'query': 'query: ', 'text': 'passage: '}
  • batch_sampler: no_duplicates
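
To make the interaction of learning_rate, warmup_ratio, and the linear scheduler (lr_scheduler_type: linear in the full list) concrete, here is a small sketch of a linear warmup followed by linear decay. The function name linear_schedule_lr is hypothetical. The total step count is an assumption inferred from the training log, where the evaluation at step 16654 is labeled epoch 0.1000; it is not stated directly in the card.

```python
def linear_schedule_lr(step, total_steps, warmup_steps, peak_lr=2e-5):
    """Learning rate at `step`: linear ramp from 0 to peak_lr during warmup,
    then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: about 166,540 optimizer steps per epoch, inferred from the log.
TOTAL_STEPS = 166_540
# Assumption: warmup_ratio 0.1 applied to TOTAL_STEPS.
WARMUP_STEPS = 16_654

for step in (0, 8_327, WARMUP_STEPS, 91_597, TOTAL_STEPS):
    print(step, linear_schedule_lr(step, TOTAL_STEPS, WARMUP_STEPS))
```

Under these assumptions the learning rate would peak at 2e-05 around step 16654 and decline for the rest of the epoch.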

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 88
  • per_device_eval_batch_size: 88
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1.0
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: True
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 8
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adafactor
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: rztk/multilingual-e5-base-matryoshka2d-mnr-3
  • hub_strategy: every_save
  • hub_private_repo: True
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: {'query': 'query: ', 'text': 'passage: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • ddp_static_graph: False
  • ddp_comm_hook: bf16
  • gradient_as_bucket_view: False
  • num_proc: 30

Training Logs

Epoch Step Training Loss Validation Loss
0.0050 833 4.8404 -
0.0100 1666 4.6439 -
0.0150 2499 4.2238 -
0.0200 3332 3.5445 -
0.0250 4165 2.7514 -
0.0300 4998 2.4037 -
0.0350 5831 2.1916 -
0.0400 6664 2.0938 -
0.0450 7497 1.9268 -
0.0500 8330 1.8671 -
0.0550 9163 1.7069 -
0.0600 9996 1.6419 -
0.0650 10829 1.55 -
0.0700 11662 1.5483 -
0.0750 12495 1.5419 -
0.0800 13328 1.3582 -
0.0850 14161 1.3537 -
0.0900 14994 1.3067 -
0.0950 15827 1.2128 -
0.1000 16654 - 1.0107
0.1000 16660 1.2248 -
0.1050 17493 1.1565 -
0.1100 18326 1.1351 -
0.1150 19159 1.0808 -
0.1200 19992 1.0561 -
0.1250 20825 1.078 -
0.1301 21658 1.1413 -
0.1351 22491 1.0446 -
0.1401 23324 0.9986 -
0.1451 24157 0.9668 -
0.1501 24990 0.9753 -
0.1551 25823 1.0031 -
0.1601 26656 0.9688 -
0.1651 27489 0.9262 -
0.1701 28322 0.9702 -
0.1751 29155 0.9082 -
0.1801 29988 0.9264 -
0.1851 30821 0.8526 -
0.1901 31654 0.9667 -
0.1951 32487 0.9421 -
0.2000 33308 - 0.6416
0.2001 33320 0.9216 -
0.2051 34153 0.95 -
0.2101 34986 0.8895 -
0.2151 35819 0.8349 -
0.2201 36652 0.8628 -
0.2251 37485 0.8729 -
0.2301 38318 0.9285 -
0.2351 39151 0.8718 -
0.2401 39984 0.8792 -
0.2451 40817 0.8852 -
0.2501 41650 0.877 -
0.2551 42483 0.8325 -
0.2601 43316 0.8446 -
0.2651 44149 0.812 -
0.2701 44982 0.8246 -
0.2751 45815 0.8086 -
0.2801 46648 0.8553 -
0.2851 47481 0.8506 -
0.2901 48314 0.834 -
0.2951 49147 0.8313 -
0.3000 49962 - 0.5377
0.3001 49980 0.8376 -
0.3051 50813 0.7836 -
0.3101 51646 0.8089 -
0.3151 52479 0.8065 -
0.3201 53312 0.8284 -
0.3251 54145 0.7959 -
0.3301 54978 0.8332 -
0.3351 55811 0.7924 -
0.3401 56644 0.8171 -
0.3451 57477 0.7924 -
0.3501 58310 0.7977 -
0.3551 59143 0.7729 -
0.3601 59976 0.7617 -
0.3651 60809 0.8211 -
0.3701 61642 0.8497 -
0.3751 62475 0.8218 -
0.3802 63308 0.7846 -
0.3852 64141 0.7876 -
0.3902 64974 0.7912 -
0.3952 65807 0.7977 -
0.4000 66616 - 0.4974
0.4002 66640 0.8096 -
0.4052 67473 0.8356 -
0.4102 68306 0.788 -
0.4152 69139 0.7683 -
0.4202 69972 0.7358 -
0.4252 70805 0.7634 -
0.4302 71638 0.7535 -
0.4352 72471 0.756 -
0.4402 73304 0.7633 -
0.4452 74137 0.7509 -
0.4502 74970 0.7547 -
0.4552 75803 0.7539 -
0.4602 76636 0.7608 -
0.4652 77469 0.8262 -
0.4702 78302 0.8076 -
0.4752 79135 0.8179 -
0.4802 79968 0.7709 -
0.4852 80801 0.744 -
0.4902 81634 0.7846 -
0.4952 82467 0.7473 -
0.5000 83270 - 0.4776
0.5002 83300 0.7759 -
0.5052 84133 0.755 -
0.5102 84966 0.7308 -
0.5152 85799 0.7256 -
0.5202 86632 0.7703 -
0.5252 87465 0.7823 -
0.5302 88298 0.8109 -
0.5352 89131 0.7795 -
0.5402 89964 0.7833 -
0.5452 90797 0.7752 -
0.5502 91630 0.7975 -
0.5552 92463 0.7863 -
0.5602 93296 0.7337 -
0.5652 94129 0.7755 -
0.5702 94962 0.7928 -
0.5752 95795 0.7604 -
0.5802 96628 0.7983 -
0.5852 97461 0.7665 -
0.5902 98294 0.7749 -
0.5952 99127 0.7838 -
0.6000 99924 - 0.4669
0.6002 99960 0.7727 -
0.6052 100793 0.8049 -
0.6102 101626 0.7857 -
0.6152 102459 0.7622 -
0.6202 103292 0.8117 -
0.6252 104125 0.7711 -
0.6302 104958 0.7892 -
0.6353 105791 0.7938 -
0.6403 106624 0.728 -
0.6453 107457 0.7693 -
0.6503 108290 0.7875 -
0.6553 109123 0.7958 -
0.6603 109956 0.749 -
0.6653 110789 0.7788 -
0.6703 111622 0.7614 -
0.6753 112455 0.7577 -
0.6803 113288 0.7805 -
0.6853 114121 0.7677 -
0.6903 114954 0.7458 -
0.6953 115787 0.7962 -
0.7000 116578 - 0.4641
0.7003 116620 0.7275 -
0.7053 117453 0.7778 -
0.7103 118286 0.7885 -
0.7153 119119 0.8046 -
0.7203 119952 0.8222 -
0.7253 120785 0.7714 -
0.7303 121618 0.7983 -
0.7353 122451 0.7359 -
0.7403 123284 0.7618 -
0.7453 124117 0.783 -
0.7503 124950 0.763 -
0.7553 125783 0.809 -
0.7603 126616 0.794 -
0.7653 127449 0.7366 -
0.7703 128282 0.776 -
0.7753 129115 0.8053 -
0.7803 129948 0.7941 -
0.7853 130781 0.7722 -
0.7903 131614 0.7959 -
0.7953 132447 0.8061 -
0.8000 133232 - 0.4468

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.3.0
  • Transformers: 4.46.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 278M params (Safetensors, tensor type F32)