SentenceTransformer based on ai-forever/ruRoberta-large

This is a sentence-transformers model finetuned from ai-forever/ruRoberta-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: ai-forever/ruRoberta-large
  • Maximum Sequence Length: 514 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
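Because the similarity function is cosine similarity, `model.similarity` scores each pair of embeddings by the cosine of the angle between them. The plain-Python sketch below reproduces that computation on toy 3-dimensional vectors (standing in for the 1024-dimensional embeddings). It only illustrates the formula; the library itself works on tensors. The helper name `cosine_similarity_matrix` is hypothetical.

```python
import math


def cosine_similarity_matrix(a, b):
    """Return the len(a) x len(b) matrix of cosine similarities between row vectors.

    Assumes no zero vectors (their norm would cause a division by zero).
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    result = []
    for u in a:
        nu = norm(u)
        row = []
        for v in b:
            dot = sum(x * y for x, y in zip(u, v))
            row.append(dot / (nu * norm(v)))
        result.append(row)
    return result


# Toy vectors stand in for real sentence embeddings.
emb = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
sims = cosine_similarity_matrix(emb, emb)
print([[round(s, 3) for s in row] for row in sims])
# [[1.0, 0.707, 0.0], [0.707, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

The diagonal is 1.0 because every vector is maximally similar to itself, and the output has the same square shape as the `[3, 3]` matrix in the usage example.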

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 514, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
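The Pooling module enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of the RobertaModel's 1024-dimensional token outputs, with padding positions excluded via the attention mask. A minimal plain-Python sketch of that step, using toy sizes (the helper `mean_pool` is hypothetical, not the library's code):

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: list of seq_len vectors, each of length hidden_size
    attention_mask:   list of seq_len ints (1 = real token, 0 = padding)
    """
    hidden_size = len(token_embeddings[0])
    summed = [0.0] * hidden_size
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vector):
                summed[i] += x
            count += 1
    # Simplified guard against an all-padding input (avoids dividing by zero).
    count = max(count, 1)
    return [s / count for s in summed]


# Two real tokens and one padding token, hidden size 2 (the real model uses 1024).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padding vector `[100.0, 100.0]` has no effect on the result, which is the point of masking before averaging.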

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("nanalysenko/panacea_v2.2")
# Run inference
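sentences = [
    # "NOT UNTIL 20.04!!!!!!!! 12.01.16 Allergen component f77 - beta-lactoglobulin nBos d 5, IgE (ImmunoCAP)"
    'НЕТ ДО 20.04!!!!!!!! 12.01.16  Аллергокомпонент f77 - бета-лактоглобулин nBos d 5, IgE (ImmunoCAP)',
    # "Animal allergen panel No. 70 IgE (guinea pig epithelium, rabbit epithelium, hamster, rat, mouse),"
    'Панель аллергенов животных № 70 IgE (эпителий морской свинки, эпителий кролика, хомяк, крыса, мышь),',
    # "Fetal ultrasound examination"
    'Ультразвуковое исследование плода',
]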
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
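Entry `[i][j]` of `similarities` is the cosine similarity between sentences `i` and `j`, so paraphrase mining amounts to ranking the off-diagonal entries. A minimal sketch, using a made-up 3 x 3 matrix in place of real model output (`top_pairs` is a hypothetical helper; a real tensor could first be converted with `similarities.tolist()`, and the library also offers `sentence_transformers.util.paraphrase_mining` for this task):

```python
def top_pairs(similarities, k=2):
    """Return the k highest-scoring (i, j, score) pairs with i < j from a square matrix."""
    n = len(similarities)
    pairs = [
        (i, j, similarities[i][j])
        for i in range(n)
        for j in range(i + 1, n)  # skip the diagonal and mirrored duplicates
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:k]


# Hypothetical similarity values, NOT produced by this model.
sims = [
    [1.00, 0.62, 0.10],
    [0.62, 1.00, 0.15],
    [0.10, 0.15, 1.00],
]
print(top_pairs(sims))  # [(0, 1, 0.62), (1, 2, 0.15)]
```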

Training Details

Training Dataset

Unnamed Dataset

  • Size: 19,383 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
                 sentence_0     sentence_1
    type         string         string
    min tokens   5              5
    mean tokens  30.0           30.73
    max tokens   121            105
  • Samples:
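    1. sentence_0: Ингибитор VIII фактора
       sentence_1: Исследование уровня антигена фактора Виллебранда
       English: "Factor VIII inhibitor" / "Von Willebrand factor antigen level test"
    2. sentence_0: 13.01.02 Антитела к экстрагируемому нуклеарному АГ (ЭНА/ENA-скрин), сыворотка крови
       sentence_1: Антитела к экстрагируемому ядерному антигену, кач.
       English: "13.01.02 Antibodies to extractable nuclear antigen (ENA screen), blood serum" / "Antibodies to extractable nuclear antigen, qualitative"
    3. sentence_0: Нет 12.4.092 Аллерген f203 - фисташковые орехи, IgE
       sentence_1: Панель аллергенов деревьев № 2 IgE (клен ясенелистный, тополь, вяз, дуб, пекан),
       English: "No 12.4.092 Allergen f203 - pistachio nuts, IgE" / "Tree allergen panel No. 2 IgE (box elder maple, poplar, elm, oak, pecan),"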
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
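MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) row as an anchor and its positive; the other sentence_1 entries in the same batch act as negatives. Scores are cosine similarities multiplied by `scale` (20.0), and the loss is cross-entropy with the matching positive as the target. The plain-Python sketch below illustrates that computation on toy vectors; it is not the library's tensor implementation, and the names `cosine` and `mnr_loss` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean in-batch cross-entropy: anchors[i] should score highest against positives[i].

    anchors, positives: equal-length lists of embeddings (sentence_0, sentence_1).
    Every positives[j] with j != i acts as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[i]  # equals -log(softmax(logits)[i])
    return total / len(anchors)


# A toy batch of 4 pairs (matching per_device_train_batch_size), 4-dimensional vectors.
# Each anchor equals its own positive and is orthogonal to every other positive.
batch = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(mnr_loss(batch, batch) < 1e-6)  # True
```

With a batch size of 4, each anchor is compared against only 3 in-batch negatives; larger batches supply more negatives per step.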
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • num_train_epochs: 11
  • multi_dataset_batch_sampler: round_robin
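Together with the dataset size, these values determine the step counts in the training logs. A quick arithmetic check, assuming training ran on a single device (the card does not state the device count):

```python
import math

num_samples = 19_383   # "Size: 19,383 training samples"
batch_size = 4         # per_device_train_batch_size, assuming one device
num_epochs = 11        # num_train_epochs

# dataloader_drop_last is False, so the final partial batch is kept (ceiling division).
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)     # 4846 53306
print(round(500 / steps_per_epoch, 4))  # 0.1032
```

The epoch fraction at step 500 (0.1032) and at step 53,000 (10.9369) both agree with the Training Logs table, which is consistent with the single-device assumption.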

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 11
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
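With `lr_scheduler_type: linear`, `warmup_steps: 0`, and `warmup_ratio: 0.0`, the learning rate starts at `learning_rate` (5e-05) and decays linearly to 0 over all training steps. The sketch below mirrors the formula used by the Hugging Face `transformers` linear schedule; the total of 53,306 steps is derived from the dataset size, batch size, and epoch count (assuming a single device), not stated in the card, and `linear_lr` is a hypothetical helper.

```python
def linear_lr(step, base_lr=5e-05, total_steps=53_306, warmup_steps=0):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase (never reached here because warmup_steps is 0).
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, the midpoint, and the last logged step.
for step in (0, 26_653, 53_000):
    print(step, f"{linear_lr(step):.3e}")
```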

Training Logs

Epoch Step Training Loss
0.1032 500 0.7937
0.2064 1000 0.5179
0.3095 1500 0.5271
0.4127 2000 0.5696
0.5159 2500 0.5232
0.6191 3000 0.6401
0.7222 3500 0.6337
0.8254 4000 0.9436
0.9286 4500 1.3872
1.0318 5000 1.3834
1.1350 5500 0.9831
1.2381 6000 1.0122
1.3413 6500 1.3708
1.4445 7000 1.3794
1.5477 7500 1.3784
1.6508 8000 1.3856
1.7540 8500 1.3809
1.8572 9000 1.3776
1.9604 9500 1.0041
2.0636 10000 0.8559
2.1667 10500 0.8531
2.2699 11000 0.8446
2.3731 11500 0.8487
2.4763 12000 1.0807
2.5794 12500 1.3792
2.6826 13000 1.3923
2.7858 13500 1.3787
2.8890 14000 1.3803
2.9922 14500 1.3641
3.0953 15000 1.3725
3.1985 15500 1.3624
3.3017 16000 1.3659
3.4049 16500 1.3609
3.5080 17000 1.3496
3.6112 17500 1.3639
3.7144 18000 1.3487
3.8176 18500 1.3463
3.9208 19000 1.336
4.0239 19500 1.3451
4.1271 20000 1.3363
4.2303 20500 1.3411
4.3335 21000 1.3376
4.4366 21500 1.3294
4.5398 22000 1.3281
4.6430 22500 1.3323
4.7462 23000 1.3411
4.8494 23500 1.3162
4.9525 24000 1.3204
5.0557 24500 1.324
5.1589 25000 1.3253
5.2621 25500 1.3283
5.3652 26000 1.3298
5.4684 26500 1.3144
5.5716 27000 1.3162
5.6748 27500 1.3148
5.7780 28000 1.3254
5.8811 28500 1.319
5.9843 29000 1.3134
6.0875 29500 1.3184
6.1907 30000 1.3049
6.2939 30500 1.3167
6.3970 31000 1.3192
6.5002 31500 1.2926
6.6034 32000 1.3035
6.7066 32500 1.3117
6.8097 33000 1.3093
6.9129 33500 1.278
7.0161 34000 1.3143
7.1193 34500 1.3144
7.2225 35000 1.304
7.3256 35500 1.3066
7.4288 36000 1.2916
7.5320 36500 1.2943
7.6352 37000 1.2883
7.7383 37500 1.3014
7.8415 38000 1.3005
7.9447 38500 1.2699
8.0479 39000 1.3042
8.1511 39500 1.289
8.2542 40000 1.3012
8.3574 40500 1.3017
8.4606 41000 1.272
8.5638 41500 1.2939
8.6669 42000 1.2764
8.7701 42500 1.2908
8.8733 43000 1.2619
8.9765 43500 1.2791
9.0797 44000 1.2722
9.1828 44500 1.278
9.2860 45000 1.2911
9.3892 45500 1.2791
9.4924 46000 1.2791
9.5955 46500 1.2782
9.6987 47000 1.2789
9.8019 47500 1.2858
9.9051 48000 1.2601
10.0083 48500 1.29
10.1114 49000 1.276
10.2146 49500 1.2801
10.3178 50000 1.2853
10.4210 50500 1.2655
10.5241 51000 1.271
10.6273 51500 1.2633
10.7305 52000 1.2565
10.8337 52500 1.2755
10.9369 53000 1.2567
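A reference point for reading these values, derived from the loss definition and batch size rather than reported in the original logs: with per_device_train_batch_size 4, each anchor is scored against 4 candidates (its positive plus 3 in-batch negatives). If the model assigns every candidate the same cosine score c, the scaled logits are all equal and the loss reduces to ln 4, so logged values close to 1.386 can be read as near chance-level ranking within a batch (ignoring the final 3-sample batch of each epoch).

```latex
\mathcal{L}_{\text{uniform}}
  = -\log \frac{e^{20 c}}{\sum_{j=1}^{4} e^{20 c}}
  = -\log \frac{1}{4}
  = \ln 4 \approx 1.3863
```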

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Safetensors

  • Model size: 355M params
  • Tensor type: F32

Model tree for nanalysenko/panacea_v2.2: this model is finetuned from ai-forever/ruRoberta-large.