SentenceTransformer based on distilbert/distilbert-base-multilingual-cased

This is a sentence-transformers model finetuned from distilbert/distilbert-base-multilingual-cased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
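
The Pooling module above enables only pooling_mode_mean_tokens, so a sentence embedding is the average of its token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the function name mean_pool and the toy 2-dimensional vectors are illustrative assumptions (the real model produces 768-dimensional vectors and works on tensors). Padding positions are skipped using the attention mask.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Toy example: three real tokens and one padding token, 2 dimensions each.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

The padding vector [9.0, 9.0] does not influence the result because its mask entry is 0.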

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("agentlans/distilbert-base-multilingual-cased-aligned")
# Run inference
sentences = [
    'Palm DOC Conduit for KPilot',
    'PalmDOC- conduit foar KPilot',
    'Man nepatinka gyventi kaime.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
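
model.similarity returns a matrix of pairwise scores. Assuming the similarity function is cosine similarity (the function named in this card's loss configuration, and to our knowledge the Sentence Transformers default), the computation can be sketched in plain Python. The helper names and the toy 3-dimensional vectors below are illustrative and not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Pairwise cosine similarity between every pair of embeddings."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy embeddings: the first two point in the same direction, the third is orthogonal.
toy = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
scores = similarity_matrix(toy)
for row in scores:
    print([round(s, 2) for s in row])
```

In the usage example above, the first two sentences are translations of each other, so their score would be expected to be higher than either one's score against the unrelated third sentence.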

Training Details

Training Dataset

Unnamed Dataset

  • Size: 867,042 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    statistic | sentence_0 | sentence_1
    type | string | string
    min | 3 tokens | 3 tokens
    mean | 21.88 tokens | 30.11 tokens
    max | 121 tokens | 230 tokens
  • Samples:
    sentence_0 | sentence_1
    They need to be internationally recognized and supported. | Mereka harus diakui dan dibantu secara internasional.
    I ride with these kids once a week, every Tuesday. | Ik rijd met deze kinderen een keer per week, elke dinsdag.
    We still have some. | අපි ගාව තව ඒවා තියෙනවනේ.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
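
With MultipleNegativesRankingLoss, each sentence_0 is scored against every sentence_1 in the batch: its own pair is the positive and the others serve as negatives. The following is a rough pure-Python sketch of that computation using the scale of 20.0 and cosine similarity from the parameters above. The function names and the made-up 2-dimensional embeddings are illustrative; the library implementation operates on tensors.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]     # each positive is close to its own anchor
misaligned = [[0.1, 0.9], [0.9, 0.1]]  # positives swapped between anchors
print(mnr_loss(anchors, aligned) < mnr_loss(anchors, misaligned))
```

The loss is near zero when each anchor is most similar to its own pair and grows when another item in the batch is a closer match.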
    

Training Hyperparameters

Non-Default Hyperparameters

  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
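
With lr_scheduler_type: linear, warmup_steps: 0, and learning_rate: 5e-05, the learning rate decays linearly from its initial value to zero over the run. The sketch below assumes single-device training, so that the total step count is ceil(867,042 / 8) = 108,381; that total is inferred from the dataset size and batch size, not stated in this card. linear_lr is an illustrative helper, not a library function.

```python
import math

BASE_LR = 5e-05
WARMUP_STEPS = 0
# Assumed: one device and gradient_accumulation_steps of 1.
TOTAL_STEPS = math.ceil(867_042 / 8)


def linear_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step))
```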

Training Logs

Epoch Step Training Loss
0.0046 500 0.1996
0.0092 1000 0.087
0.0138 1500 0.0771
0.0185 2000 0.0646
0.0231 2500 0.0443
0.0277 3000 0.0526
0.0323 3500 0.05
0.0369 4000 0.0479
0.0415 4500 0.0477
0.0461 5000 0.0427
0.0507 5500 0.0343
0.0554 6000 0.0358
0.0600 6500 0.0452
0.0646 7000 0.0397
0.0692 7500 0.0289
0.0738 8000 0.0274
0.0784 8500 0.0364
0.0830 9000 0.0283
0.0877 9500 0.0295
0.0923 10000 0.0337
0.0969 10500 0.0303
0.1015 11000 0.0252
0.1061 11500 0.0241
0.1107 12000 0.0225
0.1153 12500 0.0263
0.1199 13000 0.0255
0.1246 13500 0.0311
0.1292 14000 0.0201
0.1338 14500 0.0209
0.1384 15000 0.0205
0.1430 15500 0.0242
0.1476 16000 0.0332
0.1522 16500 0.0346
0.1569 17000 0.0225
0.1615 17500 0.0245
0.1661 18000 0.0166
0.1707 18500 0.0196
0.1753 19000 0.0264
0.1799 19500 0.0212
0.1845 20000 0.0201
0.1891 20500 0.0238
0.1938 21000 0.0175
0.1984 21500 0.022
0.2030 22000 0.0201
0.2076 22500 0.0197
0.2122 23000 0.0137
0.2168 23500 0.017
0.2214 24000 0.031
0.2261 24500 0.0238
0.2307 25000 0.0194
0.2353 25500 0.024
0.2399 26000 0.022
0.2445 26500 0.0276
0.2491 27000 0.016
0.2537 27500 0.0203
0.2583 28000 0.0245
0.2630 28500 0.0161
0.2676 29000 0.0132
0.2722 29500 0.0142
0.2768 30000 0.0171
0.2814 30500 0.0207
0.2860 31000 0.0189
0.2906 31500 0.0169
0.2953 32000 0.0225
0.2999 32500 0.0224
0.3045 33000 0.0114
0.3091 33500 0.0213
0.3137 34000 0.0146
0.3183 34500 0.0154
0.3229 35000 0.0218
0.3275 35500 0.0096
0.3322 36000 0.0147
0.3368 36500 0.0186
0.3414 37000 0.0214
0.3460 37500 0.0231
0.3506 38000 0.0165
0.3552 38500 0.0157
0.3598 39000 0.0128
0.3645 39500 0.018
0.3691 40000 0.0183
0.3737 40500 0.0203
0.3783 41000 0.02
0.3829 41500 0.0165
0.3875 42000 0.0128
0.3921 42500 0.0106
0.3967 43000 0.0174
0.4014 43500 0.0168
0.4060 44000 0.0114
0.4106 44500 0.0158
0.4152 45000 0.0108
0.4198 45500 0.0141
0.4244 46000 0.0137
0.4290 46500 0.0137
0.4337 47000 0.0215
0.4383 47500 0.0123
0.4429 48000 0.0138
0.4475 48500 0.0152
0.4521 49000 0.0144
0.4567 49500 0.016
0.4613 50000 0.0132
0.4659 50500 0.0164
0.4706 51000 0.0155
0.4752 51500 0.0145
0.4798 52000 0.0173
0.4844 52500 0.02
0.4890 53000 0.0168
0.4936 53500 0.011
0.4982 54000 0.0116
0.5029 54500 0.009
0.5075 55000 0.0143
0.5121 55500 0.0111
0.5167 56000 0.0138
0.5213 56500 0.0104
0.5259 57000 0.0146
0.5305 57500 0.0116
0.5351 58000 0.0157
0.5398 58500 0.013
0.5444 59000 0.0144
0.5490 59500 0.0134
0.5536 60000 0.0114
0.5582 60500 0.0101
0.5628 61000 0.0164
0.5674 61500 0.0151
0.5721 62000 0.0138
0.5767 62500 0.0107
0.5813 63000 0.0102
0.5859 63500 0.0153
0.5905 64000 0.0103
0.5951 64500 0.0136
0.5997 65000 0.0107
0.6043 65500 0.0101
0.6090 66000 0.0101
0.6136 66500 0.0117
0.6182 67000 0.0113
0.6228 67500 0.0131
0.6274 68000 0.0068
0.6320 68500 0.0053
0.6366 69000 0.0113
0.6413 69500 0.0119
0.6459 70000 0.0094
0.6505 70500 0.0072
0.6551 71000 0.0171
0.6597 71500 0.0121
0.6643 72000 0.0134
0.6689 72500 0.0147
0.6735 73000 0.0075
0.6782 73500 0.0125
0.6828 74000 0.0064
0.6874 74500 0.0071
0.6920 75000 0.0073
0.6966 75500 0.0075
0.7012 76000 0.0097
0.7058 76500 0.01
0.7105 77000 0.0123
0.7151 77500 0.0093
0.7197 78000 0.0103
0.7243 78500 0.0179
0.7289 79000 0.0091
0.7335 79500 0.0121
0.7381 80000 0.0104
0.7428 80500 0.0083
0.7474 81000 0.0116
0.7520 81500 0.0084
0.7566 82000 0.0077
0.7612 82500 0.0081
0.7658 83000 0.0101
0.7704 83500 0.0093
0.7750 84000 0.0095
0.7797 84500 0.0107
0.7843 85000 0.0108
0.7889 85500 0.0095
0.7935 86000 0.0082
0.7981 86500 0.0103
0.8027 87000 0.0069
0.8073 87500 0.009
0.8120 88000 0.0081
0.8166 88500 0.0074
0.8212 89000 0.0069
0.8258 89500 0.0066
0.8304 90000 0.0065
0.8350 90500 0.0065
0.8396 91000 0.0088
0.8442 91500 0.008
0.8489 92000 0.0069
0.8535 92500 0.0095
0.8581 93000 0.0082
0.8627 93500 0.0068
0.8673 94000 0.006
0.8719 94500 0.0082
0.8765 95000 0.0121
0.8812 95500 0.0098
0.8858 96000 0.0083
0.8904 96500 0.008
0.8950 97000 0.0053
0.8996 97500 0.0102
0.9042 98000 0.0093
0.9088 98500 0.0042
0.9134 99000 0.0093
0.9181 99500 0.0138
0.9227 100000 0.0105
0.9273 100500 0.0079
0.9319 101000 0.0118
0.9365 101500 0.0072
0.9411 102000 0.0094
0.9457 102500 0.0108
0.9504 103000 0.0092
0.9550 103500 0.0062
0.9596 104000 0.0073
0.9642 104500 0.0089
0.9688 105000 0.0092
0.9734 105500 0.0076
0.9780 106000 0.0103
0.9826 106500 0.0064
0.9873 107000 0.0072
0.9919 107500 0.0052
0.9965 108000 0.0061
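
The Epoch column is the step number divided by the number of steps in one epoch. Assuming single-device training with the per-device batch size of 8 listed above, one epoch is ceil(867,042 / 8) = 108,381 steps, which reproduces the logged fractions. A small check (the helper name is illustrative):

```python
import math

TRAIN_SAMPLES = 867_042
BATCH_SIZE = 8  # per_device_train_batch_size; a single device is assumed

# dataloader_drop_last is False, so the final partial batch counts as a step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)


def epoch_fraction(step):
    """Fraction of the epoch completed at a given step, rounded as in the log."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)
for step in (500, 1000, 108000):
    print(step, epoch_fraction(step))
```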

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.0
  • Transformers: 4.46.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 135M params (Safetensors, tensor type F32)