metadata
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:243060
  - loss:TripletLoss
widget:
  - source_sentence: ตะกร้าแมว
    sentences:
      - ตะกร้าหูหิ้วมีฝาปิดล็อคได้ ตะกร้าแมวเล็ก 15x23 ซม.
      - พูกันกลม ตราม้า No.10
      - 101259 - ปลั๊กแปลง 2 ขาแบน TOSHINO CO-6S ขาว
  - source_sentence: micropore
    sentences:
      - ดัชชี่โยเกิร์ตธรรมชาติ 135 X4
      - คาเมลถั่วผสมคอกเทล 150
      - 3M Nexcare เทปเยื่อกระดาษ Micropore 1 นิ้ว 10 หลา
  - source_sentence: เต้าหู้
    sentences:
      - หลอดแก๊สวิปครีม Quick whip กลิ่นช็อคโกแลต กล่อง
      - เต้าหู้แข็งสีขาว
      - ถาดหลุม(ใหญ่)อย่างหนา60หลุม
  - source_sentence: s26 gold
    sentences:
      - ชุดตรวจโควิด แบบแยงจมุก ยี่ห้อ diasia
      - เอส26โกลด์เอสเอ็มเอโปรซี 550
      - พริกขี้หนูแห้ง 1 ซอง
  - source_sentence: กาแฟพั
    sentences:
      - คุกกี้กาแฟ
      - พริกเขียวจินดา
      - AIR X MINT 10 TAB แอร์เอ็กซ์ เม็ดเคี้ยว รสมินต์

SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
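
The Pooling module performs mean pooling (pooling_mode_mean_tokens: True): the contextual token embeddings produced by the BertModel are averaged, with the attention mask used to exclude padding tokens. A minimal PyTorch sketch of that step, with illustrative tensor names:

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over real (non-padding) tokens."""
    # token_embeddings: (batch, seq_len, 384); attention_mask: (batch, seq_len) of 0/1
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # zero out padding, then sum
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # real tokens per sentence
    return summed / counts                                          # (batch, 384) sentence embeddings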

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sirabhop/mart-multilingual-semantic-search-miniLM-L12")
# Run inference
sentences = [
    'กาแฟพั',
    'คุกกี้กาแฟ',
    'พริกเขียวจินดา',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
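
Since this model targets semantic search over short queries and product names (see the widget examples above), a common pattern is to embed a catalogue once and rank it per query with the library's util.semantic_search helper. A sketch, using an illustrative two-item corpus taken from the inference example:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sirabhop/mart-multilingual-semantic-search-miniLM-L12")

# Illustrative mini-catalogue; in practice this would be the full product list
corpus = ['คุกกี้กาแฟ', 'พริกเขียวจินดา']
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Rank the corpus against a (possibly truncated or misspelled) query
query_embedding = model.encode('กาแฟพั', convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])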

Training Details

Training Dataset

Unnamed Dataset

  • Size: 243,060 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    anchor (string): min 3 tokens, mean 5.92 tokens, max 13 tokens
    positive (string): min 4 tokens, mean 12.48 tokens, max 28 tokens
    negative (string): min 4 tokens, mean 14.61 tokens, max 28 tokens
  • Samples:
    anchor: ชุดต้มยำ | positive: ชุดต้มยำ | negative: แกสบี้รับเบอร์สไปค์กี้เอจด์15ก_4902806125962
    anchor: ไดร์เป่าผม | positive: 1169469 - ไดร์เป่าผม PHILIPS BHD300/10 1600วัตต์ | negative: Soji ปลอกนิ้วทำความสะอาดฟัน 50ชิ้น กลิ่นมิ้นท์
    anchor: นำ้ตาลทราย | positive: น้ำตาลทราย ตรามิตรผล 1 กก. | negative: ปลากระบอผสมไข่ สดมาก
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
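
With TripletDistanceMetric.EUCLIDEAN and a margin of 5, each (anchor, positive, negative) triplet contributes max(||e_a - e_p||_2 - ||e_a - e_n||_2 + 5, 0) to the loss, pushing positives at least 5 units closer to the anchor than negatives. A minimal PyTorch sketch of that computation (names are illustrative):

import torch
import torch.nn.functional as F

def triplet_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negative: torch.Tensor, margin: float = 5.0) -> torch.Tensor:
    # Euclidean distances between (batch, 384) embedding batches
    d_pos = F.pairwise_distance(anchor, positive, p=2)
    d_neg = F.pairwise_distance(anchor, negative, p=2)
    # Hinge: loss is zero once the negative is at least `margin` farther away
    return F.relu(d_pos - d_neg + margin).mean()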
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin
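
These non-default values correspond directly to Sentence Transformers 3.x training arguments; a sketch of how they would be set (output_dir is illustrative, not taken from the original run):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers, MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="mart-multilingual-semantic-search-miniLM-L12",  # illustrative
    eval_strategy="steps",
    per_device_train_batch_size=256,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)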

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin

Training Logs

Note that the step counter restarts after the second row, so these logs appear to concatenate two separate training runs.

Epoch    Step   Training Loss
0.1054   100    0.5952
0.4219   200    0.5212
0.1054   100    0.3363
0.2107   200    0.3205
0.3161   300    0.3428
0.4215   400    0.3276
0.5269   500    0.3183
0.6322   600    0.3219
0.7376   700    0.3336
0.8430   800    0.3232
0.9484   900    0.3216

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.0.dev0
  • PyTorch: 2.0.1+cu118
  • Accelerate: 0.31.0
  • Datasets: 2.15.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}