---
language:
- ar
- bg
- de
- el
- en
- es
- fr
- hi
- ru
- sw
- th
- tr
- ur
- vi
- zh
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:388774
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BookingCare/bkcare-bert-pretrained
datasets:
- facebook/xnli
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
widget:
- source_sentence: Như bằng chứng về việc này , cô ta đã chi tiết các tài sản bầu
    cử của clinton theo tiểu bang , ở phía đông bắc , Trung Tây , và tây .
  sentences:
  - Bộ chọn ứng cử viên không vui chơi ở các bữa tiệc .
  - Sử dụng công nghệ thông tin cho phép sử dụng các nguồn tài nguyên liên lạc lớn
    hơn .
  - Không bao giờ có một tài khoản kỹ lưỡng của các cuộc bầu cử của clinton .
- source_sentence: Sau một thời gian , ông ấy ngừng bò và ngồi lên .
  sentences:
  - Jon muốn có một trận đấu lớn để bắt đầu .
  - Tất cả mọi người đều được đưa ra một tách trung quốc vào đầu năm .
  - Anh ta bị thương nghiêm trọng .
- source_sentence: Arras đã nổi tiếng trong thời trung cổ cho tác phẩm của vải và
    những tấm thảm treo cổ , loại thông qua mà polonius gặp phải cái chết của ông
    ta ở hamlet .
  sentences:
  - Lũ lụt đang dự kiến đã gây ra 1.5 tỷ đô la trong thiệt hại .
  - Nó sẽ là bắt buộc cho những người nghèo khổ vì những quy định .
  - Arras chỉ làm đồ gốm thôi .
- source_sentence: Lehrer là người về sự giao tiếp này với gió và quyền lực , và nó
    đã biến anh ta thành một trong số họ .
  sentences:
  - Người đã làm julius cảm thấy lo lắng .
  - Họ có thể mất 36 tháng để hoàn thành .
  - Leher không thích giao tiếp với các chính trị gia .
- source_sentence: Tôi sẽ làm tất cả những gì ông muốn. julius hạ khẩu súng lục .
  sentences:
  - Tôi sẽ ban cho anh những lời chúc của anh , julius bỏ súng xuống .
  - Bạn có thể được đề nghị giả ngọc , điều đó rất tương tự với các đối tác cao hơn
    của nó .
  - Nó đến trong túi 400 pound .
pipeline_tag: sentence-similarity
model-index:
- name: SentenceTransformer based on BookingCare/bkcare-bert-pretrained
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 768
      type: sts-dev-768
    metrics:
    - type: pearson_cosine
      value: 0.6867482534374487
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6700553964995389
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.6734129943367082
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.6689701652447698
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.6743893025028618
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.6700560677966448
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6867482521687218
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6700558146434896
      name: Spearman Dot
    - type: pearson_max
      value: 0.6867482534374487
      name: Pearson Max
    - type: spearman_max
      value: 0.6700560677966448
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 512
      type: sts-dev-512
    metrics:
    - type: pearson_cosine
      value: 0.6850905517919458
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6685671393301956
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.6726989775543833
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.6682515030981849
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.6739395873419184
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.6695224924884773
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6802500913119895
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6631065723741826
      name: Spearman Dot
    - type: pearson_max
      value: 0.6850905517919458
      name: Pearson Max
    - type: spearman_max
      value: 0.6695224924884773
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 256
      type: sts-dev-256
    metrics:
    - type: pearson_cosine
      value: 0.6725154983351178
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6575647130100782
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.6697743652714089
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.6645201863227755
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.6719730940115203
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.6669909427123673
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6475732494643994
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6294359395183124
      name: Spearman Dot
    - type: pearson_max
      value: 0.6725154983351178
      name: Pearson Max
    - type: spearman_max
      value: 0.6669909427123673
      name: Spearman Max
---

# SentenceTransformer based on BookingCare/bkcare-bert-pretrained

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BookingCare/bkcare-bert-pretrained](https://huggingface.co/BookingCare/bkcare-bert-pretrained) on the [facebook/xnli](https://huggingface.co/datasets/facebook/xnli) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BookingCare/bkcare-bert-pretrained](https://huggingface.co/BookingCare/bkcare-bert-pretrained) <!-- at revision f869851286af65b3dbe0541a14fc5d3d2bb6c95d -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [facebook/xnli](https://huggingface.co/datasets/facebook/xnli)
- **Languages:** ar, bg, de, el, en, es, fr, hi, ru, sw, th, tr, ur, vi, zh
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
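The last two modules above (mean pooling followed by normalization) can be illustrated with a minimal pure-Python sketch. The helper names `mean_pool` and `l2_normalize` and the toy token vectors are illustrative assumptions, not part of the library; the real modules operate on batched tensors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]

def l2_normalize(vector):
    """Rescale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input: three token vectors of width 4; the last one is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

Because the output is unit-length, the dot product of two full-size embeddings equals their cosine similarity, which is consistent with the near-identical cosine and dot scores reported for `sts-dev-768`.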

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("matryoshka_nli_BookingCare-bkcare-bert-pretrained-2024-07-19_04-21-48")
# Run inference
sentences = [
    'Tôi sẽ làm tất cả những gì ông muốn. julius hạ khẩu súng lục .',
    'Tôi sẽ ban cho anh những lời chúc của anh , julius bỏ súng xuống .',
    'Nó đến trong túi 400 pound .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
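Because the model was trained with `MatryoshkaLoss` at 768, 512, and 256 dimensions, embeddings can plausibly be shortened to one of those sizes at a modest cost in quality. The sketch below shows the idea on toy vectors: keep the leading values, then renormalize so cosine and dot-product scores stay comparable. The helper names and the 6-dimensional toy vectors are assumptions for illustration only.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

# Toy 6-dimensional vectors stand in for the 768-dimensional model output.
full_a = [0.6, 0.0, 0.8, 0.0, 0.0, 0.0]
full_b = [0.0, 0.6, 0.8, 0.0, 0.0, 0.0]

# Truncated copies (here to 2 dimensions; in practice 512 or 256).
short_a = truncate_and_normalize(full_a, 2)
short_b = truncate_and_normalize(full_b, 2)

full_score = cosine(full_a, full_b)
short_score = cosine(short_a, short_b)
```

Recent releases of Sentence Transformers also accept a `truncate_dim` argument on the `SentenceTransformer` constructor, which performs the truncation inside `encode`; check that your installed version supports it before relying on it.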

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-dev-768`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.6867     |
| **spearman_cosine** | **0.6701** |
| pearson_manhattan   | 0.6734     |
| spearman_manhattan  | 0.6690     |
| pearson_euclidean   | 0.6744     |
| spearman_euclidean  | 0.6701     |
| pearson_dot         | 0.6867     |
| spearman_dot        | 0.6701     |
| pearson_max         | 0.6867     |
| spearman_max        | 0.6701     |
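Each table in this section reports correlations between model similarity scores and gold similarity labels. As a rough sketch of what `spearman_cosine` measures: compute the cosine similarity for every sentence pair, then take the Spearman rank correlation against the gold scores. The toy pairs and gold values below are hypothetical, and the helper functions are a simplified stand-in for what `EmbeddingSimilarityEvaluator` does internally.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data: three embedding pairs with gold similarity scores.
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # near-duplicates
    ([1.0, 0.0], [0.5, 0.5]),  # related
    ([1.0, 0.0], [0.0, 1.0]),  # unrelated
]
gold = [0.9, 0.5, 0.1]
predicted = [cosine(a, b) for a, b in pairs]
spearman_cosine = spearman(predicted, gold)
```

The Manhattan, Euclidean, and dot variants swap in a different similarity function, and in the tables of this card each `*_max` row equals the largest of the four corresponding values.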

#### Semantic Similarity
* Dataset: `sts-dev-512`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.6851     |
| **spearman_cosine** | **0.6686** |
| pearson_manhattan   | 0.6727     |
| spearman_manhattan  | 0.6683     |
| pearson_euclidean   | 0.6739     |
| spearman_euclidean  | 0.6695     |
| pearson_dot         | 0.6803     |
| spearman_dot        | 0.6631     |
| pearson_max         | 0.6851     |
| spearman_max        | 0.6695     |

#### Semantic Similarity
* Dataset: `sts-dev-256`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.6725     |
| **spearman_cosine** | **0.6576** |
| pearson_manhattan   | 0.6698     |
| spearman_manhattan  | 0.6645     |
| pearson_euclidean   | 0.6720     |
| spearman_euclidean  | 0.6670     |
| pearson_dot         | 0.6476     |
| spearman_dot        | 0.6294     |
| pearson_max         | 0.6725     |
| spearman_max        | 0.6670     |

## Training Details

### Training Dataset

#### facebook/xnli

* Dataset: [facebook/xnli](https://huggingface.co/datasets/facebook/xnli) at [b8dd5d7](https://huggingface.co/datasets/facebook/xnli/tree/b8dd5d7af51114dbda02c0e3f6133f332186418e)
* Size: 388,774 training samples
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | premise                                                                            | hypothesis                                                                        | label                                                              |
  |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                            | int                                                                |
  | details | <ul><li>min: 3 tokens</li><li>mean: 29.98 tokens</li><li>max: 309 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 15.64 tokens</li><li>max: 61 tokens</li></ul> | <ul><li>0: ~33.10%</li><li>1: ~35.60%</li><li>2: ~31.30%</li></ul> |
* Samples:
  | premise                                                                                                                                                                                   | hypothesis                                                                                                                                | label          |
  |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
  | <code>Những rắc rối với loại phân tích chi tiết này có nghĩa là bất kỳ nghệ nhân nào có thể nghiên cứu kỹ thuật của người nghệ thuật và tái tạo chúng -- sự chuẩn bị của hoffman .</code> | <code>Sự tái tạo là một quá trình dễ dàng .</code>                                                                                        | <code>2</code> |
  | <code>Đó là một sự quan sát tỉnh rượu , để nhận ra rằng 80 phần trăm của những người cần sự giúp đỡ pháp lý bị từ chối những hướng dẫn và luật sự .</code>                                | <code>80 % những người cần sự trợ giúp pháp lý bị từ chối những hướng dẫn mà họ đang tìm kiếm , và đây là một suy nghĩ tỉnh rượu .</code> | <code>0</code> |
  | <code>Đi qua cái để tìm nhà thờ của những hình xăm egios .</code>                                                                                                                         | <code>Nếu anh đi qua cái , anh sẽ tìm thấy mình ở bờ vực của thị trấn , không có gì ngoài nông thôn bên kia .</code>                      | <code>2</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256
      ],
      "matryoshka_weights": [
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
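To make these parameters concrete, here is a simplified pure-Python sketch of how a Matryoshka objective can wrap an in-batch-negatives ranking loss: the inner loss is evaluated on embeddings truncated to each entry of `matryoshka_dims`, and the results are combined using `matryoshka_weights`. The function names are hypothetical, `scale=20.0` is an assumed value for the similarity scaling, and the library's actual implementation works on batched tensors with gradients.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def mnrl(anchors, positives, scale=20.0):
    """In-batch negatives: each anchor should rank its own positive first.

    For anchor i, every positive j != i in the batch acts as a negative.
    The loss is the cross-entropy of the scaled similarities with target i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims=(768, 512, 256), weights=(1, 1, 1)):
    """Weighted sum of the inner loss over embeddings truncated to each dim."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        loss += weight * mnrl([a[:dim] for a in anchors],
                              [p[:dim] for p in positives])
    return loss

# Toy batch of 4-dimensional embeddings (assumption: real ones are 768-d).
anchors = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
positives = [[1.0, 0.0, 0.5, 1.0], [0.0, 1.0, 1.0, 0.5]]

matched = matryoshka_loss(anchors, positives, dims=(4, 2), weights=(1, 1))
swapped = matryoshka_loss(anchors, positives[::-1], dims=(4, 2), weights=(1, 1))
```

In this toy batch the correctly paired inputs give a much lower loss than the swapped pairing, which is the behavior the objective rewards at every truncation size. With `n_dims_per_step` set to `-1`, all listed dimensions are understood to contribute at every training step.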

### Evaluation Dataset

#### facebook/xnli

* Dataset: [facebook/xnli](https://huggingface.co/datasets/facebook/xnli) at [b8dd5d7](https://huggingface.co/datasets/facebook/xnli/tree/b8dd5d7af51114dbda02c0e3f6133f332186418e)
* Size: 3,928 evaluation samples
* Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | premise                                                                           | hypothesis                                                                        | label                                                              |
  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------|
  | type    | string                                                                            | string                                                                            | int                                                                |
  | details | <ul><li>min: 4 tokens</li><li>mean: 32.3 tokens</li><li>max: 163 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 15.73 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>0: ~32.40%</li><li>1: ~33.50%</li><li>2: ~34.10%</li></ul> |
* Samples:
  | premise                                                                                                                    | hypothesis                                                                             | label          |
  |:---------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|:---------------|
  | <code>Hai xu mắt anh ta warily .</code>                                                                                    | <code>Hai xu không nhìn anh ta .</code>                                                | <code>2</code> |
  | <code>Một không khí chung của glee permeated tất cả mọi người .</code>                                                     | <code>Mọi thứ đều cảm thấy hạnh phúc .</code>                                          | <code>0</code> |
  | <code>Tuy nhiên , một sự chắc chắn là dân số hoa kỳ đã bị lão hóa và sẽ có ít công nhân hỗ trợ mỗi người nghỉ hưu .</code> | <code>Trạng Thái lão hóa của dân số hoa kỳ được coi là một sự không chắc chắn .</code> | <code>2</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256
      ],
      "matryoshka_weights": [
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates
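
As a rough illustration of how `learning_rate`, `warmup_ratio`, and the default linear scheduler interact, the sketch below computes the learning rate at a given optimizer step. The total of 6,075 steps is an assumption inferred from the training logs in this card (step 6000 corresponds to epoch 0.9877); the function name is hypothetical and only mirrors the usual linear warmup and decay rule.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: about 6,075 optimizer steps in the single training epoch.
TOTAL_STEPS = 6075

for step in (0, 300, 608, 3000, 6075):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak learning rate of `2e-05` is reached after roughly 608 steps, so the first logged training loss at step 300 falls inside the warmup phase.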

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch  | Step | Training Loss | Validation Loss | sts-dev-256_spearman_cosine | sts-dev-512_spearman_cosine | sts-dev-768_spearman_cosine |
|:------:|:----:|:-------------:|:------:|:---------------------------:|:---------------------------:|:---------------------------:|
| 0      | 0    | -             | -      | 0.5425                      | 0.5569                      | 0.5593                      |
| 0.0494 | 300  | 5.6741        | -      | -                           | -                           | -                           |
| 0.0823 | 500  | -             | 2.9876 | 0.6417                      | 0.6479                      | 0.6502                      |
| 0.0988 | 600  | 3.5541        | -      | -                           | -                           | -                           |
| 0.1481 | 900  | 2.9032        | -      | -                           | -                           | -                           |
| 0.1646 | 1000 | -             | 2.3400 | 0.6526                      | 0.6565                      | 0.6591                      |
| 0.1975 | 1200 | 2.6495        | -      | -                           | -                           | -                           |
| 0.2469 | 1500 | 2.4260        | 2.1092 | 0.6359                      | 0.6466                      | 0.6501                      |
| 0.2963 | 1800 | 2.2969        | -      | -                           | -                           | -                           |
| 0.3292 | 2000 | -             | 1.9556 | 0.6390                      | 0.6491                      | 0.6516                      |
| 0.3457 | 2100 | 2.1003        | -      | -                           | -                           | -                           |
| 0.3951 | 2400 | 2.0975        | -      | -                           | -                           | -                           |
| 0.4115 | 2500 | -             | 1.8133 | 0.6585                      | 0.6681                      | 0.6709                      |
| 0.4444 | 2700 | 2.0403        | -      | -                           | -                           | -                           |
| 0.4938 | 3000 | 1.9421        | 1.7629 | 0.6415                      | 0.6515                      | 0.6540                      |
| 0.5432 | 3300 | 1.9313        | -      | -                           | -                           | -                           |
| 0.5761 | 3500 | -             | 1.6924 | 0.6577                      | 0.6660                      | 0.6673                      |
| 0.5926 | 3600 | 1.8582        | -      | -                           | -                           | -                           |
| 0.6420 | 3900 | 1.8203        | -      | -                           | -                           | -                           |
| 0.6584 | 4000 | -             | 1.6263 | 0.6527                      | 0.6620                      | 0.6635                      |
| 0.6914 | 4200 | 1.8281        | -      | -                           | -                           | -                           |
| 0.7407 | 4500 | 1.8037        | 1.5776 | 0.6572                      | 0.6677                      | 0.6685                      |
| 0.7901 | 4800 | 1.7771        | -      | -                           | -                           | -                           |
| 0.8230 | 5000 | -             | 1.5571 | 0.6548                      | 0.6652                      | 0.6665                      |
| 0.8395 | 5100 | 1.7427        | -      | -                           | -                           | -                           |
| 0.8889 | 5400 | 1.6901        | -      | -                           | -                           | -                           |
| 0.9053 | 5500 | -             | 1.5385 | 0.6604                      | 0.6707                      | 0.6717                      |
| 0.9383 | 5700 | 1.7977        | -      | -                           | -                           | -                           |
| 0.9877 | 6000 | 1.6838        | 1.5279 | 0.6576                      | 0.6686                      | 0.6701                      |


### Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.1.2
- Accelerate: 0.30.1
- Datasets: 2.19.2
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
