x1saint committed on
Commit c8caa28 · verified · 1 Parent(s): c60d7f9

Add new SentenceTransformer model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,461 @@
+ ---
+ language:
+ - tr
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:482091
+ - loss:MultipleNegativesRankingLoss
+ base_model: Alibaba-NLP/gte-multilingual-base
+ widget:
+ - source_sentence: Ya da dışarı çıkıp yürü ya da biraz koşun. Bunu düzenli olarak
+     yapmıyorum ama Washington bunu yapmak için harika bir yer.
+   sentences:
+   - “Washington's yürüyüş ya da koşu için harika bir yer.”
+   - H-2A uzaylılar Amerika Birleşik Devletleri'nde zaman kısa süreleri var.
+   - “Washington'da düzenli olarak yürüyüşe ya da koşuya çıkıyorum.”
+ - source_sentence: Orta yaylalar ve güney kıyıları arasındaki kontrast daha belirgin
+     olamazdı.
+   sentences:
+   - İşitme Yardımı Uyumluluğu Müzakere Kuralları Komitesi, Federal İletişim Komisyonu'nun
+     bir ürünüdür.
+   - Dağlık ve sahil arasındaki kontrast kolayca işaretlendi.
+   - Kontrast işaretlenemedi.
+ - source_sentence: Bir 1997 Henry J. Kaiser Aile Vakfı anket yönetilen bakım planlarında
+     Amerikalılar temelde kendi bakımı ile memnun olduğunu bulundu.
+   sentences:
+   - Kaplanları takip ederken çok sessiz olmalısın.
+   - Henry Kaiser vakfı insanların sağlık hizmetlerinden hoşlandığını gösteriyor.
+   - Henry Kaiser Vakfı insanların sağlık hizmetlerinden nefret ettiğini gösteriyor.
+ - source_sentence: Eminim yapmışlardır.
+   sentences:
+   - Eminim öyle yapmışlardır.
+   - Batı Teksas'ta 100 10 dereceydi.
+   - Eminim yapmamışlardır.
+ - source_sentence: Ve gerçekten, baba haklıydı, oğlu zaten her şeyi tecrübe etmişti,
+     her şeyi denedi ve daha az ilgileniyordu.
+   sentences:
+   - Oğlu her şeye olan ilgisini kaybediyordu.
+   - Pek bir şey yapmadım.
+   - Baba oğlunun tecrübe için hala çok şey olduğunu biliyordu.
+ datasets:
+ - emrecan/all-nli-tr
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ model-index:
+ - name: SentenceTransformer based on Alibaba-NLP/gte-multilingual-base
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: all nli dev
+       type: all-nli-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.943809986114502
+       name: Cosine Accuracy
+ ---
+
+ # SentenceTransformer based on Alibaba-NLP/gte-multilingual-base
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) on the [all-nli-tr](https://huggingface.co/datasets/emrecan/all-nli-tr) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) <!-- at revision ca1791e0bcc104f6db161f27de1340241b13c5a4 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [all-nli-tr](https://huggingface.co/datasets/emrecan/all-nli-tr)
+ - **Language:** tr
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
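+
+ Because the final `Normalize()` module L2-normalizes every embedding, cosine similarity on the outputs is equivalent to a plain dot product. A minimal sketch to check this (loading as in the usage example below; whether `trust_remote_code=True` is needed for the custom `NewModel` architecture is an assumption based on the base model's `auto_map`):
+
+ ```python
+ import numpy as np
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("x1saint/gte-multi-triplet-v2", trust_remote_code=True)
+ emb = model.encode(["Merhaba dünya", "Hello world"])
+ print(np.linalg.norm(emb, axis=1))  # ~[1. 1.]: the Normalize() module unit-norms outputs
+ print(emb @ emb.T)                  # dot products therefore equal cosine similarities
+ ```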
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("x1saint/gte-multi-triplet-v2")
+ # Run inference
+ sentences = [
+     'Ve gerçekten, baba haklıydı, oğlu zaten her şeyi tecrübe etmişti, her şeyi denedi ve daha az ilgileniyordu.',
+     'Oğlu her şeye olan ilgisini kaybediyordu.',
+     'Baba oğlunun tecrübe için hala çok şey olduğunu biliyordu.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
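+
+ As a further illustration, the same API supports a small semantic search: encode a corpus once, then rank it against a query. A minimal sketch (the corpus and query below are made up for illustration, not taken from the training data):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("x1saint/gte-multi-triplet-v2")
+
+ # Illustrative Turkish corpus and query
+ corpus = [
+     "Washington yürüyüş için harika bir yer.",
+     "Dün gece hava çok soğuktu.",
+ ]
+ query = "Koşu yapmak için iyi bir şehir arıyorum."
+
+ corpus_emb = model.encode(corpus)
+ query_emb = model.encode([query])
+
+ # model.similarity applies the configured cosine similarity function
+ scores = model.similarity(query_emb, corpus_emb)
+ best = scores.argmax().item()
+ print(corpus[best], scores[0, best].item())
+ ```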
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Dataset: `all-nli-dev`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.9438** |
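+
+ A minimal sketch of how a score like this can be recomputed with `TripletEvaluator` (the split name and the `anchor`/`positive`/`negative` column names are assumptions based on the dataset description under Training Details):
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import TripletEvaluator
+
+ model = SentenceTransformer("x1saint/gte-multi-triplet-v2")
+ dev = load_dataset("emrecan/all-nli-tr", split="dev")  # assumed split name
+
+ evaluator = TripletEvaluator(
+     anchors=dev["anchor"],
+     positives=dev["positive"],
+     negatives=dev["negative"],
+     name="all-nli-dev",
+ )
+ print(evaluator(model))  # e.g. {'all-nli-dev_cosine_accuracy': 0.94...}
+ ```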
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### all-nli-tr
+
+ * Dataset: [all-nli-tr](https://huggingface.co/datasets/emrecan/all-nli-tr) at [daeabfb](https://huggingface.co/datasets/emrecan/all-nli-tr/tree/daeabfbc01f82757ab998bd23ce0ddfceaa5e24d)
+ * Size: 482,091 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string   | string   |
+   | details | <ul><li>min: 5 tokens</li><li>mean: 28.16 tokens</li><li>max: 151 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 15.14 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.33 tokens</li><li>max: 55 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>Mevsim boyunca ve sanırım senin seviyendeyken onları bir sonraki seviyeye düşürürsün. Eğer ebeveyn takımını çağırmaya karar verirlerse Braves üçlü A'dan birini çağırmaya karar verirlerse çifte bir adam onun yerine geçmeye gider ve bekar bir adam gelir.</code> | <code>Eğer insanlar hatırlarsa, bir sonraki seviyeye düşersin.</code> | <code>Hiçbir şeyi hatırlamazlar.</code> |
+   | <code>Numaramızdan biri talimatlarınızı birazdan yerine getirecektir.</code> | <code>Ekibimin bir üyesi emirlerinizi büyük bir hassasiyetle yerine getirecektir.</code> | <code>Şu anda boş kimsek yok, bu yüzden sen de harekete geçmelisin.</code> |
+   | <code>Bunu nereden biliyorsun? Bütün bunlar yine onların bilgileri.</code> | <code>Bu bilgi onlara ait.</code> | <code>Hiçbir bilgileri yok.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
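+
+ For reference, a minimal sketch of a comparable training setup with this loss (the dataset loading and split name are assumptions based on the description above, not the exact training script):
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)
+ train = load_dataset("emrecan/all-nli-tr", split="train")  # assumed split name
+
+ # (anchor, positive, negative) triplets; other in-batch examples serve as extra negatives
+ loss = MultipleNegativesRankingLoss(model, scale=20.0)
+
+ trainer = SentenceTransformerTrainer(model=model, train_dataset=train, loss=loss)
+ trainer.train()
+ ```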
+
+ ### Evaluation Dataset
+
+ #### all-nli-tr
+
+ * Dataset: [all-nli-tr](https://huggingface.co/datasets/emrecan/all-nli-tr) at [daeabfb](https://huggingface.co/datasets/emrecan/all-nli-tr/tree/daeabfbc01f82757ab998bd23ce0ddfceaa5e24d)
+ * Size: 6,567 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string   | string   |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 26.66 tokens</li><li>max: 121 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 14.98 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.4 tokens</li><li>max: 37 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>Bilemiyorum. Onunla ilgili karışık duygularım var. Bazen ondan hoşlanıyorum ama aynı zamanda birisinin onu dövmesini görmeyi seviyorum.</code> | <code>Çoğunlukla ondan hoşlanıyorum, ama yine de birinin onu dövdüğünü görmekten zevk alıyorum.</code> | <code>O benim favorim ve kimsenin onu yendiğini görmek istemiyorum.</code> |
+   | <code>Sen ve arkadaşların burada hoş karşılanmaz, Severn söyledi.</code> | <code>Severn orada insanların hoş karşılanmadığını söyledi.</code> | <code>Severn orada insanların her zaman hoş karşılanacağını söyledi.</code> |
+   | <code>Gecenin en aşağısı ne olduğundan emin değilim.</code> | <code>Dün gece ne kadar soğuk oldu bilmiyorum.</code> | <code>Dün gece hava 37 dereceydi.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `gradient_accumulation_steps`: 4
+ - `num_train_epochs`: 10
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `dataloader_num_workers`: 4
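+
+ These map directly onto `SentenceTransformerTrainingArguments`; a minimal sketch (the output directory name is illustrative):
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="gte-multi-triplet-v2",  # illustrative path
+     eval_strategy="steps",
+     per_device_train_batch_size=64,
+     per_device_eval_batch_size=64,
+     gradient_accumulation_steps=4,
+     num_train_epochs=10,
+     warmup_ratio=0.1,
+     bf16=True,
+     dataloader_num_workers=4,
+ )
+ ```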
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 4
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 4
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step  | Training Loss | Validation Loss | all-nli-dev_cosine_accuracy |
+ |:------:|:-----:|:-------------:|:---------------:|:---------------------------:|
+ | 0.2655 | 500   | 3.0729        | 0.4237          | 0.9229                      |
+ | 0.5310 | 1000  | 2.2154        | 0.3830          | 0.9257                      |
+ | 0.7965 | 1500  | 1.9267        | 0.3517          | 0.9319                      |
+ | 1.0616 | 2000  | 1.7078        | 0.3424          | 0.9354                      |
+ | 1.3271 | 2500  | 1.4602        | 0.3362          | 0.9368                      |
+ | 1.5926 | 3000  | 1.3925        | 0.3290          | 0.9379                      |
+ | 1.8581 | 3500  | 1.3124        | 0.3116          | 0.9417                      |
+ | 2.1232 | 4000  | 1.1537        | 0.3154          | 0.9382                      |
+ | 2.3887 | 4500  | 1.0209        | 0.3205          | 0.9412                      |
+ | 2.6542 | 5000  | 0.9897        | 0.3065          | 0.9441                      |
+ | 2.9197 | 5500  | 0.9611        | 0.3025          | 0.9420                      |
+ | 3.1848 | 6000  | 0.8276        | 0.3162          | 0.9446                      |
+ | 3.4503 | 6500  | 0.7779        | 0.3101          | 0.9408                      |
+ | 3.7158 | 7000  | 0.7738        | 0.3110          | 0.9426                      |
+ | 3.9813 | 7500  | 0.7641        | 0.3056          | 0.9434                      |
+ | 4.2464 | 8000  | 0.6338        | 0.3152          | 0.9429                      |
+ | 4.5119 | 8500  | 0.6397        | 0.3133          | 0.9421                      |
+ | 4.7774 | 9000  | 0.6207        | 0.3160          | 0.9420                      |
+ | 5.0425 | 9500  | 0.6044        | 0.3156          | 0.9408                      |
+ | 5.3080 | 10000 | 0.5305        | 0.3205          | 0.9449                      |
+ | 5.5735 | 10500 | 0.5377        | 0.3124          | 0.9450                      |
+ | 5.8390 | 11000 | 0.5311        | 0.3168          | 0.9443                      |
+ | 6.1041 | 11500 | 0.5017        | 0.3250          | 0.9435                      |
+ | 6.3696 | 12000 | 0.46          | 0.3213          | 0.9429                      |
+ | 6.6351 | 12500 | 0.4679        | 0.3212          | 0.9443                      |
+ | 6.9006 | 13000 | 0.4692        | 0.3221          | 0.9434                      |
+ | 7.1657 | 13500 | 0.4285        | 0.3231          | 0.9446                      |
+ | 7.4312 | 14000 | 0.4161        | 0.3265          | 0.9456                      |
+ | 7.6967 | 14500 | 0.409         | 0.3240          | 0.9456                      |
+ | 7.9622 | 15000 | 0.4127        | 0.3250          | 0.9444                      |
+ | 8.2273 | 15500 | 0.3843        | 0.3290          | 0.9447                      |
+ | 8.4928 | 16000 | 0.3755        | 0.3259          | 0.9438                      |
+ | 8.7583 | 16500 | 0.3786        | 0.3328          | 0.9438                      |
+ | 9.0234 | 17000 | 0.3702        | 0.3284          | 0.9453                      |
+ | 9.2889 | 17500 | 0.3525        | 0.3326          | 0.9444                      |
+ | 9.5544 | 18000 | 0.3589        | 0.3320          | 0.9443                      |
+ | 9.8199 | 18500 | 0.3483        | 0.3314          | 0.9438                      |
+
+ ### Framework Versions
+ - Python: 3.11.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.3
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.3.0
+ - Datasets: 3.3.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,50 @@
+ {
+   "_name_or_path": "Alibaba-NLP/gte-multilingual-base",
+   "architectures": [
+     "NewModel"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "auto_map": {
+     "AutoConfig": "Alibaba-NLP/new-impl--configuration.NewConfig",
+     "AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
+     "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
+     "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
+     "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
+     "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
+     "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
+   },
+   "classifier_dropout": 0.0,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "layer_norm_type": "layer_norm",
+   "logn_attention_clip1": false,
+   "logn_attention_scale": false,
+   "max_position_embeddings": 8192,
+   "model_type": "new",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pack_qkv": true,
+   "pad_token_id": 1,
+   "position_embedding_type": "rope",
+   "rope_scaling": {
+     "factor": 8.0,
+     "type": "ntk"
+   },
+   "rope_theta": 20000,
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.3",
+   "type_vocab_size": 1,
+   "unpad_inputs": false,
+   "use_memory_efficient_attention": false,
+   "vocab_size": 250048
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.48.3",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d9fe993d185e3007299c7b0d33b2e49208c350335228cbaf86cdc3c01921eb5d
+ size 1221487872
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa7a6ad87a7ce8fe196787355f6af7d03aee94d19c54a5eb1392ed18c8ef451a
+ size 17082988
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "model_max_length": 8192,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "unk_token": "<unk>"
+ }