dipteshkanojia committed on
Commit ea94001 · verified · 1 Parent(s): f8c684e

Add new SentenceTransformer model.

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
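
This pooling configuration selects masked mean pooling over token embeddings. A minimal sketch of what such a Pooling module computes (illustrative only; `token_embeddings` and `attention_mask` are assumed to come from a Hugging Face transformers encoder):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Masked mean pooling: average token embeddings, ignoring padding tokens."""
    # token_embeddings: (batch, seq_len, 1024); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero on empty rows
    return summed / counts  # (batch, 1024) sentence embeddings
```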
README.md ADDED
@@ -0,0 +1,995 @@
+ ---
+ base_model: FacebookAI/xlm-roberta-large
+ datasets:
+ - sentence-transformers/stsb
+ language:
+ - en
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:5749
+ - loss:MatryoshkaLoss
+ - loss:CoSENTLoss
+ widget:
+ - source_sentence: A chef is preparing some food.
+   sentences:
+   - Five birds stand on the snow.
+   - A chef prepared a meal.
+   - There is no 'still' that is not relative to some other object.
+ - source_sentence: A woman is adding oil on fishes.
+   sentences:
+   - Large cruise ship floating on the water.
+   - It refers to the maximum f-stop (which is defined as the ratio of focal length to effective aperture diameter).
+   - The woman is cutting potatoes.
+ - source_sentence: The player shoots the winning points.
+   sentences:
+   - Minimum wage laws hurt the least skilled, least productive the most.
+   - The basketball player is about to score points for his team.
+   - Three televisions, on on the floor, the other two on a box.
+ - source_sentence: Stars form in star-formation regions, which itself develop from molecular clouds.
+   sentences:
+   - Although I believe Searle is mistaken, I don't think you have found the problem.
+   - It may be possible for a solar system like ours to exist outside of a galaxy.
+   - A blond-haired child performing on the trumpet in front of a house while his younger brother watches.
+ - source_sentence: While Queen may refer to both Queen regent (sovereign) or Queen consort, the King has always been the sovereign.
+   sentences:
+   - At first, I thought this is a bit of a tricky question.
+   - A man plays the guitar.
+   - There is a very good reason not to refer to the Queen's spouse as "King" - because they aren't the King.
+ model-index:
+ - name: SentenceTransformer based on FacebookAI/xlm-roberta-large
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev 768
+       type: sts-dev-768
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.038123417655342585
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: -0.030855987437062582
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.0742298464837288
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: -0.016119009479880368
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: -0.053239384921975864
+       name: Pearson Dot
+     - type: spearman_dot
+       value: -0.03860610142560432
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev 512
+       type: sts-dev-512
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.040766255073950965
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: -0.028106086435826655
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.076050553000047
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: -0.014573222092867504
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: -0.06110575151055097
+       name: Pearson Dot
+     - type: spearman_dot
+       value: -0.04818501881621991
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev 256
+       type: sts-dev-256
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.044210895435818166
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: -0.03253407490039325
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.0529355152933442
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: -0.0338167301189937
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.0887169006335579
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.06886250477710897
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev 128
+       type: sts-dev-128
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.05321620243744594
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: -0.026531903856252148
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.06064347235216407
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: -0.0270947004666721
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.07199088437564892
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.05552894816506978
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev 64
+       type: sts-dev-64
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.046922199302745354
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: -0.027530540631984835
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.04930495975336398
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: -0.02287953412697089
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.05851507366090909
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.044913605667507114
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test 768
+       type: sts-test-768
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.0005203243269627229
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.007914891421418472
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.008479099839233263
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.0002449834909380018
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.015253799995136243
+       name: Pearson Dot
+     - type: spearman_dot
+       value: -0.002544651953260673
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test 512
+       type: sts-test-512
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.000985791968546407
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.009210170664121263
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.010968197464829785
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.0006366521814203481
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.030903954394043587
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.0214169911509498
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test 256
+       type: sts-test-256
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.008347426706014351
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.008133437696668973
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.01284332508912676
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.006207692348050752
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: -0.10411841010392278
+       name: Pearson Dot
+     - type: spearman_dot
+       value: -0.10441611480429308
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test 128
+       type: sts-test-128
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.007293947286825709
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.012461130559236479
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.013785631605643068
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.008355374230034162
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: -0.07790382803601184
+       name: Pearson Dot
+     - type: spearman_dot
+       value: -0.08277939304968172
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test 64
+       type: sts-test-64
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: -0.012731573411777072
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.003453137865023755
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: -0.013710254571378023
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.0028389826642085166
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: -0.04900795414419644
+       name: Pearson Dot
+     - type: spearman_dot
+       value: -0.05520642056907742
+       name: Spearman Dot
+     - type: pearson_max
+       value: .nan
+       name: Pearson Max
+     - type: spearman_max
+       value: .nan
+       name: Spearman Max
+ ---
+
+ # SentenceTransformer based on FacebookAI/xlm-roberta-large
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) on the [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) <!-- at revision c23d21b0620b635a76227c604d44e43a9f0ee389 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb)
+ - **Language:** en
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
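+
+ The same two-module stack can be assembled by hand. A minimal sketch using the standard `sentence_transformers.models` API (this mirrors the architecture above rather than reproducing the uploaded weights):
+
+ ```python
+ from sentence_transformers import SentenceTransformer, models
+
+ # Module 0: the XLM-R encoder, truncating inputs at 512 tokens
+ word_embedding_model = models.Transformer("FacebookAI/xlm-roberta-large", max_seq_length=512)
+ # Module 1: mean pooling over the 1024-dim token embeddings
+ pooling_model = models.Pooling(
+     word_embedding_model.get_word_embedding_dimension(),
+     pooling_mode="mean",
+ )
+ model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
+ ```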
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("dipteshkanojia/xlm-roberta-large-sts-matryoshka")
+ # Run inference
+ sentences = [
+     'While Queen may refer to both Queen regent (sovereign) or Queen consort, the King has always been the sovereign.',
+     'There is a very good reason not to refer to the Queen\'s spouse as "King" - because they aren\'t the King.',
+     'A man plays the guitar.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
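+
+ Because the model was trained with `MatryoshkaLoss` over the dimensions `[768, 512, 256, 128, 64]`, embeddings can in principle be truncated to a smaller prefix at load time. A sketch using the library's `truncate_dim` argument (a general Sentence Transformers feature, not something specific to this checkpoint):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Keep only the first 256 dimensions of every embedding
+ model = SentenceTransformer("dipteshkanojia/xlm-roberta-large-sts-matryoshka", truncate_dim=256)
+ embeddings = model.encode(["A man plays the guitar."])
+ print(embeddings.shape)
+ # (1, 256)
+ ```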
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+ * Dataset: `sts-dev-768`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | -0.0381 |
+ | spearman_manhattan  | -0.0309 |
+ | pearson_euclidean   | -0.0742 |
+ | spearman_euclidean  | -0.0161 |
+ | pearson_dot         | -0.0532 |
+ | spearman_dot        | -0.0386 |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
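+
+ These tables are produced by the library's `EmbeddingSimilarityEvaluator`. A minimal sketch of reproducing one of them (assumes the `model` from the usage example above and network access to the dataset):
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction
+
+ eval_ds = load_dataset("sentence-transformers/stsb", split="validation")
+ evaluator = EmbeddingSimilarityEvaluator(
+     sentences1=eval_ds["sentence1"],
+     sentences2=eval_ds["sentence2"],
+     scores=eval_ds["score"],
+     main_similarity=SimilarityFunction.COSINE,
+     name="sts-dev-768",
+ )
+ print(evaluator(model))  # dict of Pearson/Spearman correlations per similarity function
+ ```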
+
+ #### Semantic Similarity
+ * Dataset: `sts-dev-512`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | -0.0408 |
+ | spearman_manhattan  | -0.0281 |
+ | pearson_euclidean   | -0.0761 |
+ | spearman_euclidean  | -0.0146 |
+ | pearson_dot         | -0.0611 |
+ | spearman_dot        | -0.0482 |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
+
+ #### Semantic Similarity
+ * Dataset: `sts-dev-256`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | -0.0442 |
+ | spearman_manhattan  | -0.0325 |
+ | pearson_euclidean   | -0.0529 |
+ | spearman_euclidean  | -0.0338 |
+ | pearson_dot         | 0.0887  |
+ | spearman_dot        | 0.0689  |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
+
+ #### Semantic Similarity
+ * Dataset: `sts-dev-128`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | -0.0532 |
+ | spearman_manhattan  | -0.0265 |
+ | pearson_euclidean   | -0.0606 |
+ | spearman_euclidean  | -0.0271 |
+ | pearson_dot         | 0.072   |
+ | spearman_dot        | 0.0555  |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
+
+ #### Semantic Similarity
+ * Dataset: `sts-dev-64`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | -0.0469 |
+ | spearman_manhattan  | -0.0275 |
+ | pearson_euclidean   | -0.0493 |
+ | spearman_euclidean  | -0.0229 |
+ | pearson_dot         | 0.0585  |
+ | spearman_dot        | 0.0449  |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
+
+ #### Semantic Similarity
+ * Dataset: `sts-test-768`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | 0.0005  |
+ | spearman_manhattan  | 0.0079  |
+ | pearson_euclidean   | -0.0085 |
+ | spearman_euclidean  | 0.0002  |
+ | pearson_dot         | 0.0153  |
+ | spearman_dot        | -0.0025 |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
+
+ #### Semantic Similarity
+ * Dataset: `sts-test-512`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | -0.001  |
+ | spearman_manhattan  | 0.0092  |
+ | pearson_euclidean   | -0.011  |
+ | spearman_euclidean  | 0.0006  |
+ | pearson_dot         | 0.0309  |
+ | spearman_dot        | 0.0214  |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
+
+ #### Semantic Similarity
+ * Dataset: `sts-test-256`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | -0.0083 |
+ | spearman_manhattan  | 0.0081  |
+ | pearson_euclidean   | -0.0128 |
+ | spearman_euclidean  | 0.0062  |
+ | pearson_dot         | -0.1041 |
+ | spearman_dot        | -0.1044 |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
+
+ #### Semantic Similarity
+ * Dataset: `sts-test-128`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | -0.0073 |
+ | spearman_manhattan  | 0.0125  |
+ | pearson_euclidean   | -0.0138 |
+ | spearman_euclidean  | 0.0084  |
+ | pearson_dot         | -0.0779 |
+ | spearman_dot        | -0.0828 |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
+
+ #### Semantic Similarity
+ * Dataset: `sts-test-64`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+ | pearson_manhattan   | -0.0127 |
+ | spearman_manhattan  | 0.0035  |
+ | pearson_euclidean   | -0.0137 |
+ | spearman_euclidean  | 0.0028  |
+ | pearson_dot         | -0.049  |
+ | spearman_dot        | -0.0552 |
+ | pearson_max         | nan     |
+ | spearman_max        | nan     |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### sentence-transformers/stsb
+
+ * Dataset: [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) at [ab7a5ac](https://huggingface.co/datasets/sentence-transformers/stsb/tree/ab7a5ac0e35aa22088bdcf23e7fd99b220e53308)
+ * Size: 5,749 training samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence1                                                                          | sentence2                                                                          | score                                                           |
+   |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             | float                                                           |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 11.08 tokens</li><li>max: 30 tokens</li></ul>  | <ul><li>min: 7 tokens</li><li>mean: 11.05 tokens</li><li>max: 30 tokens</li></ul>  | <ul><li>min: 0.0</li><li>mean: 0.54</li><li>max: 1.0</li></ul>  |
+ * Samples:
+   | sentence1                                                   | sentence2                                                              | score             |
+   |:------------------------------------------------------------|:------------------------------------------------------------------------|:-------------------|
+   | <code>A plane is taking off.</code>                         | <code>An air plane is taking off.</code>                               | <code>1.0</code>  |
+   | <code>A man is playing a large flute.</code>                | <code>A man is playing a flute.</code>                                 | <code>0.76</code> |
+   | <code>A man is spreading shreded cheese on a pizza.</code>  | <code>A man is spreading shredded cheese on an uncooked pizza.</code>  | <code>0.76</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "CoSENTLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
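+
+ A minimal sketch of constructing this loss in code (assumes a `SentenceTransformer` instance named `model`; it mirrors the JSON configuration above):
+
+ ```python
+ from sentence_transformers.losses import CoSENTLoss, MatryoshkaLoss
+
+ base_loss = CoSENTLoss(model)  # inner loss, applied at every truncated dimension
+ train_loss = MatryoshkaLoss(
+     model,
+     base_loss,
+     matryoshka_dims=[768, 512, 256, 128, 64],
+     matryoshka_weights=[1, 1, 1, 1, 1],
+ )
+ ```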
+
+ ### Evaluation Dataset
+
+ #### sentence-transformers/stsb
+
+ * Dataset: [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) at [ab7a5ac](https://huggingface.co/datasets/sentence-transformers/stsb/tree/ab7a5ac0e35aa22088bdcf23e7fd99b220e53308)
+ * Size: 1,500 evaluation samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence1                                                                          | sentence2                                                                         | score                                                           |
+   |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                            | float                                                           |
+   | details | <ul><li>min: 5 tokens</li><li>mean: 16.55 tokens</li><li>max: 47 tokens</li></ul>  | <ul><li>min: 7 tokens</li><li>mean: 16.5 tokens</li><li>max: 47 tokens</li></ul>  | <ul><li>min: 0.0</li><li>mean: 0.47</li><li>max: 1.0</li></ul>  |
+ * Samples:
+   | sentence1                                          | sentence2                                              | score             |
+   |:----------------------------------------------------|:---------------------------------------------------------|:-------------------|
+   | <code>A man with a hard hat is dancing.</code>     | <code>A man wearing a hard hat is dancing.</code>      | <code>1.0</code>  |
+   | <code>A young child is riding a horse.</code>      | <code>A child is riding a horse.</code>                | <code>0.95</code> |
+   | <code>A man is feeding a mouse to a snake.</code>  | <code>The man is feeding a mouse to the snake.</code>  | <code>1.0</code>  |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "CoSENTLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 6
+ - `per_device_eval_batch_size`: 6
+ - `num_train_epochs`: 8
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 6
+ - `per_device_eval_batch_size`: 6
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 8
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
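+
+ For orientation, a sketch of how the non-default hyperparameters above plug into the Sentence Transformers v3 training API (assumes the `model`, `train_loss`, and stsb `train_dataset`/`eval_dataset` from the earlier examples; `output_dir` is a placeholder not stated in this card):
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="outputs",              # placeholder
+     eval_strategy="steps",
+     per_device_train_batch_size=6,
+     per_device_eval_batch_size=6,
+     num_train_epochs=8,
+     warmup_ratio=0.1,
+     fp16=True,
+ )
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=train_dataset,
+     eval_dataset=eval_dataset,
+     loss=train_loss,
+ )
+ trainer.train()
+ ```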
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | loss    | sts-dev-128_spearman_cosine | sts-dev-256_spearman_cosine | sts-dev-512_spearman_cosine | sts-dev-64_spearman_cosine | sts-dev-768_spearman_cosine | sts-test-128_spearman_cosine | sts-test-256_spearman_cosine | sts-test-512_spearman_cosine | sts-test-64_spearman_cosine | sts-test-768_spearman_cosine |
+ |:------:|:----:|:-------------:|:-------:|:---------------------------:|:---------------------------:|:---------------------------:|:--------------------------:|:---------------------------:|:----------------------------:|:----------------------------:|:----------------------------:|:---------------------------:|:----------------------------:|
+ | 1.0417 | 500  | 21.1353       | 20.8565 | nan                         | nan                         | nan                         | nan                        | nan                         | -                            | -                            | -                            | -                           | -                            |
+ | 2.0833 | 1000 | 20.7941       | 20.8565 | nan                         | nan                         | nan                         | nan                        | nan                         | -                            | -                            | -                            | -                           | -                            |
+ | 3.125  | 1500 | 20.7823       | 20.8565 | nan                         | nan                         | nan                         | nan                        | nan                         | -                            | -                            | -                            | -                           | -                            |
+ | 4.1667 | 2000 | 20.781        | 20.8565 | nan                         | nan                         | nan                         | nan                        | nan                         | -                            | -                            | -                            | -                           | -                            |
+ | 5.2083 | 2500 | 20.7707       | 20.8565 | nan                         | nan                         | nan                         | nan                        | nan                         | -                            | -                            | -                            | -                           | -                            |
+ | 6.25   | 3000 | 20.7661       | 20.8565 | nan                         | nan                         | nan                         | nan                        | nan                         | -                            | -                            | -                            | -                           | -                            |
+ | 7.2917 | 3500 | 20.7719       | 20.8565 | nan                         | nan                         | nan                         | nan                        | nan                         | -                            | -                            | -                            | -                           | -                            |
+ | 8.0    | 3840 | -             | -       | -                           | -                           | -                           | -                          | -                           | nan                          | nan                          | nan                          | nan                         | nan                          |
+
+
+ ### Framework Versions
+ - Python: 3.9.19
+ - Sentence Transformers: 3.1.0.dev0
+ - Transformers: 4.44.2
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 2.21.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### CoSENTLoss
+ ```bibtex
+ @online{kexuefm-8847,
+     title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
+     author={Su Jianlin},
+     year={2022},
+     month={Jan},
+     url={https://kexue.fm/archives/8847},
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "xlm-roberta-large",
+   "architectures": [
+     "XLMRobertaModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "xlm-roberta",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "output_past": true,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 250002
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.0.dev0",
+     "transformers": "4.44.2",
+     "pytorch": "2.4.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60f5c9a0442025429e327c478b39b65da049485680fb689cc497ec710002388c
+ size 2239607176
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:883b037111086fd4dfebbbc9b7cee11e1517b5e0c0514879478661440f137085
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "unk_token": "<unk>"
+ }