Commit d9842b1 (verified) by anikulkar · 1 parent: 7e92f43

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
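The pooling config above enables only `pooling_mode_cls_token`, so the sentence embedding is the hidden state at the first (`[CLS]`) position rather than an average over tokens. A minimal sketch of the difference (not part of the commit), using made-up 4-dimensional token vectors in place of real 768-dimensional BERT outputs; the function names `cls_pool` and `mean_pool` are illustrative and are not the library's API:

```python
def cls_pool(token_embeddings):
    """Return the vector at the first position (the [CLS] token)."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings, attention_mask):
    """Average the vectors of the non-padding tokens (shown for contrast)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy hidden states for one sentence: [CLS], two word pieces, one padding token.
tokens = [
    [1.0, 0.0, 2.0, 0.0],  # [CLS]
    [0.0, 2.0, 0.0, 2.0],
    [2.0, 4.0, 1.0, 1.0],
    [9.0, 9.0, 9.0, 9.0],  # padding, excluded by the mask
]
mask = [1, 1, 1, 0]

cls_vec = cls_pool(tokens)          # what this config selects
mean_vec = mean_pool(tokens, mask)  # what "pooling_mode_mean_tokens" would give
print(cls_vec)
print(mean_vec)
```

With CLS pooling the other token vectors influence the result only indirectly, through the transformer's attention layers.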
README.md ADDED
@@ -0,0 +1,812 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: sentence-transformers
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:6300
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-base-en-v1.5
+ datasets: []
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ widget:
+ - source_sentence: From 2021 to 2022, the operating revenue increased by 4%, from
+     $4,923.9 million to $5,122.2 million.
+   sentences:
+   - How much does the AMC Stubs A-List membership cost per month depending on the
+     geographic market?
+   - What was the percentage change in operating revenue from 2021 to 2022?
+   - What types of coverage does political risk insurance provide for commercial lenders?
+ - source_sentence: Our two operating segments are "Compute & Networking" and "Graphics."
+     Refer to Note 17 of the Notes to the Consolidated Financial Statements in Part
+     IV, Item 15 of this Annual Report on Form 10-K for additional information.
+   sentences:
+   - What was the noncash impairment charge recorded in the fourth quarter of 2023
+     for the goodwill attributable to FedEx Dataworks?
+   - What are the two operating segments of NVIDIA as mentioned in the text?
+   - What is the disclosure threshold for environmental proceedings involving monetary
+     sanctions according to SEC regulations?
+ - source_sentence: For 2023, the weighted-average actuarial assumptions for retirement
+     plans included a service cost discount rate of 4.85% and a rate of increase in
+     compensation levels of 3.71%.
+   sentences:
+   - What are the actuarial assumptions for retirement plans discount rate and rate
+     of increase in compensation levels in 2023?
+   - Where are accrued interest and penalties related to unrecognized tax benefits
+     recorded?
+   - What is the purpose of the Employee Resource Groups (ERGs) in the organization?
+ - source_sentence: The Company is currently party to certain legal proceedings, none
+     of which we believe to be material to our business or financial condition.
+   sentences:
+   - What measures is The Hershey Company taking to ensure sufficient liquidity during
+     economic downturns?
+   - What is the impact of structural changes on the unit case volume and concentrate
+     sales volume of the company on a consolidated basis or at the geographic operating
+     segment level?
+   - What is the company's perspective on the impact of the legal proceedings on its
+     financial condition?
+ - source_sentence: We recognize gains and losses on pension and postretirement plan
+     assets and obligations immediately in Other income (expense) - net in our consolidated
+     statements of income.
+   sentences:
+   - Where are gains and losses on pension and postretirement plan assets and obligations
+     recognized in financial statements?
+   - What is the total amount of property, plant, and equipment, net, reported by the
+     company for the fiscal year 2023?
+   - What were the accumulated benefit obligation and fair value of plan assets for
+     certain U.S. pension plans with obligations exceeding assets as of December 31,
+     2023?
+ pipeline_tag: sentence-similarity
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6828571428571428
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8228571428571428
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.86
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9057142857142857
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6828571428571428
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2742857142857143
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.172
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09057142857142855
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6828571428571428
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8228571428571428
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.86
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9057142857142857
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7960843632092954
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7607987528344665
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7647429753660495
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6842857142857143
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8228571428571428
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8557142857142858
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9014285714285715
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6842857142857143
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2742857142857143
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17114285714285712
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09014285714285714
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6842857142857143
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8228571428571428
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8557142857142858
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9014285714285715
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7939749538465997
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7593849206349204
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7635559033333911
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.68
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8114285714285714
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.85
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8942857142857142
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.68
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2704761904761905
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16999999999999998
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08942857142857143
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.68
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8114285714285714
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.85
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8942857142857142
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7888779795440546
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7549767573696146
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7594249239569217
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6571428571428571
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.7942857142857143
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8342857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8885714285714286
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6571428571428571
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.26476190476190475
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16685714285714284
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08885714285714284
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6571428571428571
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.7942857142857143
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8342857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8885714285714286
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7729724847261471
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7360578231292516
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.740309728715939
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6185714285714285
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.76
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8657142857142858
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6185714285714285
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2533333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.15999999999999998
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08657142857142855
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6185714285714285
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.76
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8657142857142858
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7409253495656911
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7012964852607709
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7061843304820828
+       name: Cosine Map@100
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
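The final `Normalize()` module in the architecture above rescales every embedding to unit length, which is why cosine similarity and a plain dot product give the same score for this model's outputs. A minimal sketch (not part of the commit) with toy 2-dimensional vectors; the helper names are illustrative:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

cos_raw = cosine(a, b)                              # cosine on raw vectors
dot_unit = dot(l2_normalize(a), l2_normalize(b))    # dot product after Normalize()
print(cos_raw, dot_unit)
```

Both values agree up to floating-point rounding, so a vector index configured for inner-product search should rank results the same way as one configured for cosine similarity.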
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("anikulkar/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'We recognize gains and losses on pension and postretirement plan assets and obligations immediately in Other income (expense) - net in our consolidated statements of income.',
+     'Where are gains and losses on pension and postretirement plan assets and obligations recognized in financial statements?',
+     'What is the total amount of property, plant, and equipment, net, reported by the company for the fiscal year 2023?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
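Because the model was trained with `MatryoshkaLoss` at 768, 512, 256, 128, and 64 dimensions, its embeddings can be shortened by keeping only the leading components. A minimal sketch of that step (not part of the commit), assuming the truncated vector is re-normalized before cosine or dot-product comparison; the 4-dimensional vector and the helper name `truncate_and_normalize` are made up for illustration:

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


# Toy unit-length "embedding"; a real one from this model has 768 components.
full = [0.5, 0.5, 0.5, 0.5]
small = truncate_and_normalize(full, 2)
print(len(small), small)
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which should make the manual slicing unnecessary; check the documentation for the installed version.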
408
+
409
+ <!--
410
+ ### Direct Usage (Transformers)
411
+
412
+ <details><summary>Click to see the direct usage in Transformers</summary>
413
+
414
+ </details>
415
+ -->
416
+
417
+ <!--
418
+ ### Downstream Usage (Sentence Transformers)
419
+
420
+ You can finetune this model on your own dataset.
421
+
422
+ <details><summary>Click to expand</summary>
423
+
424
+ </details>
425
+ -->
426
+
427
+ <!--
428
+ ### Out-of-Scope Use
429
+
430
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
431
+ -->
432
+
433
+ ## Evaluation
434
+
435
+ ### Metrics
436
+
437
+ #### Information Retrieval
438
+ * Dataset: `dim_768`
439
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
440
+
441
+ | Metric | Value |
442
+ |:--------------------|:-----------|
443
+ | cosine_accuracy@1 | 0.6829 |
444
+ | cosine_accuracy@3 | 0.8229 |
445
+ | cosine_accuracy@5 | 0.86 |
446
+ | cosine_accuracy@10 | 0.9057 |
447
+ | cosine_precision@1 | 0.6829 |
448
+ | cosine_precision@3 | 0.2743 |
449
+ | cosine_precision@5 | 0.172 |
450
+ | cosine_precision@10 | 0.0906 |
451
+ | cosine_recall@1 | 0.6829 |
452
+ | cosine_recall@3 | 0.8229 |
453
+ | cosine_recall@5 | 0.86 |
454
+ | cosine_recall@10 | 0.9057 |
455
+ | cosine_ndcg@10 | 0.7961 |
456
+ | cosine_mrr@10 | 0.7608 |
457
+ | **cosine_map@100** | **0.7647** |
458
+
459
+ #### Information Retrieval
460
+ * Dataset: `dim_512`
461
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
462
+
463
+ | Metric | Value |
464
+ |:--------------------|:-----------|
465
+ | cosine_accuracy@1 | 0.6843 |
466
+ | cosine_accuracy@3 | 0.8229 |
467
+ | cosine_accuracy@5 | 0.8557 |
468
+ | cosine_accuracy@10 | 0.9014 |
469
+ | cosine_precision@1 | 0.6843 |
470
+ | cosine_precision@3 | 0.2743 |
471
+ | cosine_precision@5 | 0.1711 |
472
+ | cosine_precision@10 | 0.0901 |
473
+ | cosine_recall@1 | 0.6843 |
474
+ | cosine_recall@3 | 0.8229 |
475
+ | cosine_recall@5 | 0.8557 |
476
+ | cosine_recall@10 | 0.9014 |
477
+ | cosine_ndcg@10 | 0.794 |
478
+ | cosine_mrr@10 | 0.7594 |
479
+ | **cosine_map@100** | **0.7636** |
480
+
481
+ #### Information Retrieval
482
+ * Dataset: `dim_256`
483
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
484
+
485
+ | Metric | Value |
486
+ |:--------------------|:-----------|
487
+ | cosine_accuracy@1 | 0.68 |
488
+ | cosine_accuracy@3 | 0.8114 |
489
+ | cosine_accuracy@5 | 0.85 |
490
+ | cosine_accuracy@10 | 0.8943 |
491
+ | cosine_precision@1 | 0.68 |
492
+ | cosine_precision@3 | 0.2705 |
493
+ | cosine_precision@5 | 0.17 |
494
+ | cosine_precision@10 | 0.0894 |
495
+ | cosine_recall@1 | 0.68 |
496
+ | cosine_recall@3 | 0.8114 |
497
+ | cosine_recall@5 | 0.85 |
498
+ | cosine_recall@10 | 0.8943 |
499
+ | cosine_ndcg@10 | 0.7889 |
500
+ | cosine_mrr@10 | 0.755 |
501
+ | **cosine_map@100** | **0.7594** |
502
+
503
+ #### Information Retrieval
504
+ * Dataset: `dim_128`
505
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
506
+
507
+ | Metric | Value |
508
+ |:--------------------|:-----------|
509
+ | cosine_accuracy@1 | 0.6571 |
510
+ | cosine_accuracy@3 | 0.7943 |
511
+ | cosine_accuracy@5 | 0.8343 |
512
+ | cosine_accuracy@10 | 0.8886 |
513
+ | cosine_precision@1 | 0.6571 |
514
+ | cosine_precision@3 | 0.2648 |
515
+ | cosine_precision@5 | 0.1669 |
516
+ | cosine_precision@10 | 0.0889 |
517
+ | cosine_recall@1 | 0.6571 |
518
+ | cosine_recall@3 | 0.7943 |
519
+ | cosine_recall@5 | 0.8343 |
520
+ | cosine_recall@10 | 0.8886 |
521
+ | cosine_ndcg@10 | 0.773 |
522
+ | cosine_mrr@10 | 0.7361 |
523
+ | **cosine_map@100** | **0.7403** |
524
+
525
+ #### Information Retrieval
526
+ * Dataset: `dim_64`
527
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
528
+
529
+ | Metric | Value |
530
+ |:--------------------|:-----------|
531
+ | cosine_accuracy@1 | 0.6186 |
532
+ | cosine_accuracy@3 | 0.76 |
533
+ | cosine_accuracy@5 | 0.8 |
534
+ | cosine_accuracy@10 | 0.8657 |
535
+ | cosine_precision@1 | 0.6186 |
536
+ | cosine_precision@3 | 0.2533 |
537
+ | cosine_precision@5 | 0.16 |
538
+ | cosine_precision@10 | 0.0866 |
539
+ | cosine_recall@1 | 0.6186 |
540
+ | cosine_recall@3 | 0.76 |
541
+ | cosine_recall@5 | 0.8 |
542
+ | cosine_recall@10 | 0.8657 |
543
+ | cosine_ndcg@10 | 0.7409 |
544
+ | cosine_mrr@10 | 0.7013 |
545
+ | **cosine_map@100** | **0.7062** |
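In every table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example 0.8229 / 3 ≈ 0.2743 at 768 dimensions). That pattern is what one would expect if each query has exactly one relevant passage, which is an inference from the numbers rather than something the card states. A minimal sketch of the metrics under that assumption (not part of the commit), with made-up rankings and an illustrative `evaluate` helper:

```python
def evaluate(rankings, k):
    """rankings: list of (ranked_doc_ids, relevant_doc_id), one relevant doc per query."""
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, relevant in rankings:
        top = ranked[:k]
        if relevant in top:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top.index(relevant) + 1)
    n = len(rankings)
    accuracy = hits / n
    return {
        "accuracy": accuracy,        # share of queries with the relevant doc in the top k
        "precision": accuracy / k,   # at most one of the k results can be relevant
        "recall": accuracy,          # the single relevant doc is either found or not
        "mrr": reciprocal_rank_sum / n,
    }


# Four toy queries whose relevant document "a" is ranked 1st, 2nd, 3rd, and not at all.
rankings = [
    (["a", "b", "c"], "a"),
    (["b", "a", "c"], "a"),
    (["b", "c", "a"], "a"),
    (["b", "c", "d"], "a"),
]
at_3 = evaluate(rankings, 3)
print(at_3)

# Consistency check against the reported dim_768 values.
precision_at_3 = round(0.8228571428571428 / 3, 4)
print(precision_at_3)
```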
546
+
547
+ <!--
548
+ ## Bias, Risks and Limitations
549
+
550
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
551
+ -->
552
+
553
+ <!--
554
+ ### Recommendations
555
+
556
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
557
+ -->
558
+
559
+ ## Training Details
560
+
561
+ ### Training Dataset
562
+
563
+ #### Unnamed Dataset
564
+
565
+
566
+ * Size: 6,300 training samples
567
+ * Columns: <code>positive</code> and <code>anchor</code>
568
+ * Approximate statistics based on the first 1000 samples:
569
+ | | positive | anchor |
570
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
571
+ | type | string | string |
572
+ | details | <ul><li>min: 8 tokens</li><li>mean: 45.24 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 20.71 tokens</li><li>max: 45 tokens</li></ul> |
573
+ * Samples:
574
+ | positive | anchor |
575
+ ||:---------------------------------------------------------------------------------------------------------------------------------|
576
+ | <code>Changes in Costs. Our costs are subject to fluctuations, particularly due to changes in commodity and input material prices, transportation costs, other broader inflationary impacts and our own productivity efforts. We have significant exposures to certain commodities and input materials, in particular certain oil-derived materials like resins and paper-based materials like pulp. Volatility in the market price of these commodities and input materials has a direct impact on our costs. Disruptions in our manufacturing, supply and distribution operations due to energy shortages, natural disasters, labor or freight constraints have impacted our costs and could do so in the future. New or increased legal or regulatory requirements, along with initiatives to meet our sustainability goals, could also result in increased costs due to higher material costs and investments in facilities and equipment. We strive to implement, achieve and sustain cost improvement plans, including supply chain optimization and general overhead and workforce optimization. Increased pricing in response to certain inflationary or cost increases may also offset portions of the cost impacts; however, such price increases may impact product consumption. If we are unable to manage cost impacts through pricing actions and consistent productivity improvements, it may adversely impact our net sales, gross margin, operating margin, net earnings and cash flows.</code> | <code>How did Procter & Gamble manage the fluctuations in costs, particularly related to commodities and input materials?</code> |
577
+ | <code>As of October 1, 2023 we had ¥5 billion, or $33.5 million, of borrowings outstanding under these credit facilities.</code> | <code>How much was borrowed under the Japanese yen-denominated credit facilities as of October 1, 2023?</code> |
578
+ | <code>AutoZone sells automotive hard parts, maintenance items, accessories and non-automotive products through www.autozone.com, and commercial customers can make purchases through www.autozonepro.com. Additionally, the ALLDATA brand of automotive diagnostic, repair, collision and shop management software is sold through www.alldata.com.</code> | <code>What online platforms does AutoZone use for selling automotive products and services?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
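The parameters above wrap `MultipleNegativesRankingLoss` in `MatryoshkaLoss`: the base loss is evaluated on the leading 768, 512, 256, 128, and 64 components of each embedding and the results are added with equal weights. The following is a simplified pure-Python sketch of that idea, not the library's implementation and not part of the commit; the toy 4-dimensional vectors, the dims `[4, 2]`, and the `scale` of 20 are assumptions made for illustration:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch ranking loss: anchor i should score positive i above all other positives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        log_sum_exp = math.log(sum(math.exp(z) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss computed on the first d components, for each d."""
    total = 0.0
    for d, w in zip(dims, weights):
        total += w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
    return total


anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Training every prefix length against the same objective is what lets the shorter embeddings remain usable on their own, at the modest accuracy cost visible in the evaluation tables.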
600
+
601
+ ### Training Hyperparameters
602
+ #### Non-Default Hyperparameters
603
+
604
+ - `eval_strategy`: epoch
605
+ - `per_device_train_batch_size`: 32
606
+ - `per_device_eval_batch_size`: 16
607
+ - `gradient_accumulation_steps`: 16
608
+ - `learning_rate`: 2e-05
609
+ - `num_train_epochs`: 4
610
+ - `lr_scheduler_type`: cosine
611
+ - `warmup_ratio`: 0.1
612
+ - `bf16`: True
613
+ - `tf32`: False
614
+ - `load_best_model_at_end`: True
615
+ - `optim`: adamw_torch_fused
616
+ - `batch_sampler`: no_duplicates
617
+
618
+ #### All Hyperparameters
619
+ <details><summary>Click to expand</summary>
620
+
621
+ - `overwrite_output_dir`: False
622
+ - `do_predict`: False
623
+ - `eval_strategy`: epoch
624
+ - `prediction_loss_only`: True
625
+ - `per_device_train_batch_size`: 32
626
+ - `per_device_eval_batch_size`: 16
627
+ - `per_gpu_train_batch_size`: None
628
+ - `per_gpu_eval_batch_size`: None
629
+ - `gradient_accumulation_steps`: 16
630
+ - `eval_accumulation_steps`: None
631
+ - `learning_rate`: 2e-05
632
+ - `weight_decay`: 0.0
633
+ - `adam_beta1`: 0.9
634
+ - `adam_beta2`: 0.999
635
+ - `adam_epsilon`: 1e-08
636
+ - `max_grad_norm`: 1.0
637
+ - `num_train_epochs`: 4
638
+ - `max_steps`: -1
639
+ - `lr_scheduler_type`: cosine
640
+ - `lr_scheduler_kwargs`: {}
641
+ - `warmup_ratio`: 0.1
642
+ - `warmup_steps`: 0
643
+ - `log_level`: passive
644
+ - `log_level_replica`: warning
645
+ - `log_on_each_node`: True
646
+ - `logging_nan_inf_filter`: True
647
+ - `save_safetensors`: True
648
+ - `save_on_each_node`: False
649
+ - `save_only_model`: False
650
+ - `restore_callback_states_from_checkpoint`: False
651
+ - `no_cuda`: False
652
+ - `use_cpu`: False
653
+ - `use_mps_device`: False
654
+ - `seed`: 42
655
+ - `data_seed`: None
656
+ - `jit_mode_eval`: False
657
+ - `use_ipex`: False
658
+ - `bf16`: True
659
+ - `fp16`: False
660
+ - `fp16_opt_level`: O1
661
+ - `half_precision_backend`: auto
662
+ - `bf16_full_eval`: False
663
+ - `fp16_full_eval`: False
664
+ - `tf32`: False
665
+ - `local_rank`: 0
666
+ - `ddp_backend`: None
667
+ - `tpu_num_cores`: None
668
+ - `tpu_metrics_debug`: False
669
+ - `debug`: []
670
+ - `dataloader_drop_last`: False
671
+ - `dataloader_num_workers`: 0
672
+ - `dataloader_prefetch_factor`: None
673
+ - `past_index`: -1
674
+ - `disable_tqdm`: False
675
+ - `remove_unused_columns`: True
676
+ - `label_names`: None
677
+ - `load_best_model_at_end`: True
678
+ - `ignore_data_skip`: False
679
+ - `fsdp`: []
680
+ - `fsdp_min_num_params`: 0
681
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
682
+ - `fsdp_transformer_layer_cls_to_wrap`: None
683
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
684
+ - `deepspeed`: None
685
+ - `label_smoothing_factor`: 0.0
686
+ - `optim`: adamw_torch_fused
687
+ - `optim_args`: None
688
+ - `adafactor`: False
689
+ - `group_by_length`: False
690
+ - `length_column_name`: length
691
+ - `ddp_find_unused_parameters`: None
692
+ - `ddp_bucket_cap_mb`: None
693
+ - `ddp_broadcast_buffers`: False
694
+ - `dataloader_pin_memory`: True
695
+ - `dataloader_persistent_workers`: False
696
+ - `skip_memory_metrics`: True
697
+ - `use_legacy_prediction_loop`: False
698
+ - `push_to_hub`: False
699
+ - `resume_from_checkpoint`: None
700
+ - `hub_model_id`: None
701
+ - `hub_strategy`: every_save
702
+ - `hub_private_repo`: False
703
+ - `hub_always_push`: False
704
+ - `gradient_checkpointing`: False
705
+ - `gradient_checkpointing_kwargs`: None
706
+ - `include_inputs_for_metrics`: False
707
+ - `eval_do_concat_batches`: True
708
+ - `fp16_backend`: auto
709
+ - `push_to_hub_model_id`: None
710
+ - `push_to_hub_organization`: None
711
+ - `mp_parameters`:
712
+ - `auto_find_batch_size`: False
713
+ - `full_determinism`: False
714
+ - `torchdynamo`: None
715
+ - `ray_scope`: last
716
+ - `ddp_timeout`: 1800
717
+ - `torch_compile`: False
718
+ - `torch_compile_backend`: None
719
+ - `torch_compile_mode`: None
720
+ - `dispatch_batches`: None
721
+ - `split_batches`: None
722
+ - `include_tokens_per_second`: False
723
+ - `include_num_input_tokens_seen`: False
724
+ - `neftune_noise_alpha`: None
725
+ - `optim_target_modules`: None
726
+ - `batch_eval_metrics`: False
727
+ - `batch_sampler`: no_duplicates
728
+ - `multi_dataset_batch_sampler`: proportional
729
+
730
+ </details>
731
+
+ ### Training Logs
+ | Epoch      | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
+ |:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
+ | 0.8122     | 10     | 1.5647        | -                      | -                      | -                      | -                     | -                      |
+ | 0.9746     | 12     | -             | 0.7160                 | 0.7404                 | 0.7515                 | 0.6797                | 0.7533                 |
+ | 1.6244     | 20     | 0.6629        | -                      | -                      | -                      | -                     | -                      |
+ | 1.9492     | 24     | -             | 0.7340                 | 0.7582                 | 0.7611                 | 0.6996                | 0.7603                 |
+ | 2.4365     | 30     | 0.4811        | -                      | -                      | -                      | -                     | -                      |
+ | **2.9239** | **36** | **-**         | **0.7403**             | **0.759**              | **0.7638**             | **0.7056**            | **0.7646**             |
+ | 3.2487     | 40     | 0.4046        | -                      | -                      | -                      | -                     | -                      |
+ | 3.8985     | 48     | -             | 0.7403                 | 0.7594                 | 0.7636                 | 0.7062                | 0.7647                 |
+
+ * The bold row denotes the saved checkpoint.
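The fractional epoch values in the log follow from the hyperparameters listed earlier. A short worked calculation (not part of the commit), assuming a single training device and that the sampler yields ceil(6300 / 32) = 197 batches per epoch; neither assumption is stated in the card, but the result reproduces the Epoch column:

```python
import math

train_samples = 6300     # "Size: 6,300 training samples"
per_device_batch = 32    # per_device_train_batch_size
grad_accum = 16          # gradient_accumulation_steps

effective_batch = per_device_batch * grad_accum                   # samples per optimizer step
batches_per_epoch = math.ceil(train_samples / per_device_batch)   # assumed: no batches dropped
steps_per_epoch = batches_per_epoch // grad_accum                 # full accumulation cycles


def epoch_at_step(step):
    """Fraction of epochs completed after `step` optimizer steps."""
    return step * grad_accum / batches_per_epoch


print(effective_batch, batches_per_epoch, steps_per_epoch)
for step in (10, 12, 24, 36, 48):
    print(step, round(epoch_at_step(step), 4))
```

Under these assumptions each epoch contributes 12 optimizer steps, so the four configured epochs end at step 48, the last row of the table.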
745
+
746
+ ### Framework Versions
747
+ - Python: 3.10.12
748
+ - Sentence Transformers: 3.0.1
749
+ - Transformers: 4.41.2
750
+ - PyTorch: 2.3.0+cu121
751
+ - Accelerate: 0.31.0
752
+ - Datasets: 2.19.2
753
+ - Tokenizers: 0.19.1
754
+
755
+ ## Citation
756
+
757
+ ### BibTeX
758
+
759
+ #### Sentence Transformers
760
+ ```bibtex
761
+ @inproceedings{reimers-2019-sentence-bert,
762
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
763
+ author = "Reimers, Nils and Gurevych, Iryna",
764
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
765
+ month = "11",
766
+ year = "2019",
767
+ publisher = "Association for Computational Linguistics",
768
+ url = "https://arxiv.org/abs/1908.10084",
769
+ }
770
+ ```
771
+
772
+ #### MatryoshkaLoss
773
+ ```bibtex
774
+ @misc{kusupati2024matryoshka,
775
+ title={Matryoshka Representation Learning},
776
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
777
+ year={2024},
778
+ eprint={2205.13147},
779
+ archivePrefix={arXiv},
780
+ primaryClass={cs.LG}
781
+ }
782
+ ```
783
+
784
+ #### MultipleNegativesRankingLoss
785
+ ```bibtex
786
+ @misc{henderson2017efficient,
787
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
788
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
789
+ year={2017},
790
+ eprint={1705.00652},
791
+ archivePrefix={arXiv},
792
+ primaryClass={cs.CL}
793
+ }
794
+ ```
795
+
796
+ <!--
797
+ ## Glossary
798
+
799
+ *Clearly define terms in order to be accessible across audiences.*
800
+ -->
801
+
802
+ <!--
803
+ ## Model Card Authors
804
+
805
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
806
+ -->
807
+
808
+ <!--
809
+ ## Model Card Contact
810
+
811
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
812
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "_name_or_path": "BAAI/bge-base-en-v1.5",
3
+ "architectures": [
4
+ "BertModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "gradient_checkpointing": false,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 768,
12
+ "id2label": {
13
+ "0": "LABEL_0"
14
+ },
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 3072,
17
+ "label2id": {
18
+ "LABEL_0": 0
19
+ },
20
+ "layer_norm_eps": 1e-12,
21
+ "max_position_embeddings": 512,
22
+ "model_type": "bert",
23
+ "num_attention_heads": 12,
24
+ "num_hidden_layers": 12,
25
+ "pad_token_id": 0,
26
+ "position_embedding_type": "absolute",
27
+ "torch_dtype": "float32",
28
+ "transformers_version": "4.41.2",
29
+ "type_vocab_size": 2,
30
+ "use_cache": true,
31
+ "vocab_size": 30522
32
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.0.1",
4
+ "transformers": "4.41.2",
5
+ "pytorch": "2.3.0+cu121"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": null
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d44f2d008526f29da087126724989874ab2336e572fef999b2340ff649ea7ea5
3
+ size 437951328
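
As a sanity check, the checkpoint size can be compared with the parameter count implied by `config.json`. The sketch below assumes the standard `BertModel` layout (embeddings with LayerNorm, 12 encoder layers, and a pooler head) and 4 bytes per `float32` parameter; the function name is illustrative only.

```python
def bert_parameter_count(vocab_size, hidden_size, num_layers,
                         intermediate_size, max_positions, type_vocab_size):
    """Count weights and biases for a standard BERT encoder with pooler."""
    layer_norm = 2 * hidden_size  # scale and bias
    embeddings = (
        vocab_size * hidden_size
        + max_positions * hidden_size
        + type_vocab_size * hidden_size
        + layer_norm
    )
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (hidden_size * hidden_size + hidden_size) + layer_norm
    feed_forward = (
        hidden_size * intermediate_size + intermediate_size
        + intermediate_size * hidden_size + hidden_size
        + layer_norm
    )
    pooler = hidden_size * hidden_size + hidden_size
    return embeddings + num_layers * (attention + feed_forward) + pooler


# Values taken from config.json in this commit.
params = bert_parameter_count(
    vocab_size=30522, hidden_size=768, num_layers=12,
    intermediate_size=3072, max_positions=512, type_vocab_size=2,
)
tensor_bytes = params * 4          # torch_dtype is float32
file_bytes = 437951328             # size from the Git LFS pointer
print(params, tensor_bytes, file_bytes - tensor_bytes)
```

Under these assumptions the count comes to about 109.5 million parameters, and the roughly 22 KB left over would be consistent with the safetensors header, although that attribution is a guess rather than something this commit states.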
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
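
`modules.json` chains three stages: a Transformer, a Pooling layer, and a Normalize layer. With `pooling_mode_cls_token` set to `true` in `1_Pooling/config.json`, the sentence embedding is the `[CLS]` token vector scaled to unit length. The following is a minimal sketch of the last two stages in plain Python, using a toy 2-dimensional input in place of real 768-dimensional transformer outputs. The function names are illustrative and are not the library's own.

```python
import math


def cls_pooling(token_embeddings):
    """Return the vector of the first token, which is [CLS] for BERT inputs."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit L2 norm; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def pool_and_normalize(token_embeddings):
    """Apply the Pooling stage, then the Normalize stage."""
    return l2_normalize(cls_pooling(token_embeddings))


# Toy per-token outputs: one row per token, [CLS] first (hypothetical values).
toy_tokens = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
print(pool_and_normalize(toy_tokens))  # [0.6, 0.8]
```

Because the output has unit norm, the dot product and cosine similarity of two embeddings give the same value.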
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": true
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "mask_token": "[MASK]",
49
+ "model_max_length": 512,
50
+ "never_split": null,
51
+ "pad_token": "[PAD]",
52
+ "sep_token": "[SEP]",
53
+ "strip_accents": null,
54
+ "tokenize_chinese_chars": true,
55
+ "tokenizer_class": "BertTokenizer",
56
+ "unk_token": "[UNK]"
57
+ }
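
The special-token IDs in `added_tokens_decoder` and the `model_max_length` of 512 determine how a single sequence is framed before it reaches the model. The sketch below follows the usual BERT convention of `[CLS] tokens [SEP]`, with truncation and optional padding. It is a hand-rolled illustration, not the `BertTokenizer` implementation, and the two word-piece IDs in the usage line are arbitrary placeholders.

```python
# Special-token IDs from tokenizer_config.json.
PAD_ID, UNK_ID, CLS_ID, SEP_ID, MASK_ID = 0, 100, 101, 102, 103
MODEL_MAX_LENGTH = 512


def build_inputs(token_ids, pad_to=None):
    """Wrap word-piece IDs as [CLS] ... [SEP], truncating and padding as needed."""
    # Reserve two positions for [CLS] and [SEP].
    body = list(token_ids[: MODEL_MAX_LENGTH - 2])
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        padding = pad_to - len(input_ids)
        input_ids += [PAD_ID] * padding
        attention_mask += [0] * padding  # padded positions are masked out
    return input_ids, attention_mask


ids, mask = build_inputs([7592, 2088], pad_to=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```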
vocab.txt ADDED
The diff for this file is too large to render. See raw diff