michaelfeil committed on
Commit 050c31b · 1 Parent(s): d0c4eac

Upload BAAI/bge-large-en-v1.5 ctranslate2 weights

README.md ADDED
@@ -0,0 +1,3081 @@
+ ---
+ tags:
+ - ctranslate2
+ - int8
+ - float16
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - transformers
+ - mteb
+ model-index:
+ - name: bge-large-en-v1.5
+   results:
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_counterfactual
+       name: MTEB AmazonCounterfactualClassification (en)
+       config: en
+       split: test
+       revision: e8379541af4e31359cca9fbcf4b00f2671dba205
+     metrics:
+     - type: accuracy
+       value: 75.8507462686567
+     - type: ap
+       value: 38.566457320228245
+     - type: f1
+       value: 69.69386648043475
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_polarity
+       name: MTEB AmazonPolarityClassification
+       config: default
+       split: test
+       revision: e2d317d38cd51312af73b3d32a06d1a08b442046
+     metrics:
+     - type: accuracy
+       value: 92.416675
+     - type: ap
+       value: 89.1928861155922
+     - type: f1
+       value: 92.39477019574215
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/amazon_reviews_multi
+       name: MTEB AmazonReviewsClassification (en)
+       config: en
+       split: test
+       revision: 1399c76144fd37290681b995c656ef9b2e06e26d
+     metrics:
+     - type: accuracy
+       value: 48.175999999999995
+     - type: f1
+       value: 47.80712792870253
+   - task:
+       type: Retrieval
+     dataset:
+       type: arguana
+       name: MTEB ArguAna
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 40.184999999999995
+     - type: map_at_10
+       value: 55.654
+     - type: map_at_100
+       value: 56.25
+     - type: map_at_1000
+       value: 56.255
+     - type: map_at_3
+       value: 51.742999999999995
+     - type: map_at_5
+       value: 54.129000000000005
+     - type: mrr_at_1
+       value: 40.967
+     - type: mrr_at_10
+       value: 55.96
+     - type: mrr_at_100
+       value: 56.54900000000001
+     - type: mrr_at_1000
+       value: 56.554
+     - type: mrr_at_3
+       value: 51.980000000000004
+     - type: mrr_at_5
+       value: 54.44
+     - type: ndcg_at_1
+       value: 40.184999999999995
+     - type: ndcg_at_10
+       value: 63.542
+     - type: ndcg_at_100
+       value: 65.96499999999999
+     - type: ndcg_at_1000
+       value: 66.08699999999999
+     - type: ndcg_at_3
+       value: 55.582
+     - type: ndcg_at_5
+       value: 59.855000000000004
+     - type: precision_at_1
+       value: 40.184999999999995
+     - type: precision_at_10
+       value: 8.841000000000001
+     - type: precision_at_100
+       value: 0.987
+     - type: precision_at_1000
+       value: 0.1
+     - type: precision_at_3
+       value: 22.238
+     - type: precision_at_5
+       value: 15.405
+     - type: recall_at_1
+       value: 40.184999999999995
+     - type: recall_at_10
+       value: 88.407
+     - type: recall_at_100
+       value: 98.72
+     - type: recall_at_1000
+       value: 99.644
+     - type: recall_at_3
+       value: 66.714
+     - type: recall_at_5
+       value: 77.027
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/arxiv-clustering-p2p
+       name: MTEB ArxivClusteringP2P
+       config: default
+       split: test
+       revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
+     metrics:
+     - type: v_measure
+       value: 48.567077926750066
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/arxiv-clustering-s2s
+       name: MTEB ArxivClusteringS2S
+       config: default
+       split: test
+       revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
+     metrics:
+     - type: v_measure
+       value: 43.19453389182364
+   - task:
+       type: Reranking
+     dataset:
+       type: mteb/askubuntudupquestions-reranking
+       name: MTEB AskUbuntuDupQuestions
+       config: default
+       split: test
+       revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
+     metrics:
+     - type: map
+       value: 64.46555939623092
+     - type: mrr
+       value: 77.82361605768807
+   - task:
+       type: STS
+     dataset:
+       type: mteb/biosses-sts
+       name: MTEB BIOSSES
+       config: default
+       split: test
+       revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
+     metrics:
+     - type: cos_sim_pearson
+       value: 84.9554128814735
+     - type: cos_sim_spearman
+       value: 84.65373612172036
+     - type: euclidean_pearson
+       value: 83.2905059954138
+     - type: euclidean_spearman
+       value: 84.52240782811128
+     - type: manhattan_pearson
+       value: 82.99533802997436
+     - type: manhattan_spearman
+       value: 84.20673798475734
+   - task:
+       type: Classification
+     dataset:
+       type: mteb/banking77
+       name: MTEB Banking77Classification
+       config: default
+       split: test
+       revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
+     metrics:
+     - type: accuracy
+       value: 87.78896103896103
+     - type: f1
+       value: 87.77189310964883
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/biorxiv-clustering-p2p
+       name: MTEB BiorxivClusteringP2P
+       config: default
+       split: test
+       revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
+     metrics:
+     - type: v_measure
+       value: 39.714538337650495
+   - task:
+       type: Clustering
+     dataset:
+       type: mteb/biorxiv-clustering-s2s
+       name: MTEB BiorxivClusteringS2S
+       config: default
+       split: test
+       revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
+     metrics:
+     - type: v_measure
+       value: 36.90108349284447
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackAndroidRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 32.795
+     - type: map_at_10
+       value: 43.669000000000004
+     - type: map_at_100
+       value: 45.151
+     - type: map_at_1000
+       value: 45.278
+     - type: map_at_3
+       value: 40.006
+     - type: map_at_5
+       value: 42.059999999999995
+     - type: mrr_at_1
+       value: 39.771
+     - type: mrr_at_10
+       value: 49.826
+     - type: mrr_at_100
+       value: 50.504000000000005
+     - type: mrr_at_1000
+       value: 50.549
+     - type: mrr_at_3
+       value: 47.115
+     - type: mrr_at_5
+       value: 48.832
+     - type: ndcg_at_1
+       value: 39.771
+     - type: ndcg_at_10
+       value: 50.217999999999996
+     - type: ndcg_at_100
+       value: 55.454
+     - type: ndcg_at_1000
+       value: 57.37
+     - type: ndcg_at_3
+       value: 44.885000000000005
+     - type: ndcg_at_5
+       value: 47.419
+     - type: precision_at_1
+       value: 39.771
+     - type: precision_at_10
+       value: 9.642000000000001
+     - type: precision_at_100
+       value: 1.538
+     - type: precision_at_1000
+       value: 0.198
+     - type: precision_at_3
+       value: 21.268
+     - type: precision_at_5
+       value: 15.536
+     - type: recall_at_1
+       value: 32.795
+     - type: recall_at_10
+       value: 62.580999999999996
+     - type: recall_at_100
+       value: 84.438
+     - type: recall_at_1000
+       value: 96.492
+     - type: recall_at_3
+       value: 47.071000000000005
+     - type: recall_at_5
+       value: 54.079
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackEnglishRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 32.671
+     - type: map_at_10
+       value: 43.334
+     - type: map_at_100
+       value: 44.566
+     - type: map_at_1000
+       value: 44.702999999999996
+     - type: map_at_3
+       value: 40.343
+     - type: map_at_5
+       value: 41.983
+     - type: mrr_at_1
+       value: 40.764
+     - type: mrr_at_10
+       value: 49.382
+     - type: mrr_at_100
+       value: 49.988
+     - type: mrr_at_1000
+       value: 50.03300000000001
+     - type: mrr_at_3
+       value: 47.293
+     - type: mrr_at_5
+       value: 48.51
+     - type: ndcg_at_1
+       value: 40.764
+     - type: ndcg_at_10
+       value: 49.039
+     - type: ndcg_at_100
+       value: 53.259
+     - type: ndcg_at_1000
+       value: 55.253
+     - type: ndcg_at_3
+       value: 45.091
+     - type: ndcg_at_5
+       value: 46.839999999999996
+     - type: precision_at_1
+       value: 40.764
+     - type: precision_at_10
+       value: 9.191
+     - type: precision_at_100
+       value: 1.476
+     - type: precision_at_1000
+       value: 0.19499999999999998
+     - type: precision_at_3
+       value: 21.72
+     - type: precision_at_5
+       value: 15.299
+     - type: recall_at_1
+       value: 32.671
+     - type: recall_at_10
+       value: 58.816
+     - type: recall_at_100
+       value: 76.654
+     - type: recall_at_1000
+       value: 89.05999999999999
+     - type: recall_at_3
+       value: 46.743
+     - type: recall_at_5
+       value: 51.783
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackGamingRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 40.328
+     - type: map_at_10
+       value: 53.32599999999999
+     - type: map_at_100
+       value: 54.37499999999999
+     - type: map_at_1000
+       value: 54.429
+     - type: map_at_3
+       value: 49.902
+     - type: map_at_5
+       value: 52.002
+     - type: mrr_at_1
+       value: 46.332
+     - type: mrr_at_10
+       value: 56.858
+     - type: mrr_at_100
+       value: 57.522
+     - type: mrr_at_1000
+       value: 57.54899999999999
+     - type: mrr_at_3
+       value: 54.472
+     - type: mrr_at_5
+       value: 55.996
+     - type: ndcg_at_1
+       value: 46.332
+     - type: ndcg_at_10
+       value: 59.313
+     - type: ndcg_at_100
+       value: 63.266999999999996
+     - type: ndcg_at_1000
+       value: 64.36
+     - type: ndcg_at_3
+       value: 53.815000000000005
+     - type: ndcg_at_5
+       value: 56.814
+     - type: precision_at_1
+       value: 46.332
+     - type: precision_at_10
+       value: 9.53
+     - type: precision_at_100
+       value: 1.238
+     - type: precision_at_1000
+       value: 0.13699999999999998
+     - type: precision_at_3
+       value: 24.054000000000002
+     - type: precision_at_5
+       value: 16.589000000000002
+     - type: recall_at_1
+       value: 40.328
+     - type: recall_at_10
+       value: 73.421
+     - type: recall_at_100
+       value: 90.059
+     - type: recall_at_1000
+       value: 97.81
+     - type: recall_at_3
+       value: 59.009
+     - type: recall_at_5
+       value: 66.352
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackGisRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 27.424
+     - type: map_at_10
+       value: 36.332
+     - type: map_at_100
+       value: 37.347
+     - type: map_at_1000
+       value: 37.422
+     - type: map_at_3
+       value: 33.743
+     - type: map_at_5
+       value: 35.176
+     - type: mrr_at_1
+       value: 29.153000000000002
+     - type: mrr_at_10
+       value: 38.233
+     - type: mrr_at_100
+       value: 39.109
+     - type: mrr_at_1000
+       value: 39.164
+     - type: mrr_at_3
+       value: 35.876000000000005
+     - type: mrr_at_5
+       value: 37.169000000000004
+     - type: ndcg_at_1
+       value: 29.153000000000002
+     - type: ndcg_at_10
+       value: 41.439
+     - type: ndcg_at_100
+       value: 46.42
+     - type: ndcg_at_1000
+       value: 48.242000000000004
+     - type: ndcg_at_3
+       value: 36.362
+     - type: ndcg_at_5
+       value: 38.743
+     - type: precision_at_1
+       value: 29.153000000000002
+     - type: precision_at_10
+       value: 6.315999999999999
+     - type: precision_at_100
+       value: 0.927
+     - type: precision_at_1000
+       value: 0.11199999999999999
+     - type: precision_at_3
+       value: 15.443000000000001
+     - type: precision_at_5
+       value: 10.644
+     - type: recall_at_1
+       value: 27.424
+     - type: recall_at_10
+       value: 55.364000000000004
+     - type: recall_at_100
+       value: 78.211
+     - type: recall_at_1000
+       value: 91.74600000000001
+     - type: recall_at_3
+       value: 41.379
+     - type: recall_at_5
+       value: 47.14
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackMathematicaRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 19.601
+     - type: map_at_10
+       value: 27.826
+     - type: map_at_100
+       value: 29.017
+     - type: map_at_1000
+       value: 29.137
+     - type: map_at_3
+       value: 25.125999999999998
+     - type: map_at_5
+       value: 26.765
+     - type: mrr_at_1
+       value: 24.005000000000003
+     - type: mrr_at_10
+       value: 32.716
+     - type: mrr_at_100
+       value: 33.631
+     - type: mrr_at_1000
+       value: 33.694
+     - type: mrr_at_3
+       value: 29.934
+     - type: mrr_at_5
+       value: 31.630999999999997
+     - type: ndcg_at_1
+       value: 24.005000000000003
+     - type: ndcg_at_10
+       value: 33.158
+     - type: ndcg_at_100
+       value: 38.739000000000004
+     - type: ndcg_at_1000
+       value: 41.495
+     - type: ndcg_at_3
+       value: 28.185
+     - type: ndcg_at_5
+       value: 30.796
+     - type: precision_at_1
+       value: 24.005000000000003
+     - type: precision_at_10
+       value: 5.908
+     - type: precision_at_100
+       value: 1.005
+     - type: precision_at_1000
+       value: 0.13899999999999998
+     - type: precision_at_3
+       value: 13.391
+     - type: precision_at_5
+       value: 9.876
+     - type: recall_at_1
+       value: 19.601
+     - type: recall_at_10
+       value: 44.746
+     - type: recall_at_100
+       value: 68.82300000000001
+     - type: recall_at_1000
+       value: 88.215
+     - type: recall_at_3
+       value: 31.239
+     - type: recall_at_5
+       value: 37.695
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackPhysicsRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 30.130000000000003
+     - type: map_at_10
+       value: 40.96
+     - type: map_at_100
+       value: 42.282
+     - type: map_at_1000
+       value: 42.392
+     - type: map_at_3
+       value: 37.889
+     - type: map_at_5
+       value: 39.661
+     - type: mrr_at_1
+       value: 36.958999999999996
+     - type: mrr_at_10
+       value: 46.835
+     - type: mrr_at_100
+       value: 47.644
+     - type: mrr_at_1000
+       value: 47.688
+     - type: mrr_at_3
+       value: 44.562000000000005
+     - type: mrr_at_5
+       value: 45.938
+     - type: ndcg_at_1
+       value: 36.958999999999996
+     - type: ndcg_at_10
+       value: 47.06
+     - type: ndcg_at_100
+       value: 52.345
+     - type: ndcg_at_1000
+       value: 54.35
+     - type: ndcg_at_3
+       value: 42.301
+     - type: ndcg_at_5
+       value: 44.635999999999996
+     - type: precision_at_1
+       value: 36.958999999999996
+     - type: precision_at_10
+       value: 8.479000000000001
+     - type: precision_at_100
+       value: 1.284
+     - type: precision_at_1000
+       value: 0.163
+     - type: precision_at_3
+       value: 20.244
+     - type: precision_at_5
+       value: 14.224999999999998
+     - type: recall_at_1
+       value: 30.130000000000003
+     - type: recall_at_10
+       value: 59.27
+     - type: recall_at_100
+       value: 81.195
+     - type: recall_at_1000
+       value: 94.21199999999999
+     - type: recall_at_3
+       value: 45.885
+     - type: recall_at_5
+       value: 52.016
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackProgrammersRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 26.169999999999998
+     - type: map_at_10
+       value: 36.451
+     - type: map_at_100
+       value: 37.791000000000004
+     - type: map_at_1000
+       value: 37.897
+     - type: map_at_3
+       value: 33.109
+     - type: map_at_5
+       value: 34.937000000000005
+     - type: mrr_at_1
+       value: 32.877
+     - type: mrr_at_10
+       value: 42.368
+     - type: mrr_at_100
+       value: 43.201
+     - type: mrr_at_1000
+       value: 43.259
+     - type: mrr_at_3
+       value: 39.763999999999996
+     - type: mrr_at_5
+       value: 41.260000000000005
+     - type: ndcg_at_1
+       value: 32.877
+     - type: ndcg_at_10
+       value: 42.659000000000006
+     - type: ndcg_at_100
+       value: 48.161
+     - type: ndcg_at_1000
+       value: 50.345
+     - type: ndcg_at_3
+       value: 37.302
+     - type: ndcg_at_5
+       value: 39.722
+     - type: precision_at_1
+       value: 32.877
+     - type: precision_at_10
+       value: 7.9
+     - type: precision_at_100
+       value: 1.236
+     - type: precision_at_1000
+       value: 0.158
+     - type: precision_at_3
+       value: 17.846
+     - type: precision_at_5
+       value: 12.9
+     - type: recall_at_1
+       value: 26.169999999999998
+     - type: recall_at_10
+       value: 55.35
+     - type: recall_at_100
+       value: 78.755
+     - type: recall_at_1000
+       value: 93.518
+     - type: recall_at_3
+       value: 40.176
+     - type: recall_at_5
+       value: 46.589000000000006
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 27.15516666666667
+     - type: map_at_10
+       value: 36.65741666666667
+     - type: map_at_100
+       value: 37.84991666666666
+     - type: map_at_1000
+       value: 37.96316666666667
+     - type: map_at_3
+       value: 33.74974999999999
+     - type: map_at_5
+       value: 35.3765
+     - type: mrr_at_1
+       value: 32.08233333333334
+     - type: mrr_at_10
+       value: 41.033833333333334
+     - type: mrr_at_100
+       value: 41.84524999999999
+     - type: mrr_at_1000
+       value: 41.89983333333333
+     - type: mrr_at_3
+       value: 38.62008333333333
+     - type: mrr_at_5
+       value: 40.03441666666666
+     - type: ndcg_at_1
+       value: 32.08233333333334
+     - type: ndcg_at_10
+       value: 42.229
+     - type: ndcg_at_100
+       value: 47.26716666666667
+     - type: ndcg_at_1000
+       value: 49.43466666666667
+     - type: ndcg_at_3
+       value: 37.36408333333333
+     - type: ndcg_at_5
+       value: 39.6715
+     - type: precision_at_1
+       value: 32.08233333333334
+     - type: precision_at_10
+       value: 7.382583333333334
+     - type: precision_at_100
+       value: 1.16625
+     - type: precision_at_1000
+       value: 0.15408333333333332
+     - type: precision_at_3
+       value: 17.218
+     - type: precision_at_5
+       value: 12.21875
+     - type: recall_at_1
+       value: 27.15516666666667
+     - type: recall_at_10
+       value: 54.36683333333333
+     - type: recall_at_100
+       value: 76.37183333333333
+     - type: recall_at_1000
+       value: 91.26183333333333
+     - type: recall_at_3
+       value: 40.769916666666674
+     - type: recall_at_5
+       value: 46.702333333333335
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackStatsRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 25.749
+     - type: map_at_10
+       value: 33.001999999999995
+     - type: map_at_100
+       value: 33.891
+     - type: map_at_1000
+       value: 33.993
+     - type: map_at_3
+       value: 30.703999999999997
+     - type: map_at_5
+       value: 31.959
+     - type: mrr_at_1
+       value: 28.834
+     - type: mrr_at_10
+       value: 35.955
+     - type: mrr_at_100
+       value: 36.709
+     - type: mrr_at_1000
+       value: 36.779
+     - type: mrr_at_3
+       value: 33.947
+     - type: mrr_at_5
+       value: 35.089
+     - type: ndcg_at_1
+       value: 28.834
+     - type: ndcg_at_10
+       value: 37.329
+     - type: ndcg_at_100
+       value: 41.79
+     - type: ndcg_at_1000
+       value: 44.169000000000004
+     - type: ndcg_at_3
+       value: 33.184999999999995
+     - type: ndcg_at_5
+       value: 35.107
+     - type: precision_at_1
+       value: 28.834
+     - type: precision_at_10
+       value: 5.7669999999999995
+     - type: precision_at_100
+       value: 0.876
+     - type: precision_at_1000
+       value: 0.11399999999999999
+     - type: precision_at_3
+       value: 14.213000000000001
+     - type: precision_at_5
+       value: 9.754999999999999
+     - type: recall_at_1
+       value: 25.749
+     - type: recall_at_10
+       value: 47.791
+     - type: recall_at_100
+       value: 68.255
+     - type: recall_at_1000
+       value: 85.749
+     - type: recall_at_3
+       value: 36.199
+     - type: recall_at_5
+       value: 41.071999999999996
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackTexRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 17.777
+     - type: map_at_10
+       value: 25.201
+     - type: map_at_100
+       value: 26.423999999999996
+     - type: map_at_1000
+       value: 26.544
+     - type: map_at_3
+       value: 22.869
+     - type: map_at_5
+       value: 24.023
+     - type: mrr_at_1
+       value: 21.473
+     - type: mrr_at_10
+       value: 29.12
+     - type: mrr_at_100
+       value: 30.144
+     - type: mrr_at_1000
+       value: 30.215999999999998
+     - type: mrr_at_3
+       value: 26.933
+     - type: mrr_at_5
+       value: 28.051
+     - type: ndcg_at_1
+       value: 21.473
+     - type: ndcg_at_10
+       value: 30.003
+     - type: ndcg_at_100
+       value: 35.766
+     - type: ndcg_at_1000
+       value: 38.501000000000005
+     - type: ndcg_at_3
+       value: 25.773000000000003
+     - type: ndcg_at_5
+       value: 27.462999999999997
+     - type: precision_at_1
+       value: 21.473
+     - type: precision_at_10
+       value: 5.482
+     - type: precision_at_100
+       value: 0.975
+     - type: precision_at_1000
+       value: 0.13799999999999998
+     - type: precision_at_3
+       value: 12.205
+     - type: precision_at_5
+       value: 8.692
+     - type: recall_at_1
+       value: 17.777
+     - type: recall_at_10
+       value: 40.582
+     - type: recall_at_100
+       value: 66.305
+     - type: recall_at_1000
+       value: 85.636
+     - type: recall_at_3
+       value: 28.687
+     - type: recall_at_5
+       value: 33.089
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackUnixRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 26.677
+     - type: map_at_10
+       value: 36.309000000000005
+     - type: map_at_100
+       value: 37.403999999999996
+     - type: map_at_1000
+       value: 37.496
+     - type: map_at_3
+       value: 33.382
+     - type: map_at_5
+       value: 34.98
+     - type: mrr_at_1
+       value: 31.343
+     - type: mrr_at_10
+       value: 40.549
+     - type: mrr_at_100
+       value: 41.342
+     - type: mrr_at_1000
+       value: 41.397
+     - type: mrr_at_3
+       value: 38.029
+     - type: mrr_at_5
+       value: 39.451
+     - type: ndcg_at_1
+       value: 31.343
+     - type: ndcg_at_10
+       value: 42.1
+     - type: ndcg_at_100
+       value: 47.089999999999996
+     - type: ndcg_at_1000
+       value: 49.222
+     - type: ndcg_at_3
+       value: 36.836999999999996
+     - type: ndcg_at_5
+       value: 39.21
+     - type: precision_at_1
+       value: 31.343
+     - type: precision_at_10
+       value: 7.164
+     - type: precision_at_100
+       value: 1.0959999999999999
+     - type: precision_at_1000
+       value: 0.13899999999999998
+     - type: precision_at_3
+       value: 16.915
+     - type: precision_at_5
+       value: 11.940000000000001
+     - type: recall_at_1
+       value: 26.677
+     - type: recall_at_10
+       value: 55.54599999999999
+     - type: recall_at_100
+       value: 77.094
+     - type: recall_at_1000
+       value: 92.01
+     - type: recall_at_3
+       value: 41.191
+     - type: recall_at_5
+       value: 47.006
+   - task:
+       type: Retrieval
+     dataset:
+       type: BeIR/cqadupstack
+       name: MTEB CQADupstackWebmastersRetrieval
+       config: default
+       split: test
+       revision: None
+     metrics:
+     - type: map_at_1
+       value: 24.501
+     - type: map_at_10
+       value: 33.102
+     - type: map_at_100
+       value: 34.676
+     - type: map_at_1000
+       value: 34.888000000000005
+     - type: map_at_3
+       value: 29.944
+     - type: map_at_5
+       value: 31.613999999999997
+     - type: mrr_at_1
+       value: 29.447000000000003
+     - type: mrr_at_10
+       value: 37.996
+     - type: mrr_at_100
+       value: 38.946
+     - type: mrr_at_1000
+       value: 38.995000000000005
+     - type: mrr_at_3
+       value: 35.079
+     - type: mrr_at_5
+       value: 36.69
+     - type: ndcg_at_1
+       value: 29.447000000000003
+     - type: ndcg_at_10
+       value: 39.232
+     - type: ndcg_at_100
+       value: 45.247
+     - type: ndcg_at_1000
+       value: 47.613
+     - type: ndcg_at_3
+       value: 33.922999999999995
+     - type: ndcg_at_5
+       value: 36.284
+     - type: precision_at_1
+       value: 29.447000000000003
+     - type: precision_at_10
+       value: 7.648000000000001
+     - type: precision_at_100
+       value: 1.516
+     - type: precision_at_1000
+       value: 0.23900000000000002
+     - type: precision_at_3
+       value: 16.008
+     - type: precision_at_5
+       value: 11.779
+     - type: recall_at_1
+       value: 24.501
+     - type: recall_at_10
+       value: 51.18899999999999
+     - type: recall_at_100
+       value: 78.437
+     - type: recall_at_1000
+       value: 92.842
+     - type: recall_at_3
+       value: 35.808
+     - type: recall_at_5
+       value: 42.197
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackWordpressRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 22.039
+ - type: map_at_10
+ value: 30.377
+ - type: map_at_100
+ value: 31.275
+ - type: map_at_1000
+ value: 31.379
+ - type: map_at_3
+ value: 27.98
+ - type: map_at_5
+ value: 29.358
+ - type: mrr_at_1
+ value: 24.03
+ - type: mrr_at_10
+ value: 32.568000000000005
+ - type: mrr_at_100
+ value: 33.403
+ - type: mrr_at_1000
+ value: 33.475
+ - type: mrr_at_3
+ value: 30.436999999999998
+ - type: mrr_at_5
+ value: 31.796000000000003
+ - type: ndcg_at_1
+ value: 24.03
+ - type: ndcg_at_10
+ value: 35.198
+ - type: ndcg_at_100
+ value: 39.668
+ - type: ndcg_at_1000
+ value: 42.296
+ - type: ndcg_at_3
+ value: 30.709999999999997
+ - type: ndcg_at_5
+ value: 33.024
+ - type: precision_at_1
+ value: 24.03
+ - type: precision_at_10
+ value: 5.564
+ - type: precision_at_100
+ value: 0.828
+ - type: precision_at_1000
+ value: 0.117
+ - type: precision_at_3
+ value: 13.309000000000001
+ - type: precision_at_5
+ value: 9.39
+ - type: recall_at_1
+ value: 22.039
+ - type: recall_at_10
+ value: 47.746
+ - type: recall_at_100
+ value: 68.23599999999999
+ - type: recall_at_1000
+ value: 87.852
+ - type: recall_at_3
+ value: 35.852000000000004
+ - type: recall_at_5
+ value: 41.410000000000004
+ - task:
+ type: Retrieval
+ dataset:
+ type: climate-fever
+ name: MTEB ClimateFEVER
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 15.692999999999998
+ - type: map_at_10
+ value: 26.903
+ - type: map_at_100
+ value: 28.987000000000002
+ - type: map_at_1000
+ value: 29.176999999999996
+ - type: map_at_3
+ value: 22.137
+ - type: map_at_5
+ value: 24.758
+ - type: mrr_at_1
+ value: 35.57
+ - type: mrr_at_10
+ value: 47.821999999999996
+ - type: mrr_at_100
+ value: 48.608000000000004
+ - type: mrr_at_1000
+ value: 48.638999999999996
+ - type: mrr_at_3
+ value: 44.452000000000005
+ - type: mrr_at_5
+ value: 46.546
+ - type: ndcg_at_1
+ value: 35.57
+ - type: ndcg_at_10
+ value: 36.567
+ - type: ndcg_at_100
+ value: 44.085
+ - type: ndcg_at_1000
+ value: 47.24
+ - type: ndcg_at_3
+ value: 29.964000000000002
+ - type: ndcg_at_5
+ value: 32.511
+ - type: precision_at_1
+ value: 35.57
+ - type: precision_at_10
+ value: 11.485
+ - type: precision_at_100
+ value: 1.9619999999999997
+ - type: precision_at_1000
+ value: 0.256
+ - type: precision_at_3
+ value: 22.237000000000002
+ - type: precision_at_5
+ value: 17.471999999999998
+ - type: recall_at_1
+ value: 15.692999999999998
+ - type: recall_at_10
+ value: 43.056
+ - type: recall_at_100
+ value: 68.628
+ - type: recall_at_1000
+ value: 86.075
+ - type: recall_at_3
+ value: 26.918999999999997
+ - type: recall_at_5
+ value: 34.14
+ - task:
+ type: Retrieval
+ dataset:
+ type: dbpedia-entity
+ name: MTEB DBPedia
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 9.53
+ - type: map_at_10
+ value: 20.951
+ - type: map_at_100
+ value: 30.136000000000003
+ - type: map_at_1000
+ value: 31.801000000000002
+ - type: map_at_3
+ value: 15.021
+ - type: map_at_5
+ value: 17.471999999999998
+ - type: mrr_at_1
+ value: 71.0
+ - type: mrr_at_10
+ value: 79.176
+ - type: mrr_at_100
+ value: 79.418
+ - type: mrr_at_1000
+ value: 79.426
+ - type: mrr_at_3
+ value: 78.125
+ - type: mrr_at_5
+ value: 78.61200000000001
+ - type: ndcg_at_1
+ value: 58.5
+ - type: ndcg_at_10
+ value: 44.106
+ - type: ndcg_at_100
+ value: 49.268
+ - type: ndcg_at_1000
+ value: 56.711999999999996
+ - type: ndcg_at_3
+ value: 48.934
+ - type: ndcg_at_5
+ value: 45.826
+ - type: precision_at_1
+ value: 71.0
+ - type: precision_at_10
+ value: 35.0
+ - type: precision_at_100
+ value: 11.360000000000001
+ - type: precision_at_1000
+ value: 2.046
+ - type: precision_at_3
+ value: 52.833
+ - type: precision_at_5
+ value: 44.15
+ - type: recall_at_1
+ value: 9.53
+ - type: recall_at_10
+ value: 26.811
+ - type: recall_at_100
+ value: 55.916999999999994
+ - type: recall_at_1000
+ value: 79.973
+ - type: recall_at_3
+ value: 16.413
+ - type: recall_at_5
+ value: 19.980999999999998
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/emotion
+ name: MTEB EmotionClassification
+ config: default
+ split: test
+ revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
+ metrics:
+ - type: accuracy
+ value: 51.519999999999996
+ - type: f1
+ value: 46.36601294761231
+ - task:
+ type: Retrieval
+ dataset:
+ type: fever
+ name: MTEB FEVER
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 74.413
+ - type: map_at_10
+ value: 83.414
+ - type: map_at_100
+ value: 83.621
+ - type: map_at_1000
+ value: 83.635
+ - type: map_at_3
+ value: 82.337
+ - type: map_at_5
+ value: 83.039
+ - type: mrr_at_1
+ value: 80.19800000000001
+ - type: mrr_at_10
+ value: 87.715
+ - type: mrr_at_100
+ value: 87.778
+ - type: mrr_at_1000
+ value: 87.779
+ - type: mrr_at_3
+ value: 87.106
+ - type: mrr_at_5
+ value: 87.555
+ - type: ndcg_at_1
+ value: 80.19800000000001
+ - type: ndcg_at_10
+ value: 87.182
+ - type: ndcg_at_100
+ value: 87.90299999999999
+ - type: ndcg_at_1000
+ value: 88.143
+ - type: ndcg_at_3
+ value: 85.60600000000001
+ - type: ndcg_at_5
+ value: 86.541
+ - type: precision_at_1
+ value: 80.19800000000001
+ - type: precision_at_10
+ value: 10.531
+ - type: precision_at_100
+ value: 1.113
+ - type: precision_at_1000
+ value: 0.11499999999999999
+ - type: precision_at_3
+ value: 32.933
+ - type: precision_at_5
+ value: 20.429
+ - type: recall_at_1
+ value: 74.413
+ - type: recall_at_10
+ value: 94.363
+ - type: recall_at_100
+ value: 97.165
+ - type: recall_at_1000
+ value: 98.668
+ - type: recall_at_3
+ value: 90.108
+ - type: recall_at_5
+ value: 92.52
+ - task:
+ type: Retrieval
+ dataset:
+ type: fiqa
+ name: MTEB FiQA2018
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 22.701
+ - type: map_at_10
+ value: 37.122
+ - type: map_at_100
+ value: 39.178000000000004
+ - type: map_at_1000
+ value: 39.326
+ - type: map_at_3
+ value: 32.971000000000004
+ - type: map_at_5
+ value: 35.332
+ - type: mrr_at_1
+ value: 44.753
+ - type: mrr_at_10
+ value: 53.452
+ - type: mrr_at_100
+ value: 54.198
+ - type: mrr_at_1000
+ value: 54.225
+ - type: mrr_at_3
+ value: 50.952
+ - type: mrr_at_5
+ value: 52.464
+ - type: ndcg_at_1
+ value: 44.753
+ - type: ndcg_at_10
+ value: 45.021
+ - type: ndcg_at_100
+ value: 52.028
+ - type: ndcg_at_1000
+ value: 54.596000000000004
+ - type: ndcg_at_3
+ value: 41.622
+ - type: ndcg_at_5
+ value: 42.736000000000004
+ - type: precision_at_1
+ value: 44.753
+ - type: precision_at_10
+ value: 12.284
+ - type: precision_at_100
+ value: 1.955
+ - type: precision_at_1000
+ value: 0.243
+ - type: precision_at_3
+ value: 27.828999999999997
+ - type: precision_at_5
+ value: 20.061999999999998
+ - type: recall_at_1
+ value: 22.701
+ - type: recall_at_10
+ value: 51.432
+ - type: recall_at_100
+ value: 77.009
+ - type: recall_at_1000
+ value: 92.511
+ - type: recall_at_3
+ value: 37.919000000000004
+ - type: recall_at_5
+ value: 44.131
+ - task:
+ type: Retrieval
+ dataset:
+ type: hotpotqa
+ name: MTEB HotpotQA
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 40.189
+ - type: map_at_10
+ value: 66.24600000000001
+ - type: map_at_100
+ value: 67.098
+ - type: map_at_1000
+ value: 67.149
+ - type: map_at_3
+ value: 62.684
+ - type: map_at_5
+ value: 64.974
+ - type: mrr_at_1
+ value: 80.378
+ - type: mrr_at_10
+ value: 86.127
+ - type: mrr_at_100
+ value: 86.29299999999999
+ - type: mrr_at_1000
+ value: 86.297
+ - type: mrr_at_3
+ value: 85.31400000000001
+ - type: mrr_at_5
+ value: 85.858
+ - type: ndcg_at_1
+ value: 80.378
+ - type: ndcg_at_10
+ value: 74.101
+ - type: ndcg_at_100
+ value: 76.993
+ - type: ndcg_at_1000
+ value: 77.948
+ - type: ndcg_at_3
+ value: 69.232
+ - type: ndcg_at_5
+ value: 72.04599999999999
+ - type: precision_at_1
+ value: 80.378
+ - type: precision_at_10
+ value: 15.595999999999998
+ - type: precision_at_100
+ value: 1.7840000000000003
+ - type: precision_at_1000
+ value: 0.191
+ - type: precision_at_3
+ value: 44.884
+ - type: precision_at_5
+ value: 29.145
+ - type: recall_at_1
+ value: 40.189
+ - type: recall_at_10
+ value: 77.981
+ - type: recall_at_100
+ value: 89.21
+ - type: recall_at_1000
+ value: 95.48299999999999
+ - type: recall_at_3
+ value: 67.326
+ - type: recall_at_5
+ value: 72.863
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/imdb
+ name: MTEB ImdbClassification
+ config: default
+ split: test
+ revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
+ metrics:
+ - type: accuracy
+ value: 92.84599999999999
+ - type: ap
+ value: 89.4710787567357
+ - type: f1
+ value: 92.83752676932258
+ - task:
+ type: Retrieval
+ dataset:
+ type: msmarco
+ name: MTEB MSMARCO
+ config: default
+ split: dev
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 23.132
+ - type: map_at_10
+ value: 35.543
+ - type: map_at_100
+ value: 36.702
+ - type: map_at_1000
+ value: 36.748999999999995
+ - type: map_at_3
+ value: 31.737
+ - type: map_at_5
+ value: 33.927
+ - type: mrr_at_1
+ value: 23.782
+ - type: mrr_at_10
+ value: 36.204
+ - type: mrr_at_100
+ value: 37.29
+ - type: mrr_at_1000
+ value: 37.330999999999996
+ - type: mrr_at_3
+ value: 32.458999999999996
+ - type: mrr_at_5
+ value: 34.631
+ - type: ndcg_at_1
+ value: 23.782
+ - type: ndcg_at_10
+ value: 42.492999999999995
+ - type: ndcg_at_100
+ value: 47.985
+ - type: ndcg_at_1000
+ value: 49.141
+ - type: ndcg_at_3
+ value: 34.748000000000005
+ - type: ndcg_at_5
+ value: 38.651
+ - type: precision_at_1
+ value: 23.782
+ - type: precision_at_10
+ value: 6.665
+ - type: precision_at_100
+ value: 0.941
+ - type: precision_at_1000
+ value: 0.104
+ - type: precision_at_3
+ value: 14.776
+ - type: precision_at_5
+ value: 10.84
+ - type: recall_at_1
+ value: 23.132
+ - type: recall_at_10
+ value: 63.794
+ - type: recall_at_100
+ value: 89.027
+ - type: recall_at_1000
+ value: 97.807
+ - type: recall_at_3
+ value: 42.765
+ - type: recall_at_5
+ value: 52.11
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/mtop_domain
+ name: MTEB MTOPDomainClassification (en)
+ config: en
+ split: test
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
+ metrics:
+ - type: accuracy
+ value: 94.59188326493388
+ - type: f1
+ value: 94.3842594786827
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/mtop_intent
+ name: MTEB MTOPIntentClassification (en)
+ config: en
+ split: test
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
+ metrics:
+ - type: accuracy
+ value: 79.49384404924761
+ - type: f1
+ value: 59.7580539534629
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_massive_intent
+ name: MTEB MassiveIntentClassification (en)
+ config: en
+ split: test
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
+ metrics:
+ - type: accuracy
+ value: 77.56220578345663
+ - type: f1
+ value: 75.27228165561478
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_massive_scenario
+ name: MTEB MassiveScenarioClassification (en)
+ config: en
+ split: test
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
+ metrics:
+ - type: accuracy
+ value: 80.53463349024884
+ - type: f1
+ value: 80.4893958236536
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/medrxiv-clustering-p2p
+ name: MTEB MedrxivClusteringP2P
+ config: default
+ split: test
+ revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
+ metrics:
+ - type: v_measure
+ value: 32.56100273484962
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/medrxiv-clustering-s2s
+ name: MTEB MedrxivClusteringS2S
+ config: default
+ split: test
+ revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
+ metrics:
+ - type: v_measure
+ value: 31.470380028839607
+ - task:
+ type: Reranking
+ dataset:
+ type: mteb/mind_small
+ name: MTEB MindSmallReranking
+ config: default
+ split: test
+ revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
+ metrics:
+ - type: map
+ value: 32.06102792457849
+ - type: mrr
+ value: 33.30709199672238
+ - task:
+ type: Retrieval
+ dataset:
+ type: nfcorpus
+ name: MTEB NFCorpus
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 6.776999999999999
+ - type: map_at_10
+ value: 14.924000000000001
+ - type: map_at_100
+ value: 18.955
+ - type: map_at_1000
+ value: 20.538999999999998
+ - type: map_at_3
+ value: 10.982
+ - type: map_at_5
+ value: 12.679000000000002
+ - type: mrr_at_1
+ value: 47.988
+ - type: mrr_at_10
+ value: 57.232000000000006
+ - type: mrr_at_100
+ value: 57.818999999999996
+ - type: mrr_at_1000
+ value: 57.847
+ - type: mrr_at_3
+ value: 54.901999999999994
+ - type: mrr_at_5
+ value: 56.481
+ - type: ndcg_at_1
+ value: 46.594
+ - type: ndcg_at_10
+ value: 38.129000000000005
+ - type: ndcg_at_100
+ value: 35.54
+ - type: ndcg_at_1000
+ value: 44.172
+ - type: ndcg_at_3
+ value: 43.025999999999996
+ - type: ndcg_at_5
+ value: 41.052
+ - type: precision_at_1
+ value: 47.988
+ - type: precision_at_10
+ value: 28.111000000000004
+ - type: precision_at_100
+ value: 8.929
+ - type: precision_at_1000
+ value: 2.185
+ - type: precision_at_3
+ value: 40.144000000000005
+ - type: precision_at_5
+ value: 35.232
+ - type: recall_at_1
+ value: 6.776999999999999
+ - type: recall_at_10
+ value: 19.289
+ - type: recall_at_100
+ value: 36.359
+ - type: recall_at_1000
+ value: 67.54
+ - type: recall_at_3
+ value: 11.869
+ - type: recall_at_5
+ value: 14.999
+ - task:
+ type: Retrieval
+ dataset:
+ type: nq
+ name: MTEB NQ
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 31.108000000000004
+ - type: map_at_10
+ value: 47.126000000000005
+ - type: map_at_100
+ value: 48.171
+ - type: map_at_1000
+ value: 48.199
+ - type: map_at_3
+ value: 42.734
+ - type: map_at_5
+ value: 45.362
+ - type: mrr_at_1
+ value: 34.936
+ - type: mrr_at_10
+ value: 49.571
+ - type: mrr_at_100
+ value: 50.345
+ - type: mrr_at_1000
+ value: 50.363
+ - type: mrr_at_3
+ value: 45.959
+ - type: mrr_at_5
+ value: 48.165
+ - type: ndcg_at_1
+ value: 34.936
+ - type: ndcg_at_10
+ value: 55.028999999999996
+ - type: ndcg_at_100
+ value: 59.244
+ - type: ndcg_at_1000
+ value: 59.861
+ - type: ndcg_at_3
+ value: 46.872
+ - type: ndcg_at_5
+ value: 51.217999999999996
+ - type: precision_at_1
+ value: 34.936
+ - type: precision_at_10
+ value: 9.099
+ - type: precision_at_100
+ value: 1.145
+ - type: precision_at_1000
+ value: 0.12
+ - type: precision_at_3
+ value: 21.456
+ - type: precision_at_5
+ value: 15.411
+ - type: recall_at_1
+ value: 31.108000000000004
+ - type: recall_at_10
+ value: 76.53999999999999
+ - type: recall_at_100
+ value: 94.39
+ - type: recall_at_1000
+ value: 98.947
+ - type: recall_at_3
+ value: 55.572
+ - type: recall_at_5
+ value: 65.525
+ - task:
+ type: Retrieval
+ dataset:
+ type: quora
+ name: MTEB QuoraRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 71.56400000000001
+ - type: map_at_10
+ value: 85.482
+ - type: map_at_100
+ value: 86.114
+ - type: map_at_1000
+ value: 86.13
+ - type: map_at_3
+ value: 82.607
+ - type: map_at_5
+ value: 84.405
+ - type: mrr_at_1
+ value: 82.42
+ - type: mrr_at_10
+ value: 88.304
+ - type: mrr_at_100
+ value: 88.399
+ - type: mrr_at_1000
+ value: 88.399
+ - type: mrr_at_3
+ value: 87.37
+ - type: mrr_at_5
+ value: 88.024
+ - type: ndcg_at_1
+ value: 82.45
+ - type: ndcg_at_10
+ value: 89.06500000000001
+ - type: ndcg_at_100
+ value: 90.232
+ - type: ndcg_at_1000
+ value: 90.305
+ - type: ndcg_at_3
+ value: 86.375
+ - type: ndcg_at_5
+ value: 87.85300000000001
+ - type: precision_at_1
+ value: 82.45
+ - type: precision_at_10
+ value: 13.486999999999998
+ - type: precision_at_100
+ value: 1.534
+ - type: precision_at_1000
+ value: 0.157
+ - type: precision_at_3
+ value: 37.813
+ - type: precision_at_5
+ value: 24.773999999999997
+ - type: recall_at_1
+ value: 71.56400000000001
+ - type: recall_at_10
+ value: 95.812
+ - type: recall_at_100
+ value: 99.7
+ - type: recall_at_1000
+ value: 99.979
+ - type: recall_at_3
+ value: 87.966
+ - type: recall_at_5
+ value: 92.268
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/reddit-clustering
+ name: MTEB RedditClustering
+ config: default
+ split: test
+ revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
+ metrics:
+ - type: v_measure
+ value: 57.241876648614145
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/reddit-clustering-p2p
+ name: MTEB RedditClusteringP2P
+ config: default
+ split: test
+ revision: 282350215ef01743dc01b456c7f5241fa8937f16
+ metrics:
+ - type: v_measure
+ value: 64.66212576446223
+ - task:
+ type: Retrieval
+ dataset:
+ type: scidocs
+ name: MTEB SCIDOCS
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 5.308
+ - type: map_at_10
+ value: 13.803
+ - type: map_at_100
+ value: 16.176
+ - type: map_at_1000
+ value: 16.561
+ - type: map_at_3
+ value: 9.761000000000001
+ - type: map_at_5
+ value: 11.802
+ - type: mrr_at_1
+ value: 26.200000000000003
+ - type: mrr_at_10
+ value: 37.621
+ - type: mrr_at_100
+ value: 38.767
+ - type: mrr_at_1000
+ value: 38.815
+ - type: mrr_at_3
+ value: 34.117
+ - type: mrr_at_5
+ value: 36.107
+ - type: ndcg_at_1
+ value: 26.200000000000003
+ - type: ndcg_at_10
+ value: 22.64
+ - type: ndcg_at_100
+ value: 31.567
+ - type: ndcg_at_1000
+ value: 37.623
+ - type: ndcg_at_3
+ value: 21.435000000000002
+ - type: ndcg_at_5
+ value: 18.87
+ - type: precision_at_1
+ value: 26.200000000000003
+ - type: precision_at_10
+ value: 11.74
+ - type: precision_at_100
+ value: 2.465
+ - type: precision_at_1000
+ value: 0.391
+ - type: precision_at_3
+ value: 20.033
+ - type: precision_at_5
+ value: 16.64
+ - type: recall_at_1
+ value: 5.308
+ - type: recall_at_10
+ value: 23.794999999999998
+ - type: recall_at_100
+ value: 50.015
+ - type: recall_at_1000
+ value: 79.283
+ - type: recall_at_3
+ value: 12.178
+ - type: recall_at_5
+ value: 16.882
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sickr-sts
+ name: MTEB SICK-R
+ config: default
+ split: test
+ revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
+ metrics:
+ - type: cos_sim_pearson
+ value: 84.93231134675553
+ - type: cos_sim_spearman
+ value: 81.68319292603205
+ - type: euclidean_pearson
+ value: 81.8396814380367
+ - type: euclidean_spearman
+ value: 81.24641903349945
+ - type: manhattan_pearson
+ value: 81.84698799204274
+ - type: manhattan_spearman
+ value: 81.24269997904105
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts12-sts
+ name: MTEB STS12
+ config: default
+ split: test
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
+ metrics:
+ - type: cos_sim_pearson
+ value: 86.73241671587446
+ - type: cos_sim_spearman
+ value: 79.05091082971826
+ - type: euclidean_pearson
+ value: 83.91146869578044
+ - type: euclidean_spearman
+ value: 79.87978465370936
+ - type: manhattan_pearson
+ value: 83.90888338917678
+ - type: manhattan_spearman
+ value: 79.87482848584241
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts13-sts
+ name: MTEB STS13
+ config: default
+ split: test
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
+ metrics:
+ - type: cos_sim_pearson
+ value: 85.14970731146177
+ - type: cos_sim_spearman
+ value: 86.37363490084627
+ - type: euclidean_pearson
+ value: 83.02154218530433
+ - type: euclidean_spearman
+ value: 83.80258761957367
+ - type: manhattan_pearson
+ value: 83.01664495119347
+ - type: manhattan_spearman
+ value: 83.77567458007952
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts14-sts
+ name: MTEB STS14
+ config: default
+ split: test
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
+ metrics:
+ - type: cos_sim_pearson
+ value: 83.40474139886784
+ - type: cos_sim_spearman
+ value: 82.77768789165984
+ - type: euclidean_pearson
+ value: 80.7065877443695
+ - type: euclidean_spearman
+ value: 81.375940662505
+ - type: manhattan_pearson
+ value: 80.6507552270278
+ - type: manhattan_spearman
+ value: 81.32782179098741
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts15-sts
+ name: MTEB STS15
+ config: default
+ split: test
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
+ metrics:
+ - type: cos_sim_pearson
+ value: 87.08585968722274
+ - type: cos_sim_spearman
+ value: 88.03110031451399
+ - type: euclidean_pearson
+ value: 85.74012019602384
+ - type: euclidean_spearman
+ value: 86.13592849438209
+ - type: manhattan_pearson
+ value: 85.74404842369206
+ - type: manhattan_spearman
+ value: 86.14492318960154
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts16-sts
+ name: MTEB STS16
+ config: default
+ split: test
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
+ metrics:
+ - type: cos_sim_pearson
+ value: 84.95069052788875
+ - type: cos_sim_spearman
+ value: 86.4867991595147
+ - type: euclidean_pearson
+ value: 84.31013325754635
+ - type: euclidean_spearman
+ value: 85.01529258006482
+ - type: manhattan_pearson
+ value: 84.26995570085374
+ - type: manhattan_spearman
+ value: 84.96982104986162
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts17-crosslingual-sts
+ name: MTEB STS17 (en-en)
+ config: en-en
+ split: test
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
+ metrics:
+ - type: cos_sim_pearson
+ value: 87.54617647971897
+ - type: cos_sim_spearman
+ value: 87.49834181751034
+ - type: euclidean_pearson
+ value: 86.01015322577122
+ - type: euclidean_spearman
+ value: 84.63362652063199
+ - type: manhattan_pearson
+ value: 86.13807574475706
+ - type: manhattan_spearman
+ value: 84.7772370721132
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts22-crosslingual-sts
+ name: MTEB STS22 (en)
+ config: en
+ split: test
+ revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
+ metrics:
+ - type: cos_sim_pearson
+ value: 67.20047755786615
+ - type: cos_sim_spearman
+ value: 67.05324077987636
+ - type: euclidean_pearson
+ value: 66.91930642976601
+ - type: euclidean_spearman
+ value: 65.21491856099105
+ - type: manhattan_pearson
+ value: 66.78756851976624
+ - type: manhattan_spearman
+ value: 65.12356257740728
+ - task:
+ type: STS
+ dataset:
+ type: mteb/stsbenchmark-sts
+ name: MTEB STSBenchmark
+ config: default
+ split: test
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
+ metrics:
+ - type: cos_sim_pearson
+ value: 86.19852871539686
+ - type: cos_sim_spearman
+ value: 87.5161895296395
+ - type: euclidean_pearson
+ value: 84.59848645207485
+ - type: euclidean_spearman
+ value: 85.26427328757919
+ - type: manhattan_pearson
+ value: 84.59747366996524
+ - type: manhattan_spearman
+ value: 85.24045855146915
+ - task:
+ type: Reranking
+ dataset:
+ type: mteb/scidocs-reranking
+ name: MTEB SciDocsRR
+ config: default
+ split: test
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
+ metrics:
+ - type: map
+ value: 87.63320317811032
+ - type: mrr
+ value: 96.26242947321379
+ - task:
+ type: Retrieval
+ dataset:
+ type: scifact
+ name: MTEB SciFact
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 60.928000000000004
+ - type: map_at_10
+ value: 70.112
+ - type: map_at_100
+ value: 70.59299999999999
+ - type: map_at_1000
+ value: 70.623
+ - type: map_at_3
+ value: 66.846
+ - type: map_at_5
+ value: 68.447
+ - type: mrr_at_1
+ value: 64.0
+ - type: mrr_at_10
+ value: 71.212
+ - type: mrr_at_100
+ value: 71.616
+ - type: mrr_at_1000
+ value: 71.64500000000001
+ - type: mrr_at_3
+ value: 68.77799999999999
+ - type: mrr_at_5
+ value: 70.094
+ - type: ndcg_at_1
+ value: 64.0
+ - type: ndcg_at_10
+ value: 74.607
+ - type: ndcg_at_100
+ value: 76.416
+ - type: ndcg_at_1000
+ value: 77.102
+ - type: ndcg_at_3
+ value: 69.126
+ - type: ndcg_at_5
+ value: 71.41300000000001
+ - type: precision_at_1
+ value: 64.0
+ - type: precision_at_10
+ value: 9.933
+ - type: precision_at_100
+ value: 1.077
+ - type: precision_at_1000
+ value: 0.11299999999999999
+ - type: precision_at_3
+ value: 26.556
+ - type: precision_at_5
+ value: 17.467
+ - type: recall_at_1
+ value: 60.928000000000004
+ - type: recall_at_10
+ value: 87.322
+ - type: recall_at_100
+ value: 94.833
+ - type: recall_at_1000
+ value: 100.0
+ - type: recall_at_3
+ value: 72.628
+ - type: recall_at_5
+ value: 78.428
+ - task:
+ type: PairClassification
+ dataset:
+ type: mteb/sprintduplicatequestions-pairclassification
+ name: MTEB SprintDuplicateQuestions
+ config: default
+ split: test
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
+ metrics:
+ - type: cos_sim_accuracy
+ value: 99.86237623762376
+ - type: cos_sim_ap
+ value: 96.72586477206649
+ - type: cos_sim_f1
+ value: 93.01858362631845
+ - type: cos_sim_precision
+ value: 93.4409687184662
+ - type: cos_sim_recall
+ value: 92.60000000000001
+ - type: dot_accuracy
+ value: 99.78019801980199
+ - type: dot_ap
+ value: 93.72748205246228
+ - type: dot_f1
+ value: 89.04109589041096
+ - type: dot_precision
+ value: 87.16475095785441
+ - type: dot_recall
+ value: 91.0
+ - type: euclidean_accuracy
+ value: 99.85445544554456
+ - type: euclidean_ap
+ value: 96.6661459876145
+ - type: euclidean_f1
+ value: 92.58337481333997
+ - type: euclidean_precision
+ value: 92.17046580773042
+ - type: euclidean_recall
+ value: 93.0
+ - type: manhattan_accuracy
+ value: 99.85445544554456
+ - type: manhattan_ap
+ value: 96.6883549244056
+ - type: manhattan_f1
+ value: 92.57598405580468
+ - type: manhattan_precision
+ value: 92.25422045680239
+ - type: manhattan_recall
+ value: 92.9
+ - type: max_accuracy
+ value: 99.86237623762376
+ - type: max_ap
+ value: 96.72586477206649
+ - type: max_f1
+ value: 93.01858362631845
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/stackexchange-clustering
+ name: MTEB StackExchangeClustering
+ config: default
+ split: test
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
+ metrics:
+ - type: v_measure
+ value: 66.39930057069995
2278
+ - task:
2279
+ type: Clustering
2280
+ dataset:
2281
+ type: mteb/stackexchange-clustering-p2p
2282
+ name: MTEB StackExchangeClusteringP2P
2283
+ config: default
2284
+ split: test
2285
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
2286
+ metrics:
2287
+ - type: v_measure
2288
+ value: 34.96398659903402
2289
+ - task:
2290
+ type: Reranking
2291
+ dataset:
2292
+ type: mteb/stackoverflowdupquestions-reranking
2293
+ name: MTEB StackOverflowDupQuestions
2294
+ config: default
2295
+ split: test
2296
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
2297
+ metrics:
2298
+ - type: map
2299
+ value: 55.946944700355395
2300
+ - type: mrr
2301
+ value: 56.97151398438164
2302
+ - task:
2303
+ type: Summarization
2304
+ dataset:
2305
+ type: mteb/summeval
2306
+ name: MTEB SummEval
2307
+ config: default
2308
+ split: test
2309
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
2310
+ metrics:
2311
+ - type: cos_sim_pearson
2312
+ value: 31.541657650692905
2313
+ - type: cos_sim_spearman
2314
+ value: 31.605804192286303
2315
+ - type: dot_pearson
2316
+ value: 28.26905996736398
2317
+ - type: dot_spearman
2318
+ value: 27.864801765851187
2319
+ - task:
2320
+ type: Retrieval
2321
+ dataset:
2322
+ type: trec-covid
2323
+ name: MTEB TRECCOVID
2324
+ config: default
2325
+ split: test
2326
+ revision: None
2327
+ metrics:
2328
+ - type: map_at_1
2329
+ value: 0.22599999999999998
2330
+ - type: map_at_10
2331
+ value: 1.8870000000000002
2332
+ - type: map_at_100
2333
+ value: 9.78
2334
+ - type: map_at_1000
2335
+ value: 22.514
2336
+ - type: map_at_3
2337
+ value: 0.6669999999999999
2338
+ - type: map_at_5
2339
+ value: 1.077
2340
+ - type: mrr_at_1
2341
+ value: 82.0
2342
+ - type: mrr_at_10
2343
+ value: 89.86699999999999
2344
+ - type: mrr_at_100
2345
+ value: 89.86699999999999
2346
+ - type: mrr_at_1000
2347
+ value: 89.86699999999999
2348
+ - type: mrr_at_3
2349
+ value: 89.667
2350
+ - type: mrr_at_5
2351
+ value: 89.667
2352
+ - type: ndcg_at_1
2353
+ value: 79.0
2354
+ - type: ndcg_at_10
2355
+ value: 74.818
2356
+ - type: ndcg_at_100
2357
+ value: 53.715999999999994
2358
+ - type: ndcg_at_1000
2359
+ value: 47.082
2360
+ - type: ndcg_at_3
2361
+ value: 82.134
2362
+ - type: ndcg_at_5
2363
+ value: 79.81899999999999
2364
+ - type: precision_at_1
2365
+ value: 82.0
2366
+ - type: precision_at_10
2367
+ value: 78.0
2368
+ - type: precision_at_100
2369
+ value: 54.48
2370
+ - type: precision_at_1000
2371
+ value: 20.518
2372
+ - type: precision_at_3
2373
+ value: 87.333
2374
+ - type: precision_at_5
2375
+ value: 85.2
2376
+ - type: recall_at_1
2377
+ value: 0.22599999999999998
2378
+ - type: recall_at_10
2379
+ value: 2.072
2380
+ - type: recall_at_100
2381
+ value: 13.013
2382
+ - type: recall_at_1000
2383
+ value: 43.462
2384
+ - type: recall_at_3
2385
+ value: 0.695
2386
+ - type: recall_at_5
2387
+ value: 1.139
2388
+ - task:
2389
+ type: Retrieval
2390
+ dataset:
2391
+ type: webis-touche2020
2392
+ name: MTEB Touche2020
2393
+ config: default
2394
+ split: test
2395
+ revision: None
2396
+ metrics:
2397
+ - type: map_at_1
2398
+ value: 2.328
2399
+ - type: map_at_10
2400
+ value: 9.795
2401
+ - type: map_at_100
2402
+ value: 15.801000000000002
2403
+ - type: map_at_1000
2404
+ value: 17.23
2405
+ - type: map_at_3
2406
+ value: 4.734
2407
+ - type: map_at_5
2408
+ value: 6.644
2409
+ - type: mrr_at_1
2410
+ value: 30.612000000000002
2411
+ - type: mrr_at_10
2412
+ value: 46.902
2413
+ - type: mrr_at_100
2414
+ value: 47.495
2415
+ - type: mrr_at_1000
2416
+ value: 47.495
2417
+ - type: mrr_at_3
2418
+ value: 41.156
2419
+ - type: mrr_at_5
2420
+ value: 44.218
2421
+ - type: ndcg_at_1
2422
+ value: 28.571
2423
+ - type: ndcg_at_10
2424
+ value: 24.806
2425
+ - type: ndcg_at_100
2426
+ value: 36.419000000000004
2427
+ - type: ndcg_at_1000
2428
+ value: 47.272999999999996
2429
+ - type: ndcg_at_3
2430
+ value: 25.666
2431
+ - type: ndcg_at_5
2432
+ value: 25.448999999999998
2433
+ - type: precision_at_1
2434
+ value: 30.612000000000002
2435
+ - type: precision_at_10
2436
+ value: 23.061
2437
+ - type: precision_at_100
2438
+ value: 7.714
2439
+ - type: precision_at_1000
2440
+ value: 1.484
2441
+ - type: precision_at_3
2442
+ value: 26.531
2443
+ - type: precision_at_5
2444
+ value: 26.122
2445
+ - type: recall_at_1
2446
+ value: 2.328
2447
+ - type: recall_at_10
2448
+ value: 16.524
2449
+ - type: recall_at_100
2450
+ value: 47.179
2451
+ - type: recall_at_1000
2452
+ value: 81.22200000000001
2453
+ - type: recall_at_3
2454
+ value: 5.745
2455
+ - type: recall_at_5
2456
+ value: 9.339
2457
+ - task:
2458
+ type: Classification
2459
+ dataset:
2460
+ type: mteb/toxic_conversations_50k
2461
+ name: MTEB ToxicConversationsClassification
2462
+ config: default
2463
+ split: test
2464
+ revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
2465
+ metrics:
2466
+ - type: accuracy
2467
+ value: 70.9142
2468
+ - type: ap
2469
+ value: 14.335574772555415
2470
+ - type: f1
2471
+ value: 54.62839595194111
2472
+ - task:
2473
+ type: Classification
2474
+ dataset:
2475
+ type: mteb/tweet_sentiment_extraction
2476
+ name: MTEB TweetSentimentExtractionClassification
2477
+ config: default
2478
+ split: test
2479
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
2480
+ metrics:
2481
+ - type: accuracy
2482
+ value: 59.94340690435768
2483
+ - type: f1
2484
+ value: 60.286487936731916
2485
+ - task:
2486
+ type: Clustering
2487
+ dataset:
2488
+ type: mteb/twentynewsgroups-clustering
2489
+ name: MTEB TwentyNewsgroupsClustering
2490
+ config: default
2491
+ split: test
2492
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
2493
+ metrics:
2494
+ - type: v_measure
2495
+ value: 51.26597708987974
2496
+ - task:
2497
+ type: PairClassification
2498
+ dataset:
2499
+ type: mteb/twittersemeval2015-pairclassification
2500
+ name: MTEB TwitterSemEval2015
2501
+ config: default
2502
+ split: test
2503
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
2504
+ metrics:
2505
+ - type: cos_sim_accuracy
2506
+ value: 87.48882398521786
2507
+ - type: cos_sim_ap
2508
+ value: 79.04326607602204
2509
+ - type: cos_sim_f1
2510
+ value: 71.64566826860633
2511
+ - type: cos_sim_precision
2512
+ value: 70.55512918905092
2513
+ - type: cos_sim_recall
2514
+ value: 72.77044854881267
2515
+ - type: dot_accuracy
2516
+ value: 84.19264469213805
2517
+ - type: dot_ap
2518
+ value: 67.96360043562528
2519
+ - type: dot_f1
2520
+ value: 64.06418393006827
2521
+ - type: dot_precision
2522
+ value: 58.64941898706424
2523
+ - type: dot_recall
2524
+ value: 70.58047493403694
2525
+ - type: euclidean_accuracy
2526
+ value: 87.45902127913214
2527
+ - type: euclidean_ap
2528
+ value: 78.9742237648272
2529
+ - type: euclidean_f1
2530
+ value: 71.5553235908142
2531
+ - type: euclidean_precision
2532
+ value: 70.77955601445535
2533
+ - type: euclidean_recall
2534
+ value: 72.34828496042216
2535
+ - type: manhattan_accuracy
2536
+ value: 87.41729749061214
2537
+ - type: manhattan_ap
2538
+ value: 78.90073137580596
2539
+ - type: manhattan_f1
2540
+ value: 71.3942611553533
2541
+ - type: manhattan_precision
2542
+ value: 68.52705653967483
2543
+ - type: manhattan_recall
2544
+ value: 74.51187335092348
2545
+ - type: max_accuracy
2546
+ value: 87.48882398521786
2547
+ - type: max_ap
2548
+ value: 79.04326607602204
2549
+ - type: max_f1
2550
+ value: 71.64566826860633
2551
+ - task:
2552
+ type: PairClassification
2553
+ dataset:
2554
+ type: mteb/twitterurlcorpus-pairclassification
2555
+ name: MTEB TwitterURLCorpus
2556
+ config: default
2557
+ split: test
2558
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
2559
+ metrics:
2560
+ - type: cos_sim_accuracy
2561
+ value: 88.68125897465751
2562
+ - type: cos_sim_ap
2563
+ value: 85.6003454431979
2564
+ - type: cos_sim_f1
2565
+ value: 77.6957163958641
2566
+ - type: cos_sim_precision
2567
+ value: 73.0110366307807
2568
+ - type: cos_sim_recall
2569
+ value: 83.02279026793964
2570
+ - type: dot_accuracy
2571
+ value: 87.7672992587418
2572
+ - type: dot_ap
2573
+ value: 82.4971301112899
2574
+ - type: dot_f1
2575
+ value: 75.90528233151184
2576
+ - type: dot_precision
2577
+ value: 72.0370626469368
2578
+ - type: dot_recall
2579
+ value: 80.21250384970742
2580
+ - type: euclidean_accuracy
2581
+ value: 88.4503434625684
2582
+ - type: euclidean_ap
2583
+ value: 84.91949884748384
2584
+ - type: euclidean_f1
2585
+ value: 76.92365018444684
2586
+ - type: euclidean_precision
2587
+ value: 74.53245721712759
2588
+ - type: euclidean_recall
2589
+ value: 79.47336002463813
2590
+ - type: manhattan_accuracy
2591
+ value: 88.47556952691427
2592
+ - type: manhattan_ap
2593
+ value: 84.8963689101517
2594
+ - type: manhattan_f1
2595
+ value: 76.85901249256395
2596
+ - type: manhattan_precision
2597
+ value: 74.31693989071039
2598
+ - type: manhattan_recall
2599
+ value: 79.58115183246073
2600
+ - type: max_accuracy
2601
+ value: 88.68125897465751
2602
+ - type: max_ap
2603
+ value: 85.6003454431979
2604
+ - type: max_f1
2605
+ value: 77.6957163958641
2606
+ license: mit
2607
+ language:
2608
+ - en
2609
+ ---
2610
+ # Fast-Inference with CTranslate2
2611
+ Speed up inference and reduce memory usage by 2x-4x using int8 inference in C++ on CPU or GPU.
2612
+
2613
+ Quantized version of [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)
2614
+ ```bash
2615
+ pip install "hf-hub-ctranslate2>=2.12.0" "ctranslate2>=3.17.1"
2616
+ ```
2617
+
2618
+ ```python
2619
+ # from transformers import AutoTokenizer
2620
+ model_name = "michaelfeil/ct2fast-bge-large-en-v1.5"
2621
+ model_name_orig = "BAAI/bge-large-en-v1.5"
2622
+
2623
+ from hf_hub_ctranslate2 import EncoderCT2fromHfHub
2624
+ model = EncoderCT2fromHfHub(
2625
+ # load in int8 on CUDA
2626
+ model_name_or_path=model_name,
2627
+ device="cuda",
2628
+ compute_type="int8_float16"
2629
+ )
2630
+ outputs = model.generate(
2631
+ text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
2632
+ max_length=64,
2633
+ ) # perform downstream tasks on outputs
2634
+ outputs["pooler_output"]
2635
+ outputs["last_hidden_state"]
2636
+ outputs["attention_mask"]
2637
+
2638
+ # alternative, use SentenceTransformer Mix-In
2639
+ # for end-to-end Sentence embeddings generation
2640
+ # (not pulling from this CT2fast-HF repo)
2641
+
2642
+ from hf_hub_ctranslate2 import CT2SentenceTransformer
2643
+ model = CT2SentenceTransformer(
2644
+ model_name_orig, compute_type="int8_float16", device="cuda"
2645
+ )
2646
+ embeddings = model.encode(
2647
+ ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
2648
+ batch_size=32,
2649
+ convert_to_numpy=True,
2650
+ normalize_embeddings=True,
2651
+ )
2652
+ print(embeddings.shape, embeddings)
2653
+ scores = (embeddings @ embeddings.T) * 100
2654
+
2655
+ # Hint: you can also serve this model via a REST API,
2656
+ # e.g. with github.com/michaelfeil/infinity
2657
+
2658
+
2659
+ ```
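
The `outputs` dictionary above holds raw encoder outputs. The original BGE description further below uses CLS pooling (the last hidden state of the first token) followed by L2 normalization. Here is a minimal NumPy sketch of that step. It assumes `outputs["last_hidden_state"]` converts to an array of shape `(batch, seq_len, hidden)`; for a CUDA PyTorch tensor, call `.cpu().numpy()` first. `cls_embeddings` is a hypothetical helper and is not part of `hf_hub_ctranslate2`.

```python
import numpy as np


def cls_embeddings(last_hidden_state) -> np.ndarray:
    """CLS pooling + L2 normalization (hypothetical helper).

    last_hidden_state: array-like of shape (batch, seq_len, hidden).
    Returns an array of shape (batch, hidden) with unit-norm rows.
    """
    hidden = np.asarray(last_hidden_state, dtype=np.float32)
    cls = hidden[:, 0, :]  # hidden state of the first ([CLS]) token
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)  # guard against division by zero


# Usage with the CT2 outputs from above (assuming they convert to NumPy):
# emb = cls_embeddings(outputs["last_hidden_state"])
# scores = emb @ emb.T  # cosine similarities, since rows are unit-norm
```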
2660
+
2661
+ Checkpoint compatible with [ctranslate2>=3.17.1](https://github.com/OpenNMT/CTranslate2)
2662
+ and [hf-hub-ctranslate2>=2.12.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
2663
+ - `compute_type=int8_float16` for `device="cuda"`
2664
+ - `compute_type=int8` for `device="cpu"`
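
The recommendation above can be written as a small helper. This is a sketch: `select_device_and_compute_type` is a hypothetical name. The device count is passed in so the function does not depend on any library. With CTranslate2 installed, `ctranslate2.get_cuda_device_count()` can supply that count.

```python
def select_device_and_compute_type(cuda_device_count: int) -> tuple:
    """Map the recommendation above to a (device, compute_type) pair.

    cuda_device_count: number of visible CUDA devices, e.g. the value
    returned by ctranslate2.get_cuda_device_count().
    """
    if cuda_device_count > 0:
        return "cuda", "int8_float16"  # int8 weights, float16 for the rest on GPU
    return "cpu", "int8"  # int8 on CPU


# Usage (assumes ctranslate2 is installed):
# import ctranslate2
# device, compute_type = select_device_and_compute_type(ctranslate2.get_cuda_device_count())
# model = EncoderCT2fromHfHub(model_name_or_path=model_name, device=device, compute_type=compute_type)
```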
2665
+
2666
+ Converted on 2023-10-13 using
2667
+ ```
2668
+ LLama-2 -> removed <pad> token.
2669
+ ```
2670
+
2671
+ # Licence and other remarks:
2672
+ This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.
2673
+
2674
+ # Original description
2675
+
2676
+
2677
+
2678
+ <h1 align="center">FlagEmbedding</h1>
2679
+
2680
+
2681
+ <h4 align="center">
2682
+ <p>
2683
+ <a href=#model-list>Model List</a> |
2684
+ <a href=#frequently-asked-questions>FAQ</a> |
2685
+ <a href=#usage>Usage</a> |
2686
+ <a href="#evaluation">Evaluation</a> |
2687
+ <a href="#train">Train</a> |
2688
+ <a href="#contact">Contact</a> |
2689
+ <a href="#citation">Citation</a> |
2690
+ <a href="#license">License</a>
2691
+ </p>
2692
+ </h4>
2693
+
2694
+ For more details, please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding).
2695
+
2696
+
2697
+ [English](README.md) | [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md)
2698
+
2699
+ FlagEmbedding can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search.
2700
+ It can also be used in vector databases for LLMs.
2701
+
2702
+ ************* 🌟**Updates**🌟 *************
2703
+ - 10/12/2023: Release [LLM-Embedder](./FlagEmbedding/llm_embedder/README.md), a unified embedding model to support diverse retrieval augmentation needs for LLMs. [Paper](https://arxiv.org/pdf/2310.07554.pdf) :fire:
2704
+ - 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) of BGE has been released
2705
+ - 09/15/2023: The [massive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE has been released
2706
+ - 09/12/2023: New models:
2707
+ - **New reranker model**: release cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models.
2708
+ - **Updated embedding models**: release `bge-*-v1.5` embedding models to alleviate the issue of the similarity distribution and enhance retrieval ability without instructions.
2709
+
2710
+
2711
+ <details>
2712
+ <summary>More</summary>
2713
+ <!-- ### More -->
2714
+
2715
+ - 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): add a script to mine hard negatives and support adding instructions during fine-tuning.
2716
+ - 08/09/2023: BGE models are integrated into **Langchain**; you can use them like [this](#using-langchain). The C-MTEB **leaderboard** is [available](https://huggingface.co/spaces/mteb/leaderboard).
2717
+ - 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗**
2718
+ - 08/02/2023: Release `bge-large-*` (short for BAAI General Embedding) models, **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada:
2719
+ - 08/01/2023: We release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (**C-MTEB**), consisting of 31 test datasets.
2720
+
2721
+ </details>
2722
+
2723
+
2724
+ ## Model List
2725
+
2726
+ `bge` is short for `BAAI general embedding`.
2727
+
2728
+ | Model | Language | Usage | Description | Query instruction for retrieval [1] |
2729
+ |:-------------------------------|:--------:| :--------:| :--------:|:--------:|
2730
+ | [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) |
2731
+ | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
2732
+ | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | |
2733
+ | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
2734
+ | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
2735
+ | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` |
2736
+ | [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
2737
+ | [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
2738
+ | [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` |
2739
+ | [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` |
2740
+ | [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` |
2741
+ | [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) |a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` |
2742
+ | [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章:` |
2743
+ | [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章:` |
2744
+ | [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `为这个句子生成表示以用于检索相关文章:` |
2745
+
2746
+
2747
+ [1\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query. In other cases, no instruction is needed, and you can use the original query directly. In all cases, **no instruction** needs to be added to passages.
2748
+
2749
+ [2\]: Unlike an embedding model, a reranker takes the question and document as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, cross-encoders are widely used to re-rank the top-k documents retrieved by other, simpler models.
2750
+ For example, use the bge embedding model to retrieve the top 100 relevant documents, then use the bge reranker to re-rank those 100 documents and get the final top-3 results.
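
Here is a minimal NumPy sketch of that two-stage pipeline. It assumes unit-normalized embeddings from any of the embedding examples below, and a scoring callable such as `FlagReranker.compute_score` from the reranker section. The function name `retrieve_then_rerank` and its parameters are hypothetical.

```python
from typing import Callable, Sequence, Union

import numpy as np


def retrieve_then_rerank(
    query_emb: np.ndarray,
    passage_embs: np.ndarray,
    passages: Sequence[str],
    query: str,
    rerank_fn: Callable[[list], Union[Sequence[float], float]],
    retrieve_k: int = 100,
    final_k: int = 3,
) -> list:
    """Two-stage search: dense retrieval, then cross-encoder re-ranking.

    query_emb: (dim,) L2-normalized query embedding.
    passage_embs: (n_passages, dim) L2-normalized passage embeddings.
    rerank_fn: scores a list of [query, passage] pairs (higher = more relevant).
    """
    # Stage 1: cheap retrieval by inner product (cosine similarity for unit vectors).
    sims = passage_embs @ query_emb
    candidates = np.argsort(-sims)[: min(retrieve_k, len(passages))]

    # Stage 2: score only the retrieved candidates with the (slower) reranker.
    pairs = [[query, passages[i]] for i in candidates]
    # atleast_1d: some scorers return a bare float when given a single pair.
    scores = np.atleast_1d(np.asarray(rerank_fn(pairs), dtype=np.float64))

    order = np.argsort(-scores)[:final_k]
    return [(passages[candidates[j]], float(scores[j])) for j in order]


# Usage (assumes the FlagReranker from the reranker section):
# top3 = retrieve_then_rerank(q_emb, p_embs, passages, query, reranker.compute_score)
```

The reranker only sees the `retrieve_k` candidates, which keeps the expensive cross-encoder calls bounded regardless of corpus size.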
2751
+
2752
+ All models have been uploaded to the Hugging Face Hub, and you can find them at https://huggingface.co/BAAI.
2753
+ If you cannot access the Hugging Face Hub, you can also download the models from https://model.baai.ac.cn/models.
2754
+
2755
+
2756
+ ## Frequently asked questions
2757
+
2758
+ <details>
2759
+ <summary>1. How to fine-tune bge embedding model?</summary>
2760
+
2761
+ <!-- ### How to fine-tune bge embedding model? -->
2762
+ Follow this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) to prepare data and fine-tune your model.
2763
+ Some suggestions:
2764
+ - Mine hard negatives following this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives), which can improve the retrieval performance.
2765
+ - If you pre-train bge on your data, the pre-trained model cannot be directly used to calculate similarity, and it must be fine-tuned with contrastive learning before computing similarity.
2766
+ - If the accuracy of the fine-tuned model is still not high enough, we recommend using or fine-tuning the cross-encoder model (bge-reranker) to re-rank the top-k results. Hard negatives are also needed to fine-tune the reranker.
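
Here is a rough sketch of one way to mine hard negatives. It assumes you already have L2-normalized query and passage embeddings and know the positives for each query. It is not the FlagEmbedding mining script linked above. The rank window (`rank_start`, `rank_end`) and `num_negatives` are illustrative assumptions, and `mine_hard_negatives` is a hypothetical helper. Skipping the very top ranks is a common heuristic to reduce the chance of sampling unlabeled positives as negatives.

```python
import numpy as np


def mine_hard_negatives(
    query_embs: np.ndarray,
    passage_embs: np.ndarray,
    positives: list,
    rank_start: int = 2,
    rank_end: int = 50,
    num_negatives: int = 7,
    seed: int = 0,
) -> list:
    """Sample hard negatives per query from a window of the dense-retrieval ranking.

    query_embs: (n_queries, dim), passage_embs: (n_passages, dim), both L2-normalized.
    positives: one set of relevant passage indices per query.
    """
    rng = np.random.default_rng(seed)
    ranked = np.argsort(-(query_embs @ passage_embs.T), axis=1)  # best match first
    negatives = []
    for qi, order in enumerate(ranked):
        # Skip the top `rank_start` hits (more likely to be unlabeled positives)
        # and drop any known positives from the candidate pool.
        pool = [int(p) for p in order[rank_start:rank_end] if int(p) not in positives[qi]]
        n = min(num_negatives, len(pool))
        picked = rng.choice(pool, size=n, replace=False).tolist() if n else []
        negatives.append(sorted(picked))
    return negatives
```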
2767
+
2768
+
2769
+ </details>
2770
+
2771
+ <details>
2772
+ <summary>2. The similarity score between two dissimilar sentences is higher than 0.5</summary>
2773
+
2774
+ <!-- ### The similarity score between two dissimilar sentences is higher than 0.5 -->
2775
+ **We suggest using bge v1.5, which alleviates the issue of the similarity distribution.**
2776
+
2777
+ Since we fine-tune the models by contrastive learning with a temperature of 0.01,
2778
+ the similarity distribution of the current BGE model lies roughly in the interval \[0.6, 1\].
2779
+ So a similarity score greater than 0.5 does not indicate that the two sentences are similar.
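
One way to see the effect of the temperature is through an InfoNCE-style contrastive loss, where similarities are divided by the temperature before the softmax. This is an illustration only, not BGE's actual training code. The NumPy sketch below uses a hypothetical `info_nce_loss` helper with one positive plus negatives. With a temperature of 0.01, a positive at cosine 0.9 and a negative at cosine 0.6 already yield an almost-zero loss, so training has little pressure to push negatives far below 0.6.

```python
import numpy as np


def info_nce_loss(pos_sim: float, neg_sims: list, temperature: float) -> float:
    """Cross-entropy of picking the positive among [positive] + negatives.

    Similarities are cosine scores; logits are similarity / temperature.
    """
    logits = np.array([pos_sim, *neg_sims], dtype=np.float64) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return float(-log_prob_pos)


# Positive at cosine 0.9, one negative at cosine 0.6:
print(info_nce_loss(0.9, [0.6], temperature=0.01))  # tiny: the negative barely matters
print(info_nce_loss(0.9, [0.6], temperature=1.0))   # much larger
```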
2780
+
2781
+ For downstream tasks, such as passage retrieval or semantic similarity,
2782
+ **what matters is the relative order of the scores, not the absolute value.**
2783
+ If you need to filter similar sentences based on a similarity threshold,
2784
+ please select an appropriate threshold based on the similarity distribution of your data (such as 0.8, 0.85, or even 0.9).
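
Here is a minimal sketch of threshold-based filtering. It assumes L2-normalized embeddings, for example from `model.encode(..., normalize_embeddings=True)` in the Sentence-Transformers example below. `pairs_above_threshold` is a hypothetical helper. The threshold is a placeholder to calibrate on your own data, e.g. by inspecting the scores of pairs you know to be similar or dissimilar.

```python
import numpy as np


def pairs_above_threshold(embeddings: np.ndarray, threshold: float) -> list:
    """Return (i, j, score) for every pair i < j with cosine similarity >= threshold.

    Assumes the rows of `embeddings` are L2-normalized.
    """
    sims = embeddings @ embeddings.T
    i_idx, j_idx = np.triu_indices(len(embeddings), k=1)  # each unordered pair once
    keep = sims[i_idx, j_idx] >= threshold
    return [(int(i), int(j), float(sims[i, j])) for i, j in zip(i_idx[keep], j_idx[keep])]
```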
2785
+
2786
+ </details>
2787
+
2788
+ <details>
2789
+ <summary>3. When does the query instruction need to be used?</summary>
2790
+
2791
+ <!-- ### When does the query instruction need to be used -->
2792
+
2793
+ For the `bge-*-v1.5` models, we improved retrieval ability when no instruction is used.
2794
+ Omitting the instruction causes only a slight degradation in retrieval performance compared with using it.
2795
+ So, for convenience, you can generate embeddings without instructions in all cases.
2796
+
2797
+ For a retrieval task that uses short queries to find long related documents,
2798
+ it is recommended to add instructions for these short queries.
2799
+ **The best method to decide whether to add instructions for queries is choosing the setting that achieves better performance on your task.**
2800
+ In all cases, no instruction needs to be added to the documents/passages.
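
Following the advice above, here is a small sketch for comparing the two settings on your own labeled queries. `recall_at_k` is a hypothetical helper. It assumes L2-normalized embeddings and a non-empty set of relevant passage indices for every query. The commented usage assumes a Sentence-Transformers model and uses the English instruction from the model list.

```python
import numpy as np


def recall_at_k(query_embs: np.ndarray, passage_embs: np.ndarray,
                relevant: list, k: int = 10) -> float:
    """Mean fraction of each query's relevant passages found in its top-k results."""
    sims = query_embs @ passage_embs.T
    topk = np.argsort(-sims, axis=1)[:, :k]
    per_query = [len(rel & set(row.tolist())) / len(rel) for row, rel in zip(topk, relevant)]
    return float(np.mean(per_query))


# Usage (assumes `model` is a SentenceTransformer and `relevant` holds gold indices):
# instruction = "Represent this sentence for searching relevant passages: "
# p = model.encode(passages, normalize_embeddings=True)  # passages: never an instruction
# q_plain = model.encode(queries, normalize_embeddings=True)
# q_instr = model.encode([instruction + q for q in queries], normalize_embeddings=True)
# print(recall_at_k(q_plain, p, relevant), recall_at_k(q_instr, p, relevant))
```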
2801
+
2802
+ </details>
2803
+
2804
+
2805
+ ## Usage
2806
+
2807
+ ### Usage for Embedding Model
2808
+
2809
+ Here are some examples of using `bge` models with
2810
+ [FlagEmbedding](#using-flagembedding), [Sentence-Transformers](#using-sentence-transformers), [Langchain](#using-langchain), or [Huggingface Transformers](#using-huggingface-transformers).
2811
+
2812
+ #### Using FlagEmbedding
2813
+ ```
2814
+ pip install -U FlagEmbedding
2815
+ ```
2816
+ If it doesn't work for you, see [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md) for other ways to install FlagEmbedding.
2817
+
2818
+ ```python
2819
+ from FlagEmbedding import FlagModel
2820
+ sentences_1 = ["样例数据-1", "样例数据-2"]  # Chinese placeholders: "sample data-1", "sample data-2"
2821
+ sentences_2 = ["样例数据-3", "样例数据-4"]
2822
+ model = FlagModel('BAAI/bge-large-zh-v1.5',
2823
+ query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",  # Chinese for "Generate a representation for this sentence to retrieve related articles:"
2824
+ use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
2825
+ embeddings_1 = model.encode(sentences_1)
2826
+ embeddings_2 = model.encode(sentences_2)
2827
+ similarity = embeddings_1 @ embeddings_2.T
2828
+ print(similarity)
2829
+
2830
+ # for s2p (short query to long passage) retrieval tasks, we suggest using encode_queries(), which automatically adds the instruction to each query
2831
+ # the corpus in a retrieval task can still use encode() or encode_corpus(), since passages don't need the instruction
2832
+ queries = ['query_1', 'query_2']
2833
+ passages = ["样例文档-1", "样例文档-2"]  # Chinese placeholders: "sample document-1", "sample document-2"
2834
+ q_embeddings = model.encode_queries(queries)
2835
+ p_embeddings = model.encode(passages)
2836
+ scores = q_embeddings @ p_embeddings.T
2837
+ ```
2838
+ For the value of the argument `query_instruction_for_retrieval`, see [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list).
2839
+
2840
+ By default, FlagModel will use all available GPUs when encoding. Please set `os.environ["CUDA_VISIBLE_DEVICES"]` to select specific GPUs.
2841
+ You can also set `os.environ["CUDA_VISIBLE_DEVICES"]=""` to hide all GPUs.
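
Here is a short sketch of the GPU selection described above. `CUDA_VISIBLE_DEVICES` is read when CUDA is initialized, so set it at the top of your script, before importing `torch` or `FlagEmbedding`. The GPU index `0` is only an example.

```python
import os

# Must be set before CUDA is initialized (i.e. before torch / FlagEmbedding use the GPU).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"      # expose only GPU 0
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # or a comma-separated list of GPUs
# os.environ["CUDA_VISIBLE_DEVICES"] = ""     # or hide all GPUs (CPU only)

# ...then import FlagEmbedding and construct FlagModel as in the example above.
```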
2842
+
2843
+
2844
+ #### Using Sentence-Transformers
2845
+
2846
+ You can also use the `bge` models with [sentence-transformers](https://www.SBERT.net):
2847
+
2848
+ ```
2849
+ pip install -U sentence-transformers
2850
+ ```
2851
+ ```python
2852
+ from sentence_transformers import SentenceTransformer
2853
+ sentences_1 = ["样例数据-1", "样例数据-2"]
2854
+ sentences_2 = ["样例数据-3", "样例数据-4"]
2855
+ model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
2856
+ embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
2857
+ embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
2858
+ similarity = embeddings_1 @ embeddings_2.T
2859
+ print(similarity)
2860
+ ```
2861
+ For s2p (short query to long passage) retrieval tasks,
2862
+ each short query should start with an instruction (see the [Model List](https://github.com/FlagOpen/FlagEmbedding/tree/master#model-list) for the instructions).
2863
+ But the instruction is not needed for passages.
2864
+ ```python
2865
+ from sentence_transformers import SentenceTransformer
2866
+ queries = ['query_1', 'query_2']
2867
+ passages = ["样例文档-1", "样例文档-2"]
2868
+ instruction = "为这个句子生成表示以用于检索相关文章:"
2869
+
2870
+ model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
2871
+ q_embeddings = model.encode([instruction+q for q in queries], normalize_embeddings=True)
2872
+ p_embeddings = model.encode(passages, normalize_embeddings=True)
2873
+ scores = q_embeddings @ p_embeddings.T
2874
+ ```
2875
+
2876
+ #### Using Langchain
2877
+
2878
+ You can use `bge` in langchain like this:
2879
+ ```python
2880
+ from langchain.embeddings import HuggingFaceBgeEmbeddings
2881
+ model_name = "BAAI/bge-large-en-v1.5"
2882
+ model_kwargs = {'device': 'cuda'}
2883
+ encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
2884
+ model = HuggingFaceBgeEmbeddings(
2885
+ model_name=model_name,
2886
+ model_kwargs=model_kwargs,
2887
+ encode_kwargs=encode_kwargs,
2888
+ query_instruction="Represent this sentence for searching relevant passages: "
2889
+ )
2890
+ model.query_instruction = "Represent this sentence for searching relevant passages: "
2891
+ ```
2892
+
2893
+
2894
+ #### Using HuggingFace Transformers
2895
+
2896
+ With the transformers package, you can use the model like this: First, you pass your input through the transformer model, then you select the last hidden state of the first token (i.e., [CLS]) as the sentence embedding.
2897
+
2898
+ ```python
2899
+ from transformers import AutoTokenizer, AutoModel
2900
+ import torch
2901
+ # Sentences we want sentence embeddings for
2902
+ sentences = ["样例数据-1", "样例数据-2"]
2903
+
2904
+ # Load model from HuggingFace Hub
2905
+ tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-zh-v1.5')
2906
+ model = AutoModel.from_pretrained('BAAI/bge-large-zh-v1.5')
2907
+ model.eval()
2908
+
2909
+ # Tokenize sentences
2910
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
2911
+ # for s2p (short query to long passage) retrieval tasks, add an instruction to each query (but not to passages)
2912
+ # encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')
2913
+
2914
+ # Compute token embeddings
2915
+ with torch.no_grad():
2916
+     model_output = model(**encoded_input)
2917
+ # Perform pooling. In this case, cls pooling.
2918
+ sentence_embeddings = model_output[0][:, 0]
2919
+ # normalize embeddings
2920
+ sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
2921
+ print("Sentence embeddings:", sentence_embeddings)
2922
+ ```
2923
+
2924
+ ### Usage for Reranker
2925
+
2926
+ Unlike an embedding model, a reranker takes the question and document as input and directly outputs a similarity score instead of an embedding.
2927
+ You can get a relevance score by feeding a query and a passage to the reranker.
2928
+ The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
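
If you need scores in a fixed range, one common option is to pass the raw scores through a logistic sigmoid. This is an assumption, not part of the original instructions. The reranker outputs a single logit per pair (see `.logits.view(-1, )` in the Transformers example), and the sigmoid is monotonic, so the ranking does not change. `to_unit_interval` is a hypothetical helper.

```python
import numpy as np


def to_unit_interval(raw_scores) -> np.ndarray:
    """Map unbounded reranker logits into [0, 1] with a logistic sigmoid.

    Uses sigmoid(x) = 0.5 * (1 + tanh(x / 2)), which avoids overflow for large |x|.
    The mapping is monotonic, so the order of passages is preserved.
    """
    x = np.asarray(raw_scores, dtype=np.float64)
    return 0.5 * (1.0 + np.tanh(0.5 * x))


# Usage with the FlagReranker example below:
# probs = to_unit_interval(reranker.compute_score(pairs))
```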
2929
+
2930
+
2931
+ #### Using FlagEmbedding
2932
+ ```
2933
+ pip install -U FlagEmbedding
2934
+ ```
2935
+
2936
+ Get relevance scores (higher scores indicate more relevance):
2937
+ ```python
2938
+ from FlagEmbedding import FlagReranker
2939
+ reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
2940
+
2941
+ score = reranker.compute_score(['query', 'passage'])
2942
+ print(score)
2943
+
2944
+ scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
2945
+ print(scores)
2946
+ ```
2947
+
2948
+
2949
+ #### Using HuggingFace Transformers
2950
+
2951
+ ```python
2952
+ import torch
2953
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
2954
+
2955
+ tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
2956
+ model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
2957
+ model.eval()
2958
+
2959
+ pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
2960
+ with torch.no_grad():
2961
+     inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
2962
+     scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
2963
+ print(scores)
2964
+ ```
2965
+
2966
+ ## Evaluation
2967
+
2968
+ `baai-general-embedding` models achieve **state-of-the-art performance on both the MTEB and C-MTEB leaderboards!**
2969
+ For more details and evaluation tools, see our [scripts](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md).
2970
+
2971
+ - **MTEB**:
2972
+
2973
+ | Model Name | Dimension | Sequence Length | Average (56) | Retrieval (15) |Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
2974
+ |:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
2975
+ | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | **64.23** | **54.29** | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |
2976
+ | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | 63.55 | 53.25 | 45.77 | 86.55 | 58.86 | 82.4 | 31.07 | 75.53 |
2977
+ | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | 62.17 |51.68 | 43.82 | 84.92 | 58.36 | 81.59 | 30.12 | 74.14 |
2978
+ | [bge-large-en](https://huggingface.co/BAAI/bge-large-en) | 1024 | 512 | 63.98 | 53.9 | 46.98 | 85.8 | 59.48 | 81.56 | 32.06 | 76.21 |
2979
+ | [bge-base-en](https://huggingface.co/BAAI/bge-base-en) | 768 | 512 | 63.36 | 53.0 | 46.32 | 85.86 | 58.7 | 81.84 | 29.27 | 75.27 |
2980
+ | [gte-large](https://huggingface.co/thenlper/gte-large) | 1024 | 512 | 63.13 | 52.22 | 46.84 | 85.00 | 59.13 | 83.35 | 31.66 | 73.33 |
2981
+ | [gte-base](https://huggingface.co/thenlper/gte-base) | 768 | 512 | 62.39 | 51.14 | 46.2 | 84.57 | 58.61 | 82.3 | 31.17 | 73.01 |
2982
+ | [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | 1024| 512 | 62.25 | 50.56 | 44.49 | 86.03 | 56.61 | 82.05 | 30.19 | 75.24 |
2983
+ | [bge-small-en](https://huggingface.co/BAAI/bge-small-en) | 384 | 512 | 62.11 | 51.82 | 44.31 | 83.78 | 57.97 | 80.72 | 30.53 | 74.37 |
2984
+ | [instructor-xl](https://huggingface.co/hkunlp/instructor-xl) | 768 | 512 | 61.79 | 49.26 | 44.74 | 86.62 | 57.29 | 83.06 | 32.32 | 61.79 |
2985
+ | [e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) | 768 | 512 | 61.5 | 50.29 | 43.80 | 85.73 | 55.91 | 81.05 | 30.28 | 73.84 |
2986
+ | [gte-small](https://huggingface.co/thenlper/gte-small) | 384 | 512 | 61.36 | 49.46 | 44.89 | 83.54 | 57.7 | 82.07 | 30.42 | 72.31 |
2987
+ | [text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings) | 1536 | 8192 | 60.99 | 49.25 | 45.9 | 84.89 | 56.32 | 80.97 | 30.8 | 70.93 |
2988
+ | [e5-small-v2](https://huggingface.co/intfloat/e5-base-v2) | 384 | 512 | 59.93 | 49.04 | 39.92 | 84.67 | 54.32 | 80.39 | 31.16 | 72.94 |
2989
+ | [sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl) | 768 | 512 | 59.51 | 42.24 | 43.72 | 85.06 | 56.42 | 82.63 | 30.08 | 73.42 |
2990
+ | [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | 768 | 514 | 57.78 | 43.81 | 43.69 | 83.04 | 59.36 | 80.28 | 27.49 | 65.07 |
2991
+ | [sgpt-bloom-7b1-msmarco](https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco) | 4096 | 2048 | 57.59 | 48.22 | 38.93 | 81.9 | 55.65 | 77.74 | 33.6 | 66.19 |
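The dataset counts in the MTEB header make the overall average easy to reproduce. Assume the average is taken over all 56 datasets rather than over the seven task columns. It then equals the task averages weighted by their dataset counts. The check below uses the `BAAI/bge-large-en-v1.5` row and recovers the reported 64.23, while the plain mean of the seven columns does not. The dictionary layout is only for illustration.

```python
# Task averages for BAAI/bge-large-en-v1.5, copied from the MTEB table,
# paired with the dataset count given in each column header.
task_scores = {
    "Retrieval": (54.29, 15),
    "Clustering": (46.08, 11),
    "Pair Classification": (87.12, 3),
    "Reranking": (60.03, 4),
    "STS": (83.11, 10),
    "Summarization": (31.61, 1),
    "Classification": (75.97, 12),
}

num_datasets = sum(count for _, count in task_scores.values())
# Weight each task average by its number of datasets (i.e. an average per dataset).
weighted_avg = sum(score * count for score, count in task_scores.values()) / num_datasets
# Unweighted mean of the seven task columns, for comparison.
plain_avg = sum(score for score, _ in task_scores.values()) / len(task_scores)

print(num_datasets, round(weighted_avg, 2), round(plain_avg, 2))  # prints: 56 64.23 62.6
```

Substituting the `BAAI/bge-base-en-v1.5` row in the same way gives 63.55, which matches its reported average.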
+
+
+
+ - **C-MTEB**:
+ We created C-MTEB, a benchmark for Chinese text embeddings that consists of 31 datasets across 6 tasks.
+ Please refer to [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md) for a detailed introduction.
+
+ | Model | Embedding dimension | Avg | Retrieval | STS | PairClassification | Classification | Reranking | Clustering |
+ |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
+ | [**BAAI/bge-large-zh-v1.5**](https://huggingface.co/BAAI/bge-large-zh-v1.5) | 1024 | **64.53** | 70.46 | 56.25 | 81.6 | 69.13 | 65.84 | 48.99 |
+ | [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | 768 | 63.13 | 69.49 | 53.72 | 79.75 | 68.07 | 65.39 | 47.53 |
+ | [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | 512 | 57.82 | 61.77 | 49.11 | 70.41 | 63.96 | 60.92 | 44.18 |
+ | [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | 1024 | 64.20 | 71.53 | 54.98 | 78.94 | 68.32 | 65.11 | 48.39 |
+ | [bge-large-zh-noinstruct](https://huggingface.co/BAAI/bge-large-zh-noinstruct) | 1024 | 63.53 | 70.55 | 53 | 76.77 | 68.58 | 64.91 | 50.01 |
+ | [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 768 | 62.96 | 69.53 | 54.12 | 77.5 | 67.07 | 64.91 | 47.63 |
+ | [multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | 1024 | 58.79 | 63.66 | 48.44 | 69.89 | 67.34 | 56.00 | 48.23 |
+ | [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | 512 | 58.27 | 63.07 | 49.45 | 70.35 | 63.64 | 61.48 | 45.09 |
+ | [m3e-base](https://huggingface.co/moka-ai/m3e-base) | 768 | 57.10 | 56.91 | 50.47 | 63.99 | 67.52 | 59.34 | 47.68 |
+ | [m3e-large](https://huggingface.co/moka-ai/m3e-large) | 1024 | 57.05 | 54.75 | 50.42 | 64.3 | 68.2 | 59.66 | 48.88 |
+ | [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) | 768 | 55.48 | 61.63 | 46.49 | 67.07 | 65.35 | 54.35 | 40.68 |
+ | [multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) | 384 | 55.38 | 59.95 | 45.27 | 66.45 | 65.85 | 53.86 | 45.26 |
+ | [text-embedding-ada-002 (OpenAI)](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings) | 1536 | 53.02 | 52.0 | 43.35 | 69.56 | 64.31 | 54.28 | 45.68 |
+ | [luotuo](https://huggingface.co/silk-road/luotuo-bert-medium) | 1024 | 49.37 | 44.4 | 42.78 | 66.62 | 61 | 49.25 | 44.39 |
+ | [text2vec-base](https://huggingface.co/shibing624/text2vec-base-chinese) | 768 | 47.63 | 38.79 | 43.41 | 67.41 | 62.19 | 49.45 | 37.66 |
+ | [text2vec-large](https://huggingface.co/GanymedeNil/text2vec-large-chinese) | 1024 | 47.36 | 41.94 | 44.97 | 70.86 | 60.66 | 49.16 | 30.02 |
+
+
+ - **Reranking**:
+ See [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/) for the evaluation script.
+
+ | Model | T2Reranking | T2RerankingZh2En\* | T2RerankingEn2Zh\* | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
+ |:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
+ | text2vec-base-multilingual | 64.66 | 62.94 | 62.51 | 14.37 | 48.46 | 48.6 | 50.26 |
+ | multilingual-e5-small | 65.62 | 60.94 | 56.41 | 29.91 | 67.26 | 66.54 | 57.78 |
+ | multilingual-e5-large | 64.55 | 61.61 | 54.28 | 28.6 | 67.42 | 67.92 | 57.4 |
+ | multilingual-e5-base | 64.21 | 62.13 | 54.68 | 29.5 | 66.23 | 66.98 | 57.29 |
+ | m3e-base | 66.03 | 62.74 | 56.07 | 17.51 | 77.05 | 76.76 | 59.36 |
+ | m3e-large | 66.13 | 62.72 | 56.1 | 16.46 | 77.76 | 78.27 | 59.57 |
+ | bge-base-zh-v1.5 | 66.49 | 63.25 | 57.02 | 29.74 | 80.47 | 84.88 | 63.64 |
+ | bge-large-zh-v1.5 | 65.74 | 63.39 | 57.03 | 28.74 | 83.45 | 85.44 | 63.97 |
+ | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | 67.28 | 63.95 | 60.45 | 35.46 | 81.26 | 84.1 | 65.42 |
+ | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | 67.6 | 64.03 | 61.44 | 37.16 | 82.15 | 84.18 | 66.09 |
+
+ \*: T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks.
+
+ ## Train
+
+ ### BAAI Embedding
+
+ We pre-train the models using [RetroMAE](https://github.com/staoxiao/RetroMAE) and then train them on large-scale pair data with contrastive learning.
+ **You can fine-tune the embedding model on your data following our [examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune).**
+ We also provide a [pre-train example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain).
+ Note that the goal of pre-training is to reconstruct the text: the pre-trained model cannot be used for similarity calculation directly and must be fine-tuned first.
+ For more training details on BGE, see [baai_general_embedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md).
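This README names contrastive learning without spelling out the objective. A common choice for training embedding models on (query, passage) pairs is the InfoNCE loss with in-batch negatives, under which each query should score its own passage higher than every other passage in the batch. That choice is assumed here, not taken from the BGE training code. The pure-Python sketch below illustrates the idea. The function names and the default temperature of 0.05 are illustrative assumptions, not BGE's actual settings.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(queries: List[Vector], passages: List[Vector], temperature: float = 0.05) -> float:
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch).

    passages[i] is the positive for queries[i]; every other passage in the
    batch acts as a negative for that query.
    """
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarity to every passage in the batch.
        logits = [dot(qi, pj) / temperature for pj in p]
        # Cross-entropy of the positive: log-sum-exp(logits) - logits[i],
        # computed with the max-subtraction trick for numerical stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_sum_exp - logits[i]
    return total / len(q)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each query's positive points the same way
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each positive points the wrong way
print(info_nce_loss(queries, aligned), info_nce_loss(queries, swapped))
```

The loss is the negative log-probability of the positive under a softmax over the batch, so it is never negative. It falls toward zero as each query's positive pulls ahead of the in-batch negatives. A lower temperature sharpens that softmax.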
+
+
+
+ ### BGE Reranker
+
+ A cross-encoder performs full attention over the input pair,
+ which makes it more accurate than an embedding model (i.e., a bi-encoder) but also more time-consuming.
+ Therefore, it can be used to re-rank the top-k documents returned by an embedding model.
+ We train the cross-encoder on multilingual pair data.
+ The data format is the same as for the embedding model, so you can easily fine-tune it by following our [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker).
+ For more details, please refer to [./FlagEmbedding/reranker/README.md](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker).
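The two-stage pattern described above can be sketched as follows: cheap bi-encoder retrieval over the whole corpus, then cross-encoder re-ranking of a short list. The function `retrieve_then_rerank` and the toy `word_overlap` scorer are hypothetical stand-ins. In a real system, the first stage would typically compare precomputed embeddings from the embedding model, and the second stage could wrap the reranker's `compute_score` call shown earlier. Only the control flow is meant to carry over.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    bi_encoder_score: Scorer,
    cross_encoder_score: Scorer,
    top_k: int = 100,
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Two-stage ranking: shortlist with a cheap scorer, re-rank with an expensive one."""
    # Stage 1: score every document with the fast bi-encoder and keep the top_k.
    shortlist = sorted(corpus, key=lambda doc: bi_encoder_score(query, doc), reverse=True)[:top_k]
    # Stage 2: run the slower cross-encoder only on the shortlist.
    rescored = [(doc, cross_encoder_score(query, doc)) for doc in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


def word_overlap(query: str, doc: str) -> float:
    """Toy stand-in scorer (hypothetical): number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


corpus = [
    "pandas are bears",
    "the giant panda is a bear species endemic to china",
    "hi",
]
results = retrieve_then_rerank(
    "what is a giant panda", corpus, word_overlap, word_overlap, top_k=2, top_n=1
)
print(results)
```

Keeping `top_k` small bounds how many expensive cross-encoder calls are made per query. That is the accuracy-versus-cost trade-off the paragraph above describes.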
+
+
+ ## Contact
+ If you have any questions or suggestions related to this project, feel free to open an issue or pull request.
+ You can also email Shitao Xiao ([email protected]) and Zheng Liu ([email protected]).
+
+
+ ## Citation
+
+ If you find this repository useful, please consider giving it a star :star: and citing it:
+
+ ```
+ @misc{bge_embedding,
+   title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
+   author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
+   year={2023},
+   eprint={2309.07597},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
+
+ ## License
+ FlagEmbedding is licensed under the [MIT License](https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE). The released models can be used for commercial purposes free of charge.
+
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "_name_or_path": "/root/.cache/torch/sentence_transformers/BAAI_bge-large-en/",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.30.0",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522,
+   "bos_token": "<s>",
+   "eos_token": "</s>",
+   "layer_norm_epsilon": 1e-12,
+   "unk_token": "[UNK]"
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.2.2",
+     "transformers": "4.28.1",
+     "pytorch": "1.13.0+cu117"
+   }
+ }
model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cebed1eb4a771764deca2a82e0722280c2de36b29343038fa651a21d9d98d80
+ size 670300108
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff