---
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- mteb
license: apache-2.0
model-index:
- name: bge-en-mistral
  results:
  - dataset:
      config: en
      name: MTEB AmazonCounterfactualClassification (en)
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
      split: test
      type: mteb/amazon_counterfactual
    metrics:
    - type: accuracy
      value: 93.1492537313433
    - type: ap
      value: 72.56132559564212
    - type: f1
      value: 89.71796898040243
    - type: main_score
      value: 93.1492537313433
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB AmazonCounterfactualClassification (en)
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
      split: validation
      type: mteb/amazon_counterfactual
    metrics:
    - type: accuracy
      value: 93.04477611940298
    - type: ap
      value: 68.51763006673485
    - type: f1
      value: 88.44832081571468
    - type: main_score
      value: 93.04477611940298
    task:
      type: Classification
  - dataset:
      config: default
      name: MTEB AmazonPolarityClassification (default)
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
      split: test
      type: mteb/amazon_polarity
    metrics:
    - type: accuracy
      value: 96.98372499999999
    - type: ap
      value: 95.62303091773919
    - type: f1
      value: 96.98308191715637
    - type: main_score
      value: 96.98372499999999
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB AmazonReviewsClassification (en)
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
      split: test
      type: mteb/amazon_reviews_multi
    metrics:
    - type: accuracy
      value: 61.461999999999996
    - type: f1
      value: 60.57257766583118
    - type: main_score
      value: 61.461999999999996
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB AmazonReviewsClassification (en)
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
      split: validation
      type: mteb/amazon_reviews_multi
    metrics:
    - type: accuracy
      value: 61.204
    - type: f1
      value: 60.262736729265384
    - type: main_score
      value: 61.204
    task:
      type: Classification
  - dataset:
      config: default
      name: MTEB ArxivClusteringP2P (default)
      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
      split: test
      type: mteb/arxiv-clustering-p2p
    metrics:
    - type: main_score
      value: 54.43859683357485
    - type: v_measure
      value: 54.43859683357485
    - type: v_measure_std
      value: 14.511128158596337
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB ArxivClusteringS2S (default)
      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
      split: test
      type: mteb/arxiv-clustering-s2s
    metrics:
    - type: main_score
      value: 49.33365996236564
    - type: v_measure
      value: 49.33365996236564
    - type: v_measure_std
      value: 14.61261944856548
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB AskUbuntuDupQuestions (default)
      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
      split: test
      type: mteb/askubuntudupquestions-reranking
    metrics:
    - type: main_score
      value: 65.15263966490278
    - type: map
      value: 65.15263966490278
    - type: mrr
      value: 77.90331090885107
    task:
      type: Reranking
  - dataset:
      config: default
      name: MTEB BIOSSES (default)
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
      split: test
      type: mteb/biosses-sts
    metrics:
    - type: main_score
      value: 86.47365710792691
    task:
      type: STS
  - dataset:
      config: default
      name: MTEB Banking77Classification (default)
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
      split: test
      type: mteb/banking77
    metrics:
    - type: accuracy
      value: 91.48701298701299
    - type: f1
      value: 91.4733869423637
    - type: main_score
      value: 91.48701298701299
    task:
      type: Classification
  - dataset:
      config: default
      name: MTEB BiorxivClusteringP2P (default)
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
      split: test
      type: mteb/biorxiv-clustering-p2p
    metrics:
    - type: main_score
      value: 53.050461108038036
    - type: v_measure
      value: 53.050461108038036
    - type: v_measure_std
      value: 0.9436104839012786
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB BiorxivClusteringS2S (default)
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
      split: test
      type: mteb/biorxiv-clustering-s2s
    metrics:
    - type: main_score
      value: 48.38215568371151
    - type: v_measure
      value: 48.38215568371151
    - type: v_measure_std
      value: 0.9104384504649026
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB EmotionClassification (default)
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
      split: test
      type: mteb/emotion
    metrics:
    - type: accuracy
      value: 93.36
    - type: f1
      value: 89.73665936982262
    - type: main_score
      value: 93.36
    task:
      type: Classification
  - dataset:
      config: default
      name: MTEB EmotionClassification (default)
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
      split: validation
      type: mteb/emotion
    metrics:
    - type: accuracy
      value: 94.14
    - type: f1
      value: 91.63163961443355
    - type: main_score
      value: 94.14
    task:
      type: Classification
  - dataset:
      config: default
      name: MTEB ImdbClassification (default)
      revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
      split: test
      type: mteb/imdb
    metrics:
    - type: accuracy
      value: 96.9144
    - type: ap
      value: 95.45276911068486
    - type: f1
      value: 96.91412729455966
    - type: main_score
      value: 96.9144
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB MTOPDomainClassification (en)
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
      split: test
      type: mteb/mtop_domain
    metrics:
    - type: accuracy
      value: 98.42225262197901
    - type: f1
      value: 98.31652547061115
    - type: main_score
      value: 98.42225262197901
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB MTOPDomainClassification (en)
      revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
      split: validation
      type: mteb/mtop_domain
    metrics:
    - type: accuracy
      value: 98.60850111856824
    - type: f1
      value: 98.49625189176408
    - type: main_score
      value: 98.60850111856824
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB MTOPIntentClassification (en)
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
      split: test
      type: mteb/mtop_intent
    metrics:
    - type: accuracy
      value: 94.00136798905609
    - type: f1
      value: 82.7022316533099
    - type: main_score
      value: 94.00136798905609
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB MTOPIntentClassification (en)
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
      split: validation
      type: mteb/mtop_intent
    metrics:
    - type: accuracy
      value: 93.89261744966441
    - type: f1
      value: 78.76796618262529
    - type: main_score
      value: 93.89261744966441
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB MassiveIntentClassification (en)
      revision: 4672e20407010da34463acc759c162ca9734bca6
      split: test
      type: mteb/amazon_massive_intent
    metrics:
    - type: accuracy
      value: 82.92535305985204
    - type: f1
      value: 79.885538231847
    - type: main_score
      value: 82.92535305985204
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB MassiveIntentClassification (en)
      revision: 4672e20407010da34463acc759c162ca9734bca6
      split: validation
      type: mteb/amazon_massive_intent
    metrics:
    - type: accuracy
      value: 83.55140186915888
    - type: f1
      value: 81.09072707555056
    - type: main_score
      value: 83.55140186915888
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB MassiveScenarioClassification (en)
      revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8
      split: test
      type: mteb/amazon_massive_scenario
    metrics:
    - type: accuracy
      value: 85.60188298587758
    - type: f1
      value: 84.87416963499224
    - type: main_score
      value: 85.60188298587758
    task:
      type: Classification
  - dataset:
      config: en
      name: MTEB MassiveScenarioClassification (en)
      revision: fad2c6e8459f9e1c45d9315f4953d921437d70f8
      split: validation
      type: mteb/amazon_massive_scenario
    metrics:
    - type: accuracy
      value: 85.01721593703886
    - type: f1
      value: 84.05277245992066
    - type: main_score
      value: 85.01721593703886
    task:
      type: Classification
  - dataset:
      config: default
      name: MTEB MedrxivClusteringP2P (default)
      revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
      split: test
      type: mteb/medrxiv-clustering-p2p
    metrics:
    - type: main_score
      value: 45.86171497327639
    - type: v_measure
      value: 45.86171497327639
    - type: v_measure_std
      value: 1.551347259003324
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB MedrxivClusteringS2S (default)
      revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
      split: test
      type: mteb/medrxiv-clustering-s2s
    metrics:
    - type: main_score
      value: 44.33336692345644
    - type: v_measure
      value: 44.33336692345644
    - type: v_measure_std
      value: 1.5931408596404715
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB MindSmallReranking (default)
      revision: 59042f120c80e8afa9cdbb224f67076cec0fc9a7
      split: test
      type: mteb/mind_small
    metrics:
    - type: main_score
      value: 30.597409734750503
    - type: map
      value: 30.597409734750503
    - type: mrr
      value: 31.397041548018457
    task:
      type: Reranking
  - dataset:
      config: default
      name: MTEB RedditClustering (default)
      revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
      split: test
      type: mteb/reddit-clustering
    metrics:
    - type: main_score
      value: 72.33008348681277
    - type: v_measure
      value: 72.33008348681277
    - type: v_measure_std
      value: 2.9203215463933008
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB RedditClusteringP2P (default)
      revision: 385e3cb46b4cfa89021f56c4380204149d0efe33
      split: test
      type: mteb/reddit-clustering-p2p
    metrics:
    - type: main_score
      value: 72.72079657828903
    - type: v_measure
      value: 72.72079657828903
    - type: v_measure_std
      value: 11.930271663428735
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB SICK-R (default)
      revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
      split: test
      type: mteb/sickr-sts
    metrics:
    - type: main_score
      value: 83.86733787791422
    task:
      type: STS
  - dataset:
      config: default
      name: MTEB STS12 (default)
      revision: a0d554a64d88156834ff5ae9920b964011b16384
      split: test
      type: mteb/sts12-sts
    metrics:
    - type: main_score
      value: 78.14269330480724
    task:
      type: STS
  - dataset:
      config: default
      name: MTEB STS13 (default)
      revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
      split: test
      type: mteb/sts13-sts
    metrics:
    - type: main_score
      value: 86.58640009300751
    task:
      type: STS
  - dataset:
      config: default
      name: MTEB STS14 (default)
      revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
      split: test
      type: mteb/sts14-sts
    metrics:
    - type: main_score
      value: 82.8292579957437
    task:
      type: STS
  - dataset:
      config: default
      name: MTEB STS15 (default)
      revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
      split: test
      type: mteb/sts15-sts
    metrics:
    - type: main_score
      value: 87.77203714228862
    task:
      type: STS
  - dataset:
      config: default
      name: MTEB STS16 (default)
      revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
      split: test
      type: mteb/sts16-sts
    metrics:
    - type: main_score
      value: 87.0439304006969
    task:
      type: STS
  - dataset:
      config: en-en
      name: MTEB STS17 (en-en)
      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
      split: test
      type: mteb/sts17-crosslingual-sts
    metrics:
    - type: main_score
      value: 91.24736138013424
    task:
      type: STS
  - dataset:
      config: en
      name: MTEB STS22 (en)
      revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
      split: test
      type: mteb/sts22-crosslingual-sts
    metrics:
    - type: main_score
      value: 70.07326214706
    task:
      type: STS
  - dataset:
      config: default
      name: MTEB STSBenchmark (default)
      revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
      split: test
      type: mteb/stsbenchmark-sts
    metrics:
    - type: main_score
      value: 88.42076443255168
    task:
      type: STS
  - dataset:
      config: default
      name: MTEB SciDocsRR (default)
      revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
      split: test
      type: mteb/scidocs-reranking
    metrics:
    - type: main_score
      value: 86.9584489124583
    - type: map
      value: 86.9584489124583
    - type: mrr
      value: 96.59475328592976
    task:
      type: Reranking
  - dataset:
      config: default
      name: MTEB SprintDuplicateQuestions (default)
      revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
      split: test
      type: mteb/sprintduplicatequestions-pairclassification
    metrics:
    - type: main_score
      value: 97.26819027722253
    - type: cos_sim_accuracy
      value: 99.88019801980198
    - type: cos_sim_accuracy_threshold
      value: 76.67685151100159
    - type: cos_sim_ap
      value: 97.23260568085786
    - type: cos_sim_f1
      value: 93.91824526420737
    - type: cos_sim_f1_threshold
      value: 75.82710981369019
    - type: cos_sim_precision
      value: 93.63817097415506
    - type: cos_sim_recall
      value: 94.19999999999999
    - type: dot_accuracy
      value: 99.88019801980198
    - type: dot_accuracy_threshold
      value: 76.67686343193054
    - type: dot_ap
      value: 97.23260568085786
    - type: dot_f1
      value: 93.91824526420737
    - type: dot_f1_threshold
      value: 75.8271336555481
    - type: dot_precision
      value: 93.63817097415506
    - type: dot_recall
      value: 94.19999999999999
    - type: euclidean_accuracy
      value: 99.88019801980198
    - type: euclidean_accuracy_threshold
      value: 68.29807758331299
    - type: euclidean_ap
      value: 97.23259982599497
    - type: euclidean_f1
      value: 93.91824526420737
    - type: euclidean_f1_threshold
      value: 69.53110694885254
    - type: euclidean_precision
      value: 93.63817097415506
    - type: euclidean_recall
      value: 94.19999999999999
    - type: manhattan_accuracy
      value: 99.87821782178217
    - type: manhattan_accuracy_threshold
      value: 3482.6908111572266
    - type: manhattan_ap
      value: 97.26819027722253
    - type: manhattan_f1
      value: 93.92592592592592
    - type: manhattan_f1_threshold
      value: 3555.5641174316406
    - type: manhattan_precision
      value: 92.78048780487805
    - type: manhattan_recall
      value: 95.1
    - type: max_accuracy
      value: 99.88019801980198
    - type: max_ap
      value: 97.26819027722253
    - type: max_f1
      value: 93.92592592592592
    task:
      type: PairClassification
  - dataset:
      config: default
      name: MTEB SprintDuplicateQuestions (default)
      revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
      split: validation
      type: mteb/sprintduplicatequestions-pairclassification
    metrics:
    - type: main_score
      value: 98.02470052972619
    - type: cos_sim_accuracy
      value: 99.88811881188118
    - type: cos_sim_accuracy_threshold
      value: 75.25776028633118
    - type: cos_sim_ap
      value: 97.97198133050095
    - type: cos_sim_f1
      value: 94.37531110004977
    - type: cos_sim_f1_threshold
      value: 75.25776028633118
    - type: cos_sim_precision
      value: 93.95441030723488
    - type: cos_sim_recall
      value: 94.8
    - type: dot_accuracy
      value: 99.88811881188118
    - type: dot_accuracy_threshold
      value: 75.25776624679565
    - type: dot_ap
      value: 97.97198133050095
    - type: dot_f1
      value: 94.37531110004977
    - type: dot_f1_threshold
      value: 75.25776624679565
    - type: dot_precision
      value: 93.95441030723488
    - type: dot_recall
      value: 94.8
    - type: euclidean_accuracy
      value: 99.88811881188118
    - type: euclidean_accuracy_threshold
      value: 70.34507989883423
    - type: euclidean_ap
      value: 97.97198133050095
    - type: euclidean_f1
      value: 94.37531110004977
    - type: euclidean_f1_threshold
      value: 70.34507989883423
    - type: euclidean_precision
      value: 93.95441030723488
    - type: euclidean_recall
      value: 94.8
    - type: manhattan_accuracy
      value: 99.89207920792079
    - type: manhattan_accuracy_threshold
      value: 3481.599807739258
    - type: manhattan_ap
      value: 98.02470052972619
    - type: manhattan_f1
      value: 94.52536413862381
    - type: manhattan_f1_threshold
      value: 3481.599807739258
    - type: manhattan_precision
      value: 94.95459132189707
    - type: manhattan_recall
      value: 94.1
    - type: max_accuracy
      value: 99.89207920792079
    - type: max_ap
      value: 98.02470052972619
    - type: max_f1
      value: 94.52536413862381
    task:
      type: PairClassification
  - dataset:
      config: default
      name: MTEB StackExchangeClustering (default)
      revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
      split: test
      type: mteb/stackexchange-clustering
    metrics:
    - type: main_score
      value: 81.32419328350603
    - type: v_measure
      value: 81.32419328350603
    - type: v_measure_std
      value: 2.666861121694755
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB StackExchangeClusteringP2P (default)
      revision: 815ca46b2622cec33ccafc3735d572c266efdb44
      split: test
      type: mteb/stackexchange-clustering-p2p
    metrics:
    - type: main_score
      value: 46.048387963107565
    - type: v_measure
      value: 46.048387963107565
    - type: v_measure_std
      value: 1.4102848576321703
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB StackOverflowDupQuestions (default)
      revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
      split: test
      type: mteb/stackoverflowdupquestions-reranking
    metrics:
    - type: main_score
      value: 56.70574900554072
    - type: map
      value: 56.70574900554072
    - type: mrr
      value: 57.517109116373824
    task:
      type: Reranking
  - dataset:
      config: default
      name: MTEB SummEval (default)
      revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
      split: test
      type: mteb/summeval
    metrics:
    - type: main_score
      value: 30.76932903185174
    task:
      type: Summarization
  - dataset:
      config: default
      name: MTEB ToxicConversationsClassification (default)
      revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de
      split: test
      type: mteb/toxic_conversations_50k
    metrics:
    - type: accuracy
      value: 93.173828125
    - type: ap
      value: 46.040184641424396
    - type: f1
      value: 80.77280549412752
    - type: main_score
      value: 93.173828125
    task:
      type: Classification
  - dataset:
      config: default
      name: MTEB TweetSentimentExtractionClassification (default)
      revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
      split: test
      type: mteb/tweet_sentiment_extraction
    metrics:
    - type: accuracy
      value: 79.9320882852292
    - type: f1
      value: 80.22638685975485
    - type: main_score
      value: 79.9320882852292
    task:
      type: Classification
  - dataset:
      config: default
      name: MTEB TwentyNewsgroupsClustering (default)
      revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
      split: test
      type: mteb/twentynewsgroups-clustering
    metrics:
    - type: main_score
      value: 68.98152919711418
    - type: v_measure
      value: 68.98152919711418
    - type: v_measure_std
      value: 1.2519720970652428
    task:
      type: Clustering
  - dataset:
      config: default
      name: MTEB TwitterSemEval2015 (default)
      revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
      split: test
      type: mteb/twittersemeval2015-pairclassification
    metrics:
    - type: main_score
      value: 79.34189681158234
    - type: cos_sim_accuracy
      value: 87.68552184538356
    - type: cos_sim_accuracy_threshold
      value: 76.06316804885864
    - type: cos_sim_ap
      value: 79.34189149773933
    - type: cos_sim_f1
      value: 72.16386554621849
    - type: cos_sim_f1_threshold
      value: 73.62890243530273
    - type: cos_sim_precision
      value: 71.82435964453737
    - type: cos_sim_recall
      value: 72.5065963060686
    - type: dot_accuracy
      value: 87.68552184538356
    - type: dot_accuracy_threshold
      value: 76.06316208839417
    - type: dot_ap
      value: 79.34189231911259
    - type: dot_f1
      value: 72.16386554621849
    - type: dot_f1_threshold
      value: 73.62889647483826
    - type: dot_precision
      value: 71.82435964453737
    - type: dot_recall
      value: 72.5065963060686
    - type: euclidean_accuracy
      value: 87.68552184538356
    - type: euclidean_accuracy_threshold
      value: 69.19080018997192
    - type: euclidean_ap
      value: 79.34189681158234
    - type: euclidean_f1
      value: 72.16386554621849
    - type: euclidean_f1_threshold
      value: 72.62383103370667
    - type: euclidean_precision
      value: 71.82435964453737
    - type: euclidean_recall
      value: 72.5065963060686
    - type: manhattan_accuracy
      value: 87.661679680515
    - type: manhattan_accuracy_threshold
      value: 3408.807373046875
    - type: manhattan_ap
      value: 79.29617544165136
    - type: manhattan_f1
      value: 72.1957671957672
    - type: manhattan_f1_threshold
      value: 3597.7684020996094
    - type: manhattan_precision
      value: 72.38726790450929
    - type: manhattan_recall
      value: 72.00527704485488
    - type: max_accuracy
      value: 87.68552184538356
    - type: max_ap
      value: 79.34189681158234
    - type: max_f1
      value: 72.1957671957672
    task:
      type: PairClassification
  - dataset:
      config: default
      name: MTEB TwitterURLCorpus (default)
      revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
      split: test
      type: mteb/twitterurlcorpus-pairclassification
    metrics:
    - type: main_score
      value: 87.8635519535718
    - type: cos_sim_accuracy
      value: 89.80672953778088
    - type: cos_sim_accuracy_threshold
      value: 73.09532165527344
    - type: cos_sim_ap
      value: 87.84251379545145
    - type: cos_sim_f1
      value: 80.25858884373845
    - type: cos_sim_f1_threshold
      value: 70.57080268859863
    - type: cos_sim_precision
      value: 77.14103110353643
    - type: cos_sim_recall
      value: 83.63874345549738
    - type: dot_accuracy
      value: 89.80672953778088
    - type: dot_accuracy_threshold
      value: 73.09532761573792
    - type: dot_ap
      value: 87.84251881260793
    - type: dot_f1
      value: 80.25858884373845
    - type: dot_f1_threshold
      value: 70.57079076766968
    - type: dot_precision
      value: 77.14103110353643
    - type: dot_recall
      value: 83.63874345549738
    - type: euclidean_accuracy
      value: 89.80672953778088
    - type: euclidean_accuracy_threshold
      value: 73.3548641204834
    - type: euclidean_ap
      value: 87.84251335039049
    - type: euclidean_f1
      value: 80.25858884373845
    - type: euclidean_f1_threshold
      value: 76.71923041343689
    - type: euclidean_precision
      value: 77.14103110353643
    - type: euclidean_recall
      value: 83.63874345549738
    - type: manhattan_accuracy
      value: 89.78150347343501
    - type: manhattan_accuracy_threshold
      value: 3702.7603149414062
    - type: manhattan_ap
      value: 87.8635519535718
    - type: manhattan_f1
      value: 80.27105660516332
    - type: manhattan_f1_threshold
      value: 3843.5962677001953
    - type: manhattan_precision
      value: 76.9361101306036
    - type: manhattan_recall
      value: 83.90822297505389
    - type: max_accuracy
      value: 89.80672953778088
    - type: max_ap
      value: 87.8635519535718
    - type: max_f1
      value: 80.27105660516332
    task:
      type: PairClassification
  - task:
      type: Retrieval
    dataset:
      type: nfcorpus
      name: MTEB NFCorpus
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 52.47678018575851
    - type: ndcg_at_3
      value: 47.43993801247414
    - type: ndcg_at_5
      value: 45.08173173082719
    - type: ndcg_at_10
      value: 41.850870119787835
    - type: ndcg_at_100
      value: 37.79284946590978
    - type: ndcg_at_1000
      value: 46.58046062123418
    - type: map_at_1
      value: 6.892464464226138
    - type: map_at_3
      value: 12.113195798233127
    - type: map_at_5
      value: 13.968475602788812
    - type: map_at_10
      value: 16.47564069781326
    - type: map_at_100
      value: 20.671726065190025
    - type: map_at_1000
      value: 22.328875914012006
    - type: precision_at_1
      value: 53.86996904024768
    - type: precision_at_3
      value: 43.96284829721363
    - type: precision_at_5
      value: 38.69969040247682
    - type: precision_at_10
      value: 30.928792569659457
    - type: precision_at_100
      value: 9.507739938080498
    - type: precision_at_1000
      value: 2.25882352941176
    - type: recall_at_1
      value: 6.892464464226138
    - type: recall_at_3
      value: 13.708153358278407
    - type: recall_at_5
      value: 16.651919797359145
    - type: recall_at_10
      value: 21.01801714352559
    - type: recall_at_100
      value: 37.01672102843443
    - type: recall_at_1000
      value: 69.8307270724072
  - task:
      type: Retrieval
    dataset:
      type: msmarco
      name: MTEB MSMARCO
      config: default
      split: dev
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 26.63323782234957
    - type: ndcg_at_3
      value: 38.497585804985754
    - type: ndcg_at_5
      value: 42.72761631631636
    - type: ndcg_at_10
      value: 46.78865753107054
    - type: ndcg_at_100
      value: 51.96170786623209
    - type: ndcg_at_1000
      value: 52.82713901970963
    - type: map_at_1
      value: 25.89063992359121
    - type: map_at_3
      value: 35.299466730340654
    - type: map_at_5
      value: 37.68771887933786
    - type: map_at_10
      value: 39.40908074468253
    - type: map_at_100
      value: 40.53444082323405
    - type: map_at_1000
      value: 40.57183037649452
    - type: precision_at_1
      value: 26.63323782234957
    - type: precision_at_3
      value: 16.265520534861793
    - type: precision_at_5
      value: 11.902578796562304
    - type: precision_at_10
      value: 7.262177650430416
    - type: precision_at_100
      value: 0.9819484240687512
    - type: precision_at_1000
      value: 0.10571633237823287
    - type: recall_at_1
      value: 25.89063992359121
    - type: recall_at_3
      value: 46.99737344794652
    - type: recall_at_5
      value: 57.160936007640906
    - type: recall_at_10
      value: 69.43409742120343
    - type: recall_at_100
      value: 92.86413562559697
    - type: recall_at_1000
      value: 99.3230659025788
  - task:
      type: Retrieval
    dataset:
      type: fiqa
      name: MTEB FiQA2018
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 57.407407407407405
    - type: ndcg_at_3
      value: 53.79975378289304
    - type: ndcg_at_5
      value: 56.453379423655406
    - type: ndcg_at_10
      value: 59.67151242793314
    - type: ndcg_at_100
      value: 65.34055762539253
    - type: ndcg_at_1000
      value: 67.07707746043032
    - type: map_at_1
      value: 30.65887045053714
    - type: map_at_3
      value: 44.09107110881799
    - type: map_at_5
      value: 48.18573748068346
    - type: map_at_10
      value: 51.03680979612876
    - type: map_at_100
      value: 53.03165194566928
    - type: map_at_1000
      value: 53.16191096190861
    - type: precision_at_1
      value: 57.407407407407405
    - type: precision_at_3
      value: 35.493827160493886
    - type: precision_at_5
      value: 26.913580246913547
    - type: precision_at_10
      value: 16.435185185185155
    - type: precision_at_100
      value: 2.2685185185184986
    - type: precision_at_1000
      value: 0.25864197530863964
    - type: recall_at_1
      value: 30.65887045053714
    - type: recall_at_3
      value: 48.936723427464194
    - type: recall_at_5
      value: 58.55942925387371
    - type: recall_at_10
      value: 68.45128551147073
    - type: recall_at_100
      value: 88.24599311867836
    - type: recall_at_1000
      value: 98.18121693121691
  - task:
      type: Retrieval
    dataset:
      type: scidocs
      name: MTEB SCIDOCS
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 28.7
    - type: ndcg_at_3
      value: 23.61736427940938
    - type: ndcg_at_5
      value: 20.845690325673885
    - type: ndcg_at_10
      value: 25.25865384510787
    - type: ndcg_at_100
      value: 36.18596641088721
    - type: ndcg_at_1000
      value: 41.7166868935345
    - type: map_at_1
      value: 5.828333333333361
    - type: map_at_3
      value: 10.689166666666676
    - type: map_at_5
      value: 13.069916666666668
    - type: map_at_10
      value: 15.4901164021164
    - type: map_at_100
      value: 18.61493245565425
    - type: map_at_1000
      value: 18.99943478016456
    - type: precision_at_1
      value: 28.7
    - type: precision_at_3
      value: 22.30000000000006
    - type: precision_at_5
      value: 18.55999999999997
    - type: precision_at_10
      value: 13.289999999999946
    - type: precision_at_100
      value: 2.905000000000005
    - type: precision_at_1000
      value: 0.4218999999999946
    - type: recall_at_1
      value: 5.828333333333361
    - type: recall_at_3
      value: 13.548333333333387
    - type: recall_at_5
      value: 18.778333333333308
    - type: recall_at_10
      value: 26.939999999999902
    - type: recall_at_100
      value: 58.91333333333344
    - type: recall_at_1000
      value: 85.57499999999972
  - task:
      type: Retrieval
    dataset:
      type: fever
      name: MTEB FEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 88.98889888988899
    - type: ndcg_at_3
      value: 91.82404417747676
    - type: ndcg_at_5
      value: 92.41785792357787
    - type: ndcg_at_10
      value: 92.82809814626805
    - type: ndcg_at_100
      value: 93.31730867509245
    - type: ndcg_at_1000
      value: 93.45171203408582
    - type: map_at_1
      value: 82.64125817343636
    - type: map_at_3
      value: 89.39970782792554
    - type: map_at_5
      value: 89.96799501378695
    - type: map_at_10
      value: 90.27479706587437
    - type: map_at_100
      value: 90.45185655778057
    - type: map_at_1000
      value: 90.46130471574544
    - type: precision_at_1
      value: 88.98889888988899
    - type: precision_at_3
      value: 34.923492349234245
    - type: precision_at_5
      value: 21.524152415244043
    - type: precision_at_10
      value: 11.033603360337315
    - type: precision_at_100
      value: 1.1521152115211895
    - type: precision_at_1000
      value: 0.11765676567657675
    - type: recall_at_1
      value: 82.64125817343636
    - type: recall_at_3
      value: 94.35195900542428
    - type: recall_at_5
      value: 95.9071323799047
    - type: recall_at_10
      value: 97.04234113887586
    - type: recall_at_100
      value: 98.77282371094255
    - type: recall_at_1000
      value: 99.5555567461508
  - task:
      type: Retrieval
    dataset:
      type: arguana
      name: MTEB ArguAna
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 66.50071123755335
    - type: ndcg_at_3
      value: 80.10869593172173
    - type: ndcg_at_5
      value: 81.89670542467924
    - type: ndcg_at_10
      value: 83.07967801208441
    - type: ndcg_at_100
      value: 83.5991349601075
    - type: ndcg_at_1000
      value: 83.5991349601075
    - type: map_at_1
      value: 66.50071123755335
    - type: map_at_3
      value: 76.83736367946898
    - type: map_at_5
      value: 77.8473210052158
    - type: map_at_10
      value: 78.35472690735851
    - type: map_at_100
      value: 78.47388207611678
    - type: map_at_1000
      value: 78.47388207611678
    - type: precision_at_1
      value: 66.50071123755335
    - type: precision_at_3
      value: 29.848269321953076
    - type: precision_at_5
      value: 18.762446657183045
    - type: precision_at_10
      value: 9.736842105262909
    - type: precision_at_100
      value: 0.9964438122332677
    - type: precision_at_1000
      value: 0.09964438122332549
    - type: recall_at_1
      value: 66.50071123755335
    - type: recall_at_3
      value: 89.5448079658606
    - type: recall_at_5
      value: 93.8122332859175
    - type: recall_at_10
      value: 97.36842105263158
    - type: recall_at_100
      value: 99.6443812233286
    - type: recall_at_1000
      value: 99.6443812233286
  - task:
      type: Retrieval
    dataset:
      type: scifact
      name: MTEB SciFact
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 66.0
    - type: ndcg_at_3
      value: 74.98853481223065
    - type: ndcg_at_5
      value: 77.29382051205019
    - type: ndcg_at_10
      value: 79.09159079425369
    - type: ndcg_at_100
      value: 80.29692802526776
    - type: ndcg_at_1000
      value: 80.55210036585547
    - type: map_at_1
      value: 62.994444444444454
    - type: map_at_3
      value: 71.7425925925926
    - type: map_at_5
      value: 73.6200925925926
    - type: map_at_10
      value: 74.50223544973547
    - type: map_at_100
      value: 74.82438594015447
    - type: map_at_1000
      value: 74.83420474892468
    - type: precision_at_1
      value: 66.0
    - type: precision_at_3
      value: 29.44444444444439
    - type: precision_at_5
      value: 19.40000000000008
    - type: precision_at_10
      value: 10.366666666666715
    - type: precision_at_100
      value: 1.0999999999999928
    - type: precision_at_1000
      value: 0.11200000000000007
    - type: recall_at_1
      value: 62.994444444444454
    - type: recall_at_3
      value: 80.89999999999998
    - type: recall_at_5
      value: 86.72777777777779
    - type: recall_at_10
      value: 91.88888888888887
    - type: recall_at_100
      value: 97.0
    - type: recall_at_1000
      value: 99.0
  - task:
      type: Retrieval
    dataset:
      type: trec-covid
      name: MTEB TRECCOVID
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 83.0
    - type: ndcg_at_3
      value: 79.86598407528447
    - type: ndcg_at_5
      value: 79.27684428714952
    - type: ndcg_at_10
      value: 79.07987651251462
    - type: ndcg_at_100
      value: 64.55029164391163
    - type: ndcg_at_1000
      value: 59.42333857860492
    - type: map_at_1
      value: 0.226053732680979
    - type: map_at_3
      value: 0.644034626013194
    - type: map_at_5
      value: 1.045196967937728
    - type: map_at_10
      value: 2.0197496659905085
    - type: map_at_100
      value: 13.316018005224159
    - type: map_at_1000
      value: 33.784766957424104
    - type: precision_at_1
      value: 88.0
    - type: precision_at_3
      value: 86.66666666666667
    - type: precision_at_5
      value: 85.20000000000002
    - type: precision_at_10
      value: 84.19999999999997
    - type: precision_at_100
      value: 67.88000000000001
    - type: precision_at_1000
      value: 26.573999999999998
    - type: recall_at_1
      value: 0.226053732680979
    - type: recall_at_3
      value: 0.6754273711472734
    - type: recall_at_5
      value: 1.1168649828059245
    - type: recall_at_10
      value: 2.2215081031265207
    - type: recall_at_100
      value: 16.694165236664727
    - type: recall_at_1000
      value: 56.7022214857503
  - task:
      type: Retrieval
    dataset:
      type: climate-fever
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 44.36482084690554
    - type: ndcg_at_3
      value: 38.13005747178844
    - type: ndcg_at_5
      value: 40.83474510717123
    - type: ndcg_at_10
      value: 45.4272998284769
    - type: ndcg_at_100
      value: 52.880220707479516
    - type: ndcg_at_1000
      value: 55.364753427333
    - type: map_at_1
      value: 19.200868621064064
    - type: map_at_3
      value: 28.33785740137525
    - type: map_at_5
      value: 31.67162504524064
    - type: map_at_10
      value: 34.417673164090075
    - type: map_at_100
      value: 36.744753097028976
    - type: map_at_1000
      value: 36.91262189016135
    - type: precision_at_1
      value: 44.36482084690554
    - type: precision_at_3
      value: 29.14223669923975
    - type: precision_at_5
      value: 22.410423452768388
    - type: precision_at_10
      value: 14.293159609120309
    - type: precision_at_100
      value: 2.248859934853431
    - type: precision_at_1000
      value: 0.2722475570032542
    - type: recall_at_1
      value: 19.200868621064064
    - type: recall_at_3
      value: 34.132464712269176
    - type: recall_at_5
      value: 42.35613463626491
    - type: recall_at_10
      value: 52.50814332247546
    - type: recall_at_100
      value: 77.16178067318128
    - type: recall_at_1000
      value: 90.59174809989138
  - task:
      type: Retrieval
    dataset:
      type: hotpotqa
      name: MTEB HotpotQA
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 89.9392302498312
    - type: ndcg_at_3
      value: 81.2061569376288
    - type: ndcg_at_5
      value: 83.53311592078133
    - type: ndcg_at_10
      value: 85.13780800141961
    - type: ndcg_at_100
      value: 87.02630661625386
    - type: ndcg_at_1000
      value: 87.47294723601075
    - type: map_at_1
      value: 44.9696151249156
    - type: map_at_3
      value: 76.46972766148966
    - type: map_at_5
      value: 78.47749268512187
    - type: map_at_10
      value: 79.49792611170005
    - type: map_at_100
      value: 80.09409086274644
    - type: map_at_1000
      value: 80.11950878917663
    - type: precision_at_1
      value: 89.9392302498312
    - type: precision_at_3
      value: 53.261309925724234
    - type: precision_at_5
      value: 33.79338284942924
    - type: precision_at_10
      value: 17.69750168805041
    - type: precision_at_100
      value: 1.9141120864280805
    - type: precision_at_1000
      value: 0.19721809588118133
    - type: recall_at_1
      value: 44.9696151249156
    - type: recall_at_3
      value: 79.8919648885888
    - type: recall_at_5
      value: 84.48345712356516
    - type: recall_at_10
      value: 88.48750844024308
    - type: recall_at_100
      value: 95.70560432140446
    - type: recall_at_1000
      value: 98.60904794058068
  - task:
      type: Retrieval
    dataset:
      type: nq
      name: MTEB NQ
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 57.0683661645423
    - type: ndcg_at_3
      value: 67.89935813080585
    - type: ndcg_at_5
      value: 71.47769719452941
    - type: ndcg_at_10
      value: 73.88350836507092
    - type: ndcg_at_100
      value: 75.76561068060907
    - type: ndcg_at_1000
      value: 75.92437662684215
    - type: map_at_1
      value: 51.00424874468904
    - type: map_at_3
      value: 63.87359984550011
    - type: map_at_5
      value: 66.23696407879494
    - type: map_at_10
      value: 67.42415446608673
    - type: map_at_100
      value: 67.92692839842621
    - type: map_at_1000
      value: 67.93437922640133
    - type: precision_at_1
      value: 57.0683661645423
    - type: precision_at_3
      value: 29.692931633836416
    - type: precision_at_5
      value: 20.046349942062854
    - type: precision_at_10
      value: 10.950173812283
    - type: precision_at_100
      value: 1.1995944380069687
    - type: precision_at_1000
      value: 0.12146581691772171
    - type: recall_at_1
      value: 51.00424874468904
    - type: recall_at_3
      value: 75.93665507918116
    - type: recall_at_5
      value: 83.95133256083433
    - type: recall_at_10
      value: 90.78794901506375
    - type: recall_at_100
      value: 98.61915797605253
    - type: recall_at_1000
      value: 99.7827346465817
  - task:
      type: Retrieval
    dataset:
      type: quora
      name: MTEB QuoraRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: ndcg_at_1
      value: 84.61999999999999
    - type: ndcg_at_3
      value: 88.57366734033212
    - type: ndcg_at_5
      value: 89.89804048972175
    - type: ndcg_at_10
      value: 90.95410848372035
    - type: ndcg_at_100
      value: 91.83227134455773
    - type: ndcg_at_1000
      value: 91.88368412611601
    - type: map_at_1
      value: 73.4670089207039
    - type: map_at_3
      value: 84.87862925508942
    - type: map_at_5
      value: 86.68002324701408
    - type: map_at_10
      value: 87.7165466015312
    - type: map_at_100
      value: 88.28718809614146
type: map_at_1000 + value: 88.29877148480672 + - type: precision_at_1 + value: 84.61999999999999 + - type: precision_at_3 + value: 38.82333333333838 + - type: precision_at_5 + value: 25.423999999998642 + - type: precision_at_10 + value: 13.787999999998583 + - type: precision_at_100 + value: 1.5442999999999767 + - type: precision_at_1000 + value: 0.15672999999997972 + - type: recall_at_1 + value: 73.4670089207039 + - type: recall_at_3 + value: 89.98389854832143 + - type: recall_at_5 + value: 93.88541046010576 + - type: recall_at_10 + value: 96.99779417520634 + - type: recall_at_100 + value: 99.80318763957743 + - type: recall_at_1000 + value: 99.99638888888889 + - task: + type: Retrieval + dataset: + type: webis-touche2020 + name: MTEB Touche2020 + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 33.6734693877551 + - type: ndcg_at_3 + value: 34.36843900446739 + - type: ndcg_at_5 + value: 32.21323786731918 + - type: ndcg_at_10 + value: 30.47934263207554 + - type: ndcg_at_100 + value: 41.49598869753928 + - type: ndcg_at_1000 + value: 52.32963949183662 + - type: map_at_1 + value: 3.0159801678718168 + - type: map_at_3 + value: 7.13837927642557 + - type: map_at_5 + value: 9.274004610363466 + - type: map_at_10 + value: 12.957368366814324 + - type: map_at_100 + value: 19.3070585127604 + - type: map_at_1000 + value: 20.809777161133532 + - type: precision_at_1 + value: 34.69387755102041 + - type: precision_at_3 + value: 36.054421768707485 + - type: precision_at_5 + value: 32.24489795918368 + - type: precision_at_10 + value: 27.142857142857146 + - type: precision_at_100 + value: 8.326530612244898 + - type: precision_at_1000 + value: 1.5755102040816336 + - type: recall_at_1 + value: 3.0159801678718168 + - type: recall_at_3 + value: 8.321771388428257 + - type: recall_at_5 + value: 11.737532394366069 + - type: recall_at_10 + value: 19.49315139822179 + - type: recall_at_100 + value: 50.937064145519685 + - type: recall_at_1000 + value: 
83.4358283484675 + - task: + type: Retrieval + dataset: + type: dbpedia-entity + name: MTEB DBPedia + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 64.375 + - type: ndcg_at_3 + value: 55.677549598242614 + - type: ndcg_at_5 + value: 53.44347199908503 + - type: ndcg_at_10 + value: 51.634197691802754 + - type: ndcg_at_100 + value: 56.202861267183415 + - type: ndcg_at_1000 + value: 63.146019108272576 + - type: map_at_1 + value: 9.789380503780919 + - type: map_at_3 + value: 16.146582195277016 + - type: map_at_5 + value: 19.469695222167193 + - type: map_at_10 + value: 24.163327344766145 + - type: map_at_100 + value: 35.47047690245571 + - type: map_at_1000 + value: 37.5147432331838 + - type: precision_at_1 + value: 76.25 + - type: precision_at_3 + value: 59.08333333333333 + - type: precision_at_5 + value: 52.24999999999997 + - type: precision_at_10 + value: 42.54999999999994 + - type: precision_at_100 + value: 13.460000000000008 + - type: precision_at_1000 + value: 2.4804999999999966 + - type: recall_at_1 + value: 9.789380503780919 + - type: recall_at_3 + value: 17.48487134027656 + - type: recall_at_5 + value: 22.312024269698806 + - type: recall_at_10 + value: 30.305380335237324 + - type: recall_at_100 + value: 62.172868946596424 + - type: recall_at_1000 + value: 85.32410301328747 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackPhysicsRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 42.15591915303176 + - type: ndcg_at_3 + value: 48.15261407846446 + - type: ndcg_at_5 + value: 50.58031819816491 + - type: ndcg_at_10 + value: 53.159393156983015 + - type: ndcg_at_100 + value: 58.64024684800366 + - type: ndcg_at_1000 + value: 60.017254762428166 + - type: map_at_1 + value: 34.78577058702179 + - type: map_at_3 + value: 43.52147299813321 + - type: map_at_5 + value: 45.47857625732981 + - type: map_at_10 + value: 46.94467579029768 + - type: map_at_100 + 
value: 48.364473257035456 + - type: map_at_1000 + value: 48.460199893487435 + - type: precision_at_1 + value: 42.15591915303176 + - type: precision_at_3 + value: 22.842476740455762 + - type: precision_at_5 + value: 16.073147256977784 + - type: precision_at_10 + value: 9.566891241578338 + - type: precision_at_100 + value: 1.441770933589971 + - type: precision_at_1000 + value: 0.17045235803656864 + - type: recall_at_1 + value: 34.78577058702179 + - type: recall_at_3 + value: 51.705004026948195 + - type: recall_at_5 + value: 57.99470738835514 + - type: recall_at_10 + value: 65.73761786225693 + - type: recall_at_100 + value: 88.03733579833336 + - type: recall_at_1000 + value: 96.505175424102 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackStatsRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 30.061349693251532 + - type: ndcg_at_3 + value: 36.63708916157646 + - type: ndcg_at_5 + value: 38.61671491681753 + - type: ndcg_at_10 + value: 41.350655796840066 + - type: ndcg_at_100 + value: 46.45326227358081 + - type: ndcg_at_1000 + value: 48.582285457159266 + - type: map_at_1 + value: 26.9244205862304 + - type: map_at_3 + value: 33.585406725744164 + - type: map_at_5 + value: 34.91193310921073 + - type: map_at_10 + value: 36.15920645617732 + - type: map_at_100 + value: 37.25917602757753 + - type: map_at_1000 + value: 37.35543998586382 + - type: precision_at_1 + value: 30.061349693251532 + - type: precision_at_3 + value: 16.002044989775 + - type: precision_at_5 + value: 11.012269938650379 + - type: precision_at_10 + value: 6.625766871165693 + - type: precision_at_100 + value: 1.0015337423312758 + - type: precision_at_1000 + value: 0.12638036809815958 + - type: recall_at_1 + value: 26.9244205862304 + - type: recall_at_3 + value: 40.92407975460122 + - type: recall_at_5 + value: 45.74576284315548 + - type: recall_at_10 + value: 54.04032657867014 + - type: recall_at_100 + value: 
76.89573533447586 + - type: recall_at_1000 + value: 92.10000029943193 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackWebmastersRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 36.16600790513834 + - type: ndcg_at_3 + value: 41.39539336351464 + - type: ndcg_at_5 + value: 44.286188181817465 + - type: ndcg_at_10 + value: 46.8079293900759 + - type: ndcg_at_100 + value: 52.77618002686582 + - type: ndcg_at_1000 + value: 54.74554787022661 + - type: map_at_1 + value: 29.947644735902585 + - type: map_at_3 + value: 36.84394907359118 + - type: map_at_5 + value: 38.9461665221235 + - type: map_at_10 + value: 40.38325122041743 + - type: map_at_100 + value: 42.15067269020822 + - type: map_at_1000 + value: 42.396412886053454 + - type: precision_at_1 + value: 36.16600790513834 + - type: precision_at_3 + value: 19.23583662714091 + - type: precision_at_5 + value: 14.268774703557394 + - type: precision_at_10 + value: 9.071146245059353 + - type: precision_at_100 + value: 1.7905138339920774 + - type: precision_at_1000 + value: 0.2537549407114581 + - type: recall_at_1 + value: 29.947644735902585 + - type: recall_at_3 + value: 43.95135576014935 + - type: recall_at_5 + value: 51.33413524177249 + - type: recall_at_10 + value: 58.597439631665615 + - type: recall_at_100 + value: 85.04925879936505 + - type: recall_at_1000 + value: 96.93189262162947 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackWordpressRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 26.247689463955638 + - type: ndcg_at_3 + value: 33.25421096011386 + - type: ndcg_at_5 + value: 35.274958043979055 + - type: ndcg_at_10 + value: 37.895337114228504 + - type: ndcg_at_100 + value: 43.16359215810417 + - type: ndcg_at_1000 + value: 45.46544464874392 + - type: map_at_1 + value: 23.730646155069266 + - type: map_at_3 + value: 30.328510192859376 + - type: 
map_at_5 + value: 31.646131881091033 + - type: map_at_10 + value: 32.834529811633146 + - type: map_at_100 + value: 33.887475191512124 + - type: map_at_1000 + value: 33.98635376333761 + - type: precision_at_1 + value: 26.247689463955638 + - type: precision_at_3 + value: 14.417744916820693 + - type: precision_at_5 + value: 10.018484288354932 + - type: precision_at_10 + value: 6.00739371534199 + - type: precision_at_100 + value: 0.9426987060998051 + - type: precision_at_1000 + value: 0.12476894639556387 + - type: recall_at_1 + value: 23.730646155069266 + - type: recall_at_3 + value: 38.561206845149364 + - type: recall_at_5 + value: 43.38560610577783 + - type: recall_at_10 + value: 51.21370222407728 + - type: recall_at_100 + value: 75.61661144095109 + - type: recall_at_1000 + value: 92.54472715089256 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackProgrammersRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 38.81278538812785 + - type: ndcg_at_3 + value: 43.78338523503654 + - type: ndcg_at_5 + value: 47.097296563014325 + - type: ndcg_at_10 + value: 50.282579667519435 + - type: ndcg_at_100 + value: 55.729033960190286 + - type: ndcg_at_1000 + value: 57.33724814332862 + - type: map_at_1 + value: 31.69764033847938 + - type: map_at_3 + value: 39.42951244122387 + - type: map_at_5 + value: 41.943723140417774 + - type: map_at_10 + value: 43.61013816936983 + - type: map_at_100 + value: 45.02590557151775 + - type: map_at_1000 + value: 45.125950171245066 + - type: precision_at_1 + value: 38.81278538812785 + - type: precision_at_3 + value: 20.96651445966523 + - type: precision_at_5 + value: 15.388127853881361 + - type: precision_at_10 + value: 9.474885844748805 + - type: precision_at_100 + value: 1.400684931506831 + - type: precision_at_1000 + value: 0.17191780821917388 + - type: recall_at_1 + value: 31.69764033847938 + - type: recall_at_3 + value: 46.60687843152849 + - type: recall_at_5 + 
value: 55.17297638322793 + - type: recall_at_10 + value: 64.45674471217188 + - type: recall_at_100 + value: 87.1937426751484 + - type: recall_at_1000 + value: 97.32787875629423 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackEnglishRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 45.22292993630573 + - type: ndcg_at_3 + value: 50.48933696278536 + - type: ndcg_at_5 + value: 52.51230339563936 + - type: ndcg_at_10 + value: 54.63834990956019 + - type: ndcg_at_100 + value: 58.4908966688059 + - type: ndcg_at_1000 + value: 60.25262455573039 + - type: map_at_1 + value: 36.14176917496391 + - type: map_at_3 + value: 45.293425362542706 + - type: map_at_5 + value: 47.228727919799 + - type: map_at_10 + value: 48.603664692804365 + - type: map_at_100 + value: 49.87291685915334 + - type: map_at_1000 + value: 49.99758620164822 + - type: precision_at_1 + value: 45.22292993630573 + - type: precision_at_3 + value: 24.607218683651517 + - type: precision_at_5 + value: 17.273885350318157 + - type: precision_at_10 + value: 10.401273885350104 + - type: precision_at_100 + value: 1.5840764331210677 + - type: precision_at_1000 + value: 0.20216560509553294 + - type: recall_at_1 + value: 36.14176917496391 + - type: recall_at_3 + value: 52.458133860965276 + - type: recall_at_5 + value: 58.30933220798927 + - type: recall_at_10 + value: 64.76267431694271 + - type: recall_at_100 + value: 81.11863633256955 + - type: recall_at_1000 + value: 91.95898877878803 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackMathematicaRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 25.37313432835821 + - type: ndcg_at_3 + value: 31.513955857649872 + - type: ndcg_at_5 + value: 33.894814999901286 + - type: ndcg_at_10 + value: 36.567795091777775 + - type: ndcg_at_100 + value: 42.692861355185926 + - type: ndcg_at_1000 + value: 45.1650634517594 + - 
type: map_at_1 + value: 20.137260127931768 + - type: map_at_3 + value: 27.513893824528164 + - type: map_at_5 + value: 29.228223959567245 + - type: map_at_10 + value: 30.486342453382235 + - type: map_at_100 + value: 31.93773531700923 + - type: map_at_1000 + value: 32.045221355885026 + - type: precision_at_1 + value: 25.37313432835821 + - type: precision_at_3 + value: 15.713101160862273 + - type: precision_at_5 + value: 11.218905472636896 + - type: precision_at_10 + value: 6.828358208955276 + - type: precision_at_100 + value: 1.1318407960198864 + - type: precision_at_1000 + value: 0.14776119402984852 + - type: recall_at_1 + value: 20.137260127931768 + - type: recall_at_3 + value: 35.516761430940534 + - type: recall_at_5 + value: 41.81044183842692 + - type: recall_at_10 + value: 49.84812658320122 + - type: recall_at_100 + value: 75.52224965471233 + - type: recall_at_1000 + value: 93.00114617278797 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackGamingRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 52.10031347962383 + - type: ndcg_at_3 + value: 59.09283306711919 + - type: ndcg_at_5 + value: 61.70364710499664 + - type: ndcg_at_10 + value: 64.43508234673456 + - type: ndcg_at_100 + value: 68.08258162359128 + - type: ndcg_at_1000 + value: 68.78220525177915 + - type: map_at_1 + value: 45.67593534991653 + - type: map_at_3 + value: 55.17968153498597 + - type: map_at_5 + value: 57.073161405223026 + - type: map_at_10 + value: 58.55427425972989 + - type: map_at_100 + value: 59.58877825514076 + - type: map_at_1000 + value: 59.62753156251917 + - type: precision_at_1 + value: 52.10031347962383 + - type: precision_at_3 + value: 25.95611285266423 + - type: precision_at_5 + value: 17.667711598745708 + - type: precision_at_10 + value: 10.169278996864973 + - type: precision_at_100 + value: 1.2852664576802733 + - type: precision_at_1000 + value: 0.13786833855798794 + - type: recall_at_1 + value: 
45.67593534991653 + - type: recall_at_3 + value: 63.87786043907147 + - type: recall_at_5 + value: 70.25761057674107 + - type: recall_at_10 + value: 77.97283230161469 + - type: recall_at_100 + value: 93.12900411473255 + - type: recall_at_1000 + value: 97.98040752351098 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackGisRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 29.830508474576273 + - type: ndcg_at_3 + value: 36.43753958419226 + - type: ndcg_at_5 + value: 39.55362935996899 + - type: ndcg_at_10 + value: 43.11482816486947 + - type: ndcg_at_100 + value: 48.55701741086406 + - type: ndcg_at_1000 + value: 50.12437449225312 + - type: map_at_1 + value: 27.58676351896691 + - type: map_at_3 + value: 33.9831853645413 + - type: map_at_5 + value: 35.81743341404356 + - type: map_at_10 + value: 37.38087764923922 + - type: map_at_100 + value: 38.54334689204219 + - type: map_at_1000 + value: 38.60999368829795 + - type: precision_at_1 + value: 29.830508474576273 + - type: precision_at_3 + value: 15.21657250470804 + - type: precision_at_5 + value: 10.960451977401222 + - type: precision_at_10 + value: 6.779661016949213 + - type: precision_at_100 + value: 0.9977401129943356 + - type: precision_at_1000 + value: 0.11661016949152515 + - type: recall_at_1 + value: 27.58676351896691 + - type: recall_at_3 + value: 41.050040355125105 + - type: recall_at_5 + value: 48.356201237557165 + - type: recall_at_10 + value: 58.86871132633844 + - type: recall_at_100 + value: 83.44115081403217 + - type: recall_at_1000 + value: 95.14032985219426 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackUnixRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 37.77985074626866 + - type: ndcg_at_3 + value: 42.68906535122145 + - type: ndcg_at_5 + value: 45.42572671347988 + - type: ndcg_at_10 + value: 48.503281334563006 + - type: ndcg_at_100 
+ value: 53.90759554634032 + - type: ndcg_at_1000 + value: 55.6750143459022 + - type: map_at_1 + value: 32.05179459843639 + - type: map_at_3 + value: 39.174397111663886 + - type: map_at_5 + value: 41.09602758395897 + - type: map_at_10 + value: 42.57548284992813 + - type: map_at_100 + value: 43.88590856115191 + - type: map_at_1000 + value: 43.97573928697477 + - type: precision_at_1 + value: 37.77985074626866 + - type: precision_at_3 + value: 19.40298507462699 + - type: precision_at_5 + value: 13.768656716417915 + - type: precision_at_10 + value: 8.330223880596947 + - type: precision_at_100 + value: 1.2266791044775944 + - type: precision_at_1000 + value: 0.14860074626865238 + - type: recall_at_1 + value: 32.05179459843639 + - type: recall_at_3 + value: 46.19290082326463 + - type: recall_at_5 + value: 53.065248391740916 + - type: recall_at_10 + value: 61.95742612487016 + - type: recall_at_100 + value: 84.95720140659506 + - type: recall_at_1000 + value: 96.7945875641771 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackTexRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 24.36338609772884 + - type: ndcg_at_3 + value: 29.344263505458546 + - type: ndcg_at_5 + value: 31.648927411353355 + - type: ndcg_at_10 + value: 34.37718834167528 + - type: ndcg_at_100 + value: 39.988489670143565 + - type: ndcg_at_1000 + value: 42.59253219178224 + - type: map_at_1 + value: 20.102111701568827 + - type: map_at_3 + value: 26.034827870203504 + - type: map_at_5 + value: 27.635335063884625 + - type: map_at_10 + value: 28.955304300478456 + - type: map_at_100 + value: 30.17348927054766 + - type: map_at_1000 + value: 30.29821812881463 + - type: precision_at_1 + value: 24.36338609772884 + - type: precision_at_3 + value: 13.971094287680497 + - type: precision_at_5 + value: 10.178940123881386 + - type: precision_at_10 + value: 6.362697866482958 + - type: precision_at_100 + value: 1.0784583620096873 + - type: 
precision_at_1000 + value: 0.14810736407432443 + - type: recall_at_1 + value: 20.102111701568827 + - type: recall_at_3 + value: 32.51720798237882 + - type: recall_at_5 + value: 38.47052010632308 + - type: recall_at_10 + value: 46.560251311326375 + - type: recall_at_100 + value: 71.37281646052087 + - type: recall_at_1000 + value: 89.54176274473149 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackAndroidRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 44.34907010014306 + - type: ndcg_at_3 + value: 50.3866971503038 + - type: ndcg_at_5 + value: 53.15366139760711 + - type: ndcg_at_10 + value: 56.56459368482132 + - type: ndcg_at_100 + value: 61.49499162448754 + - type: ndcg_at_1000 + value: 62.750952246569824 + - type: map_at_1 + value: 35.87684730898816 + - type: map_at_3 + value: 44.81019864282626 + - type: map_at_5 + value: 47.24254516428158 + - type: map_at_10 + value: 49.28704567095768 + - type: map_at_100 + value: 50.85906250580416 + - type: map_at_1000 + value: 50.96818352379094 + - type: precision_at_1 + value: 44.34907010014306 + - type: precision_at_3 + value: 24.463519313304776 + - type: precision_at_5 + value: 17.68240343347653 + - type: precision_at_10 + value: 11.173104434906978 + - type: precision_at_100 + value: 1.7095851216022702 + - type: precision_at_1000 + value: 0.21087267525035264 + - type: recall_at_1 + value: 35.87684730898816 + - type: recall_at_3 + value: 52.8360317975774 + - type: recall_at_5 + value: 60.826717819116716 + - type: recall_at_10 + value: 70.64783984145798 + - type: recall_at_100 + value: 90.90247835876467 + - type: recall_at_1000 + value: 98.27352916110131 + - task: + type: Retrieval + dataset: + type: BeIR/cqadupstack + name: MTEB CQADupstackRetrieval + config: default + split: test + revision: None + metrics: + - type: ndcg_at_1 + value: 36.038578730542476 + - type: ndcg_at_3 + value: 41.931365356453036 + - type: ndcg_at_5 + value: 
44.479015523894994 + - type: ndcg_at_10 + value: 47.308084499970704 + - type: ndcg_at_100 + value: 52.498062430513606 + - type: ndcg_at_1000 + value: 54.2908789514719 + - type: map_at_1 + value: 30.38821701528966 + - type: map_at_3 + value: 37.974871761903636 + - type: map_at_5 + value: 39.85399878507757 + - type: map_at_10 + value: 41.31456611036795 + - type: map_at_100 + value: 42.62907836655835 + - type: map_at_1000 + value: 42.737235870659845 + - type: precision_at_1 + value: 36.038578730542476 + - type: precision_at_3 + value: 19.39960180094633 + - type: precision_at_5 + value: 13.79264655952497 + - type: precision_at_10 + value: 8.399223517333388 + - type: precision_at_100 + value: 1.2992373779520896 + - type: precision_at_1000 + value: 0.16327170951909567 + - type: recall_at_1 + value: 30.38821701528966 + - type: recall_at_3 + value: 45.51645512564165 + - type: recall_at_5 + value: 52.06077167834868 + - type: recall_at_10 + value: 60.38864106788279 + - type: recall_at_100 + value: 82.76968509918343 + - type: recall_at_1000 + value: 94.84170217080344 +--- + + +

FlagEmbedding

+ + +

+

+ Model List | + FAQ | + Usage | + Evaluation | + Train | + Contact | + Citation | + License +

+

+ +For more details please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding). + +If you are looking for a model with rich semantic expression capabilities, consider **BGE-EN-Mistral**. It combines in-context learning with the strengths of large language models and dense retrieval, achieving outstanding results. + +**BGE-EN-Mistral** primarily demonstrates the following capabilities: +- In-context learning ability: providing few-shot examples with the query can significantly enhance the model's ability to handle new tasks. +- Outstanding performance: the model has achieved state-of-the-art (SOTA) performance on both BEIR and AIR-Bench. + +We will release a technical report about **BGE-EN-Mistral** soon with more details. + +[English](README.md) | [中文](https://github.com/FlagOpen/FlagEmbedding/blob/master/README_zh.md) + +FlagEmbedding focuses on retrieval-augmented LLMs, currently consisting of the following projects: + +- **LLM-based Dense Retrieval**: BGE-EN-Mistral, BGE-Multilingual-Gemma2 +- **Long-Context LLM**: [Activation Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon) +- **Fine-tuning of LM**: [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail) +- **Dense Retrieval**: [BGE-M3](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3), [LLM Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), [BGE Embedding](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding) +- **Reranker Model**: [BGE Reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker) +- **Benchmark**: [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) + +## News +- 7/26/2024: Release **BGE-EN-Mistral**, a Mistral-7B based dense retriever. By integrating in-context learning abilities into the embedding model, it achieves new state-of-the-art
results on both MTEB and the AIR-Benchmark. +- 1/30/2024: Release **BGE-M3**, a new member of the BGE model series! M3 stands for **M**ulti-linguality (100+ languages), **M**ulti-granularity (input length up to 8192), and **M**ulti-functionality (unification of dense, lexical, and multi-vector/ColBERT retrieval). +It is the first embedding model that supports all three retrieval methods, achieving new SOTA on multi-lingual (MIRACL) and cross-lingual (MKQA) benchmarks. +[Technical Report](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/BGE_M3/BGE_M3.pdf) and [Code](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3). :fire: +- 1/9/2024: Release [Activation-Beacon](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon), an effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs. [Technical Report](https://arxiv.org/abs/2401.03462) :fire: +- 12/24/2023: Release **LLaRA**, a LLaMA-7B based dense retriever achieving state-of-the-art performance on MS MARCO and BEIR. The model and code will be open-sourced. Please stay tuned. [Technical Report](https://arxiv.org/abs/2312.15503) :fire: +- 11/23/2023: Release [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail), a method to maintain general capabilities during fine-tuning by merging multiple language models. [Technical Report](https://arxiv.org/abs/2311.13534) :fire: +- 10/12/2023: Release [LLM-Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), a unified embedding model to support diverse retrieval augmentation needs for LLMs.
[Technical Report](https://arxiv.org/pdf/2310.07554.pdf) +- 09/15/2023: The [technical report](https://arxiv.org/pdf/2309.07597.pdf) and [massive training data](https://data.baai.ac.cn/details/BAAI-MTP) of BGE have been released. +- 09/12/2023: New models: + - **New reranker models**: release cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than the embedding models. We recommend using or fine-tuning them to re-rank the top-k documents returned by embedding models. + - **Updated embedding models**: release `bge-*-v1.5` embedding models to alleviate the issue of the similarity distribution and enhance their retrieval ability without instruction. + + +
+ More + + +- 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): add a script to mine hard negatives and support adding instructions during fine-tuning. +- 08/09/2023: BGE models are integrated into **LangChain**; you can use them like [this](#using-langchain). The C-MTEB **leaderboard** is [available](https://huggingface.co/spaces/mteb/leaderboard). +- 08/05/2023: Release base-scale and small-scale models, **the best performance among models of the same size 🤗** +- 08/02/2023: Release `bge-large-*` (short for BAAI General Embedding) models, **rank 1st on the MTEB and C-MTEB benchmarks!** :tada: :tada: +- 08/01/2023: We release the [Chinese Massive Text Embedding Benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (**C-MTEB**), consisting of 31 test datasets. + + +
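As an illustration of the in-context learning workflow described above, the sketch below assembles a query together with a task instruction and few-shot (query, response) examples before it is embedded. The `<instruct>`/`<query>`/`<response>` template and the `build_icl_query` helper are illustrative assumptions made here for clarity, not the released prompt format; consult the Usage section for the official interface.

```python
# Hypothetical sketch of in-context-learning query construction for an
# instruction-following embedding model. The tag-based template below is
# an assumption for illustration, not the model's released prompt format.

def build_icl_query(task, examples, query):
    """Prepend a task instruction and few-shot (query, response) pairs
    to the query text before it is passed to the embedding model."""
    parts = [
        f"<instruct>{task}\n<query>{ex_q}\n<response>{ex_r}"
        for ex_q, ex_r in examples
    ]
    parts.append(f"<instruct>{task}\n<query>{query}")
    return "\n".join(parts)

task = "Given a web search query, retrieve relevant passages that answer the query."
examples = [
    ("what is a virtual interface",
     "A virtual interface is a software-defined abstraction that mimics "
     "the behavior of a physical network interface."),
]
prompt = build_icl_query(task, examples, "how to become a radiology technician")

# The assembled prompt (few-shot examples plus the new query) is what would
# be encoded, e.g. model.encode(prompt), in place of the raw query string.
print(prompt)
```

Documents, by contrast, are typically embedded without any instruction or examples, so only the query side pays the extra token cost of the few-shot context.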
+ + +## Model List + +`bge` is short for `BAAI general embedding`. + +| Model | Language | | Description | query instruction for retrieval [1] | +|:--------------------------------------------------------------------------|:-------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:| +| [BAAI/bge-en-mistral](https://huggingface.co/BAAI/bge-en-mistral) | English | - | An LLM-based dense retriever with in-context learning capabilities that can fully leverage the model's potential based on few-shot examples (4096 tokens) | Provide instructions and few-shot examples freely based on the given task. | +| [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | Multilingual | [Inference](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3#usage) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3) | Multi-Functionality (dense retrieval, sparse retrieval, multi-vector/ColBERT), Multi-Linguality, and Multi-Granularity (8192 tokens) | | +| [BAAI/llm-embedder](https://huggingface.co/BAAI/llm-embedder) | English | [Inference](./FlagEmbedding/llm_embedder/README.md) [Fine-tune](./FlagEmbedding/llm_embedder/README.md) | a unified embedding model to support diverse retrieval augmentation needs for LLMs | See [README](./FlagEmbedding/llm_embedder/README.md) | +| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | Chinese and English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | | +| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | Chinese and
English | [Inference](#usage-for-reranker) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/reranker) | a cross-encoder model which is more accurate but less efficient [2] | | +| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` | +| [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` | +| [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `Represent this sentence for searching relevant passages: ` | +| [BAAI/bge-large-zh-v1.5](https://huggingface.co/BAAI/bge-large-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` | +| [BAAI/bge-base-zh-v1.5](https://huggingface.co/BAAI/bge-base-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` | +| [BAAI/bge-small-zh-v1.5](https://huggingface.co/BAAI/bge-small-zh-v1.5) | Chinese | [Inference](#usage-for-embedding-model) 
[Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | version 1.5 with more reasonable similarity distribution | `为这个句子生成表示以用于检索相关文章:` | +| [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard | `Represent this sentence for searching relevant passages: ` | +| [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-en` | `Represent this sentence for searching relevant passages: ` | +| [BAAI/bge-small-en](https://huggingface.co/BAAI/bge-small-en) | English | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `Represent this sentence for searching relevant passages: ` | +| [BAAI/bge-large-zh](https://huggingface.co/BAAI/bge-large-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | :trophy: rank **1st** in [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB) benchmark | `为这个句子生成表示以用于检索相关文章:` | +| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | Chinese | [Inference](#usage-for-embedding-model) [Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a base-scale model but with similar ability to `bge-large-zh` | `为这个句子生成表示以用于检索相关文章:` | +| [BAAI/bge-small-zh](https://huggingface.co/BAAI/bge-small-zh) | Chinese | [Inference](#usage-for-embedding-model) 
[Fine-tune](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) | a small-scale model but with competitive performance | `为这个句子生成表示以用于检索相关文章:` |
+
+[1\]: If you need to search for passages relevant to a query, we suggest adding the instruction to the query; in other cases, no instruction is needed: just use the original query directly. In all cases, **no instruction** needs to be added to passages.
+
+[2\]: Unlike an embedding model, a reranker takes the question and document together as input and directly outputs a similarity score instead of an embedding. To balance accuracy and time cost, a cross-encoder is widely used to re-rank the top-k documents retrieved by simpler models.
+For example, use the bge embedding model to retrieve the top 100 relevant documents, and then use the bge reranker to re-rank those 100 documents to get the final top-3 results.
+
+All models have been uploaded to the Huggingface Hub; you can find them at https://huggingface.co/BAAI.
+If you cannot access the Huggingface Hub, you can also download the models at https://model.baai.ac.cn/models .
+
+
+## Usage
+
+### Usage for Embedding Model
+
+Here are some examples of using the `bge-en-mistral` model with [FlagEmbedding](#using-flagembedding) or [HuggingFace Transformers](#using-huggingface-transformers).
+
+#### Using FlagEmbedding
+```
+pip install -U FlagEmbedding
+```
+If this doesn't work for you, see [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md) for other ways to install FlagEmbedding.
+
+```python
+from FlagEmbedding import FlagICLModel
+queries = ["how much protein should a female eat", "summit define"]
+documents = [
+    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. 
Check out the chart below to see how much protein you should be eating each day.", + "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments." +] +examples = [ + {'instruct': 'Given a web search query, retrieve relevant passages that answer the query.', + 'query': 'what is a virtual interface', + 'response': "A virtual interface is a software-defined abstraction that mimics the behavior and characteristics of a physical network interface. It allows multiple logical network connections to share the same physical network interface, enabling efficient utilization of network resources. Virtual interfaces are commonly used in virtualization technologies such as virtual machines and containers to provide network connectivity without requiring dedicated hardware. They facilitate flexible network configurations and help in isolating network traffic for security and management purposes."}, + {'instruct': 'Given a web search query, retrieve relevant passages that answer the query.', + 'query': 'causes of back pain in female for a week', + 'response': "Back pain in females lasting a week can stem from various factors. Common causes include muscle strain due to lifting heavy objects or improper posture, spinal issues like herniated discs or osteoporosis, menstrual cramps causing referred pain, urinary tract infections, or pelvic inflammatory disease. Pregnancy-related changes can also contribute. Stress and lack of physical activity may exacerbate symptoms. 
Proper diagnosis by a healthcare professional is crucial for effective treatment and management."}
+]
+model = FlagICLModel('BAAI/bge-en-mistral',
+                     query_instruction_for_retrieval="Given a web search query, retrieve relevant passages that answer the query.",
+                     examples_for_task=examples,
+                     use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
+embeddings_1 = model.encode_queries(queries)
+embeddings_2 = model.encode_corpus(documents)
+similarity = embeddings_1 @ embeddings_2.T
+print(similarity)
+```
+For values of the argument `query_instruction_for_retrieval`, you can refer to [e5-mistral-7b](https://huggingface.co/intfloat/e5-mistral-7b-instruct); note that we append a `.` at the end of each instruction.
+
+By default, FlagICLModel uses all available GPUs when encoding. Set `os.environ["CUDA_VISIBLE_DEVICES"]` to select specific GPUs, or set `os.environ["CUDA_VISIBLE_DEVICES"]=""` to make all GPUs unavailable.
+
+
+#### Using HuggingFace Transformers
+
+With the transformers package, you can use the model like this: first pass your input through the transformer model, then take the last hidden state of the final non-padding token (last-token pooling) as the sentence embedding.
+ +```python +import torch +import torch.nn.functional as F + +from torch import Tensor +from transformers import AutoTokenizer, AutoModel + + +def last_token_pool(last_hidden_states: Tensor, + attention_mask: Tensor) -> Tensor: + left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0]) + if left_padding: + return last_hidden_states[:, -1] + else: + sequence_lengths = attention_mask.sum(dim=1) - 1 + batch_size = last_hidden_states.shape[0] + return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths] + + +def get_detailed_instruct(task_description: str, query: str) -> str: + return f'{task_description}\n{query}\n' + +def get_detailed_example(task_description: str, query: str, response: str) -> str: + return f'{task_description}\n{query}\n{response}' + +task = 'Given a web search query, retrieve relevant passages that answer the query.' +examples = [ + {'instruct': 'Given a web search query, retrieve relevant passages that answer the query.', + 'query': 'what is a virtual interface', + 'response': "A virtual interface is a software-defined abstraction that mimics the behavior and characteristics of a physical network interface. It allows multiple logical network connections to share the same physical network interface, enabling efficient utilization of network resources. Virtual interfaces are commonly used in virtualization technologies such as virtual machines and containers to provide network connectivity without requiring dedicated hardware. They facilitate flexible network configurations and help in isolating network traffic for security and management purposes."}, + {'instruct': 'Given a web search query, retrieve relevant passages that answer the query.', + 'query': 'causes of back pain in female for a week', + 'response': "Back pain in females lasting a week can stem from various factors. 
Common causes include muscle strain due to lifting heavy objects or improper posture, spinal issues like herniated discs or osteoporosis, menstrual cramps causing referred pain, urinary tract infections, or pelvic inflammatory disease. Pregnancy-related changes can also contribute. Stress and lack of physical activity may exacerbate symptoms. Proper diagnosis by a healthcare professional is crucial for effective treatment and management."} +] +examples = [get_detailed_example(e['instruct'], e['query'], e['response']) for e in examples] +examples_prefix = '\n\n'.join(examples) + '\n\n' +queries = [ + examples_prefix + get_detailed_instruct(task, 'how much protein should a female eat'), + examples_prefix + get_detailed_instruct(task, 'summit define') +] +# No need to add instruction for retrieval documents +documents = [ + "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.", + "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments." 
+]
+input_texts = queries + documents
+
+tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-en-mistral')
+model = AutoModel.from_pretrained('BAAI/bge-en-mistral')
+model.eval()
+
+max_length = 4096
+# Tokenize the input texts
+batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
+
+with torch.no_grad():
+    outputs = model(**batch_dict)
+    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
+
+# normalize embeddings
+embeddings = F.normalize(embeddings, p=2, dim=1)
+scores = (embeddings[:2] @ embeddings[2:].T) * 100
+print(scores.tolist())
+```
+
+
+### Usage for Reranker
+
+Unlike an embedding model, a reranker takes the question and document together as input and directly outputs a similarity score instead of an embedding.
+You can get a relevance score by feeding a query and passage to the reranker.
+The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
+
+
+#### Using FlagEmbedding
+```
+pip install -U FlagEmbedding
+```
+
+Get relevance scores (higher scores indicate more relevance):
+```python
+from FlagEmbedding import FlagReranker
+reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
+
+score = reranker.compute_score(['query', 'passage'])
+print(score)
+
+scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
+print(scores)
+```
+
+
+#### Using HuggingFace Transformers
+
+```python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
+model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-large')
+model.eval()
+
+pairs = [['what is panda?', 'hi'], ['what is 
panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
+with torch.no_grad():
+    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
+    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
+    print(scores)
+```
+
+## Evaluation
+
+`bge-en-mistral` achieves **state-of-the-art performance on both the MTEB and AIR-Bench leaderboards!**
+For more details and evaluation tools, see our [scripts](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB/README.md).
+
+- **MTEB**:
+
+| MTEB | STS (10) | Summarization (1) | Pair Classification (3) | Classification (12) | Reranking (4) | Clustering (11) | Retrieval (15) | ALL (56) |
+|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| **e5-mistral-7b-instruct** | 84.62 | 31.40 | 88.37 | 78.48 | 60.20 | 50.26 | 56.89 | 66.60 |
+| **SFR-Embedding-Mistral** | **85.05** | 31.16 | **88.54** | 78.33 | 60.64 | 51.67 | 59.03 | 67.56 |
+| **NV-Embed-v1** | 82.84 | 31.20 | 86.91 | 87.35 | 60.54 | 52.80 | 59.36 | 69.32 |
+| **Linq-Embed-Mistral** | 84.97 | 31.00 | 88.35 | 80.16 | 60.29 | 51.42 | 60.19 | 68.17 |
+| **SFR-Embedding-2_R** | 81.26 | 30.71 | 88.07 | 89.05 | 60.14 | 56.17 | 60.18 | 70.31 |
+| **gte-Qwen2-7B-instruct** | 83.04 | 31.35 | 85.79 | 86.58 | **61.42** | 56.92 | 60.25 | 70.24 |
+| **stella_en_1.5B_v5** | 84.51 | **31.49** | 88.07 | 88.07 | 61.21 | 57.69 | 61.21 | 71.19 |
+| **bge-multilingual-gemma2** | 83.88 | 31.20 | 85.84 | 88.08 | 59.72 | 54.65 | 59.24 | 69.88 |
+| **bge-en-mistral zero-shot** | 83.74 | 30.75 | 87.21 | 88.66 | 59.66 | 57.57 | 61.67 | 71.26 |
+| **bge-en-mistral few-shot** | 84.25 | 30.77 | 88.38 | **88.99** | 59.82 | **57.89** | **62.16** | **71.69** |
+
+- **BEIR**:
+
+| BEIR | e5-mistral-7b-instruct | SFR-Embedding-Mistral | NV-Embed-v1 | Linq-Embed-Mistral | SFR-Embedding-2_R | gte-Qwen2-7B-instruct | stella_en_1.5B_v5 | bge-multilingual-gemma2 | 
bge-en-mistral zero-shot | bge-en-mistral few-shot | +| :----------------: | :--------------------: | :-------------------: | :---------: | :----------------: | :---------------: | :-------------------: | :----------------: | :---------------------: | :----------------------: | :---------------------: | +| **ArguAna** | 61.9 | 67.27 | 68.21 | 69.65 | 62.34 | 64.27 | 65.27 | 77.37 | 82.76 | **83.08** | +| **ClimateFEVER** | 38.4 | 36.41 | 34.72 | 39.11 | 34.43 | **45.88** | 46.11 | 39.37 | 45.35 | 45.43 | +| **CQA** | 43 | 46.54 | **50.51** | 47.27 | 46.11 | 46.43 | 47.75 | 47.94 | 47.23 | 47.31 | +| **DBPedia** | 48.9 | 49.06 | 48.29 | 51.32 | 51.21 | **52.42** | 52.28 | 51.37 | 50.42 | 51.63 | +| **FEVER** | 87.8 | 89.35 | 87.77 | 92.42 | 92.16 | **95.11** | 94.83 | 90.38 | 91.96 | 92.83 | +| **FiQA2018** | 56.6 | 60.55 | **63.1** | 61.2 | 61.77 | 62.03 | 60.48 | 60.04 | 58.77 | 59.67 | +| **HotpotQA** | 75.7 | 77.02 | 79.92 | 76.24 | 81.36 | 73.08 | 76.67 | 83.26 | 84.98 | **85.14** | +| **MSMARCO** | 43.1 | 43.41 | 46.49 | 45.21 | 42.18 | 45.98 | 45.22 | 45.71 | 46.72 | **46.79** | +| **NFCorpus** | 38.6 | 42.02 | 38.04 | 41.62 | 41.34 | 40.6 | **42** | 38.11 | 40.69 | 41.85 | +| **NQ** | 63.5 | 69.92 | 71.22 | 70.63 | 73.96 | 67 | 71.8 | 71.45 | 73.85 | **73.88** | +| **QuoraRetrieval** | 89.6 | 89.81 | 89.21 | 90.27 | 89.58 | 90.09 | 90.03 | 90.04 | 91.02 | **90.95** | +| **SCIDOCS** | 16.3 | 19.91 | 20.19 | 21.93 | 24.87 | **28.91** | 26.64 | 26.93 | 25.25 | 25.26 | +| **SciFact** | 76.4 | 78.06 | 78.43 | 78.32 | **85.91** | 79.06 | 80.09 | 72.05 | 78.33 | 79.09 | +| **Touche2020** | 26.4 | 29 | 28.38 | **30.61** | 28.18 | 30.57 | 29.94 | 30.26 | 29.67 | 30.48 | +| **TRECCOVID** | 87.2 | 87.1 | 85.88 | 87.1 | **87.28** | 82.26 | 85.98 | 64.27 | 78.11 | 79.08 | +| **Mean** | 56.89 | 59.03 | 59.36 | 60.19 | 60.18 | 60.25 | 61.21 | 59.24 | 61.67 | **62.16** | + +- **Air-Bench**: + +**QA (en, nDCG@10):** + +| AIR-Bench_24.04 | wiki | web | news | healthcare | law 
| finance | arxiv | msmarco | ALL (8) | +| :--------------------------: | :-------: | :-------: | :-------: | :--------: | :-------: | :-------: | :-------: | :-------: | :-------: | +| **e5-mistral-7b-instruct** | 61.67 | 44.41 | 48.18 | 56.32 | 19.32 | 54.79 | 44.78 | 59.03 | 48.56 | +| **SFR-Embedding-Mistral** | 63.46 | 51.27 | 52.21 | 58.76 | 23.27 | 56.94 | 47.75 | 58.99 | 51.58 | +| **NV-Embed-v1** | 62.84 | 50.42 | 51.46 | 58.53 | 20.65 | 49.89 | 46.10 | 60.27 | 50.02 | +| **Linq-Embed-Mistral** | 61.04 | 48.41 | 49.44 | **60.18** | 20.34 | 50.04 | 47.56 | 60.50 | 49.69 | +| **gte-Qwen2-7B-instruct** | 63.46 | 51.20 | 54.07 | 54.20 | 22.31 | **58.20** | 40.27 | 58.39 | 50.26 | +| **stella_en_1.5B_v5** | 61.99 | 50.88 | 53.87 | 58.81 | 23.22 | 57.26 | 44.81 | 61.38 | 51.53 | +| **bge-en-mistral zero-shot** | 64.61 | 54.40 | 55.11 | 57.25 | 25.10 | 54.81 | 48.46 | 63.71 | 52.93 | +| **bge-en-mistral few-shot** | **64.94** | **55.11** | **56.02** | 58.85 | **28.29** | 57.16 | **50.04** | **64.50** | **54.36** | + +**Long-Doc (en, Recall@10):** + +| AIR-Bench_24.04 | arxiv (4) | book (2) | healthcare (5) | law (4) | ALL (15) | +| :--------------------------: | :-------: | :-------: | :------------: | :-------: | :-------: | +| **text-embedding-3-large** | 74.53 | 73.16 | 65.83 | 64.47 | 68.77 | +| **e5-mistral-7b-instruct** | 72.14 | 72.44 | 68.44 | 62.92 | 68.49 | +| **SFR-Embedding-Mistral** | 72.79 | 72.41 | 67.94 | 64.83 | 69.00 | +| **NV-Embed-v1** | 77.65 | 75.49 | 72.38 | **69.55** | 73.45 | +| **Linq-Embed-Mistral** | 75.46 | 73.81 | 71.58 | 68.58 | 72.11 | +| **gte-Qwen2-7B-instruct** | 63.93 | 68.51 | 65.59 | 65.26 | 65.45 | +| **stella_en_1.5B_v5** | 73.17 | 74.38 | 70.02 | 69.32 | 71.25 | +| **bge-en-mistral zero-shot** | 78.30 | 78.21 | 73.65 | 67.09 | 73.75 | +| **bge-en-mistral few-shot** | **79.63** | **79.36** | **74.80** | 67.79 | **74.83** | + + + +## Contact + +If you have any question or suggestion related to this project, feel free to open 
an issue or pull request.
+You can also email Shitao Xiao (stxiao@baai.ac.cn) and Zheng Liu (liuzheng@baai.ac.cn).
+
+
+## Citation
+
+If you find this repository useful, please consider giving it a star :star: and a citation:
+
+```
+@misc{bge_embedding,
+  title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
+  author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
+  year={2023},
+  eprint={2309.07597},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```
+
+## License
+FlagEmbedding is licensed under the [MIT License](https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE). The released models can be used for commercial purposes free of charge.