library_name: peft
license: mit
language:
- en
pipeline_tag: sentence-similarity
tags:
- text-embedding
- embeddings
- information-retrieval
- beir
- text-classification
- language-model
- text-clustering
- text-semantic-similarity
- text-evaluation
- text-reranking
- feature-extraction
- sentence-similarity
- Sentence Similarity
- natural_questions
- ms_marco
- fever
- hotpot_qa
- mteb
model-index: | |
- name: LLM2Vec-Meta-Llama-3-unsupervised | |
results: | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_counterfactual | |
name: MTEB AmazonCounterfactualClassification (en) | |
config: en | |
split: test | |
revision: e8379541af4e31359cca9fbcf4b00f2671dba205 | |
metrics: | |
- type: accuracy | |
value: 75.70149253731343 | |
- type: ap | |
value: 40.824269118508354 | |
- type: f1 | |
value: 70.55918234479084 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_polarity | |
name: MTEB AmazonPolarityClassification | |
config: default | |
split: test | |
revision: e2d317d38cd51312af73b3d32a06d1a08b442046 | |
metrics: | |
- type: accuracy | |
value: 80.6812 | |
- type: ap | |
value: 76.63327889516552 | |
- type: f1 | |
value: 80.5276613226382 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_reviews_multi | |
name: MTEB AmazonReviewsClassification (en) | |
config: en | |
split: test | |
revision: 1399c76144fd37290681b995c656ef9b2e06e26d | |
metrics: | |
- type: accuracy | |
value: 40.002 | |
- type: f1 | |
value: 39.67277678335084 | |
- task: | |
type: Retrieval | |
dataset: | |
type: arguana | |
name: MTEB ArguAna | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 26.173999999999996 | |
- type: map_at_10 | |
value: 42.548 | |
- type: map_at_100 | |
value: 43.492999999999995 | |
- type: map_at_1000 | |
value: 43.5 | |
- type: map_at_3 | |
value: 37.376 | |
- type: map_at_5 | |
value: 40.359 | |
- type: mrr_at_1 | |
value: 27.24 | |
- type: mrr_at_10 | |
value: 42.945 | |
- type: mrr_at_100 | |
value: 43.89 | |
- type: mrr_at_1000 | |
value: 43.897000000000006 | |
- type: mrr_at_3 | |
value: 37.779 | |
- type: mrr_at_5 | |
value: 40.755 | |
- type: ndcg_at_1 | |
value: 26.173999999999996 | |
- type: ndcg_at_10 | |
value: 51.731 | |
- type: ndcg_at_100 | |
value: 55.684999999999995 | |
- type: ndcg_at_1000 | |
value: 55.86 | |
- type: ndcg_at_3 | |
value: 41.122 | |
- type: ndcg_at_5 | |
value: 46.491 | |
- type: precision_at_1 | |
value: 26.173999999999996 | |
- type: precision_at_10 | |
value: 8.108 | |
- type: precision_at_100 | |
value: 0.9820000000000001 | |
- type: precision_at_1000 | |
value: 0.1 | |
- type: precision_at_3 | |
value: 17.330000000000002 | |
- type: precision_at_5 | |
value: 13.001 | |
- type: recall_at_1 | |
value: 26.173999999999996 | |
- type: recall_at_10 | |
value: 81.081 | |
- type: recall_at_100 | |
value: 98.222 | |
- type: recall_at_1000 | |
value: 99.57300000000001 | |
- type: recall_at_3 | |
value: 51.991 | |
- type: recall_at_5 | |
value: 65.007 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-p2p | |
name: MTEB ArxivClusteringP2P | |
config: default | |
split: test | |
revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d | |
metrics: | |
- type: v_measure | |
value: 49.215974795578546 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-s2s | |
name: MTEB ArxivClusteringS2S | |
config: default | |
split: test | |
revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 | |
metrics: | |
- type: v_measure | |
value: 41.71067780141813 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/askubuntudupquestions-reranking | |
name: MTEB AskUbuntuDupQuestions | |
config: default | |
split: test | |
revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 | |
metrics: | |
- type: map | |
value: 57.15639347603191 | |
- type: mrr | |
value: 71.4509959108297 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/biosses-sts | |
name: MTEB BIOSSES | |
config: default | |
split: test | |
revision: d3fb88f8f02e40887cd149695127462bbcf29b4a | |
metrics: | |
- type: cos_sim_spearman | |
value: 84.67361609277127 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/banking77 | |
name: MTEB Banking77Classification | |
config: default | |
split: test | |
revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 | |
metrics: | |
- type: accuracy | |
value: 84.76623376623375 | |
- type: f1 | |
value: 84.70041172334481 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-p2p | |
name: MTEB BiorxivClusteringP2P | |
config: default | |
split: test | |
revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 | |
metrics: | |
- type: v_measure | |
value: 38.39251163108548 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-s2s | |
name: MTEB BiorxivClusteringS2S | |
config: default | |
split: test | |
revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 | |
metrics: | |
- type: v_measure | |
value: 31.30501371807517 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/android | |
name: MTEB CQADupstackAndroidRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 26.409 | |
- type: map_at_10 | |
value: 36.925000000000004 | |
- type: map_at_100 | |
value: 38.651 | |
- type: map_at_1000 | |
value: 38.798 | |
- type: map_at_3 | |
value: 33.437 | |
- type: map_at_5 | |
value: 35.506 | |
- type: mrr_at_1 | |
value: 33.763 | |
- type: mrr_at_10 | |
value: 43.442 | |
- type: mrr_at_100 | |
value: 44.339 | |
- type: mrr_at_1000 | |
value: 44.391000000000005 | |
- type: mrr_at_3 | |
value: 40.749 | |
- type: mrr_at_5 | |
value: 42.408 | |
- type: ndcg_at_1 | |
value: 33.763 | |
- type: ndcg_at_10 | |
value: 43.486999999999995 | |
- type: ndcg_at_100 | |
value: 49.71 | |
- type: ndcg_at_1000 | |
value: 51.81 | |
- type: ndcg_at_3 | |
value: 38.586 | |
- type: ndcg_at_5 | |
value: 41.074 | |
- type: precision_at_1 | |
value: 33.763 | |
- type: precision_at_10 | |
value: 8.798 | |
- type: precision_at_100 | |
value: 1.544 | |
- type: precision_at_1000 | |
value: 0.21 | |
- type: precision_at_3 | |
value: 19.361 | |
- type: precision_at_5 | |
value: 14.335 | |
- type: recall_at_1 | |
value: 26.409 | |
- type: recall_at_10 | |
value: 55.352999999999994 | |
- type: recall_at_100 | |
value: 81.66799999999999 | |
- type: recall_at_1000 | |
value: 95.376 | |
- type: recall_at_3 | |
value: 40.304 | |
- type: recall_at_5 | |
value: 47.782000000000004 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/english | |
name: MTEB CQADupstackEnglishRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 26.6 | |
- type: map_at_10 | |
value: 36.42 | |
- type: map_at_100 | |
value: 37.628 | |
- type: map_at_1000 | |
value: 37.767 | |
- type: map_at_3 | |
value: 33.553 | |
- type: map_at_5 | |
value: 35.118 | |
- type: mrr_at_1 | |
value: 34.394999999999996 | |
- type: mrr_at_10 | |
value: 42.586 | |
- type: mrr_at_100 | |
value: 43.251 | |
- type: mrr_at_1000 | |
value: 43.303000000000004 | |
- type: mrr_at_3 | |
value: 40.297 | |
- type: mrr_at_5 | |
value: 41.638 | |
- type: ndcg_at_1 | |
value: 34.394999999999996 | |
- type: ndcg_at_10 | |
value: 42.05 | |
- type: ndcg_at_100 | |
value: 46.371 | |
- type: ndcg_at_1000 | |
value: 48.76 | |
- type: ndcg_at_3 | |
value: 37.936 | |
- type: ndcg_at_5 | |
value: 39.827 | |
- type: precision_at_1 | |
value: 34.394999999999996 | |
- type: precision_at_10 | |
value: 8.268 | |
- type: precision_at_100 | |
value: 1.355 | |
- type: precision_at_1000 | |
value: 0.186 | |
- type: precision_at_3 | |
value: 18.726000000000003 | |
- type: precision_at_5 | |
value: 13.541 | |
- type: recall_at_1 | |
value: 26.6 | |
- type: recall_at_10 | |
value: 51.529 | |
- type: recall_at_100 | |
value: 70.038 | |
- type: recall_at_1000 | |
value: 85.67 | |
- type: recall_at_3 | |
value: 39.448 | |
- type: recall_at_5 | |
value: 44.6 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gaming | |
name: MTEB CQADupstackGamingRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 31.863000000000003 | |
- type: map_at_10 | |
value: 43.733 | |
- type: map_at_100 | |
value: 45.005 | |
- type: map_at_1000 | |
value: 45.074 | |
- type: map_at_3 | |
value: 40.593 | |
- type: map_at_5 | |
value: 42.272 | |
- type: mrr_at_1 | |
value: 37.555 | |
- type: mrr_at_10 | |
value: 47.532999999999994 | |
- type: mrr_at_100 | |
value: 48.431999999999995 | |
- type: mrr_at_1000 | |
value: 48.47 | |
- type: mrr_at_3 | |
value: 44.901 | |
- type: mrr_at_5 | |
value: 46.274 | |
- type: ndcg_at_1 | |
value: 37.555 | |
- type: ndcg_at_10 | |
value: 49.789 | |
- type: ndcg_at_100 | |
value: 55.059999999999995 | |
- type: ndcg_at_1000 | |
value: 56.434 | |
- type: ndcg_at_3 | |
value: 44.238 | |
- type: ndcg_at_5 | |
value: 46.698 | |
- type: precision_at_1 | |
value: 37.555 | |
- type: precision_at_10 | |
value: 8.257 | |
- type: precision_at_100 | |
value: 1.189 | |
- type: precision_at_1000 | |
value: 0.136 | |
- type: precision_at_3 | |
value: 20.23 | |
- type: precision_at_5 | |
value: 13.868 | |
- type: recall_at_1 | |
value: 31.863000000000003 | |
- type: recall_at_10 | |
value: 64.188 | |
- type: recall_at_100 | |
value: 87.02600000000001 | |
- type: recall_at_1000 | |
value: 96.761 | |
- type: recall_at_3 | |
value: 48.986000000000004 | |
- type: recall_at_5 | |
value: 55.177 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gis | |
name: MTEB CQADupstackGisRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 15.964 | |
- type: map_at_10 | |
value: 22.746 | |
- type: map_at_100 | |
value: 23.704 | |
- type: map_at_1000 | |
value: 23.82 | |
- type: map_at_3 | |
value: 20.5 | |
- type: map_at_5 | |
value: 21.836 | |
- type: mrr_at_1 | |
value: 17.740000000000002 | |
- type: mrr_at_10 | |
value: 24.634 | |
- type: mrr_at_100 | |
value: 25.535999999999998 | |
- type: mrr_at_1000 | |
value: 25.628 | |
- type: mrr_at_3 | |
value: 22.429 | |
- type: mrr_at_5 | |
value: 23.791 | |
- type: ndcg_at_1 | |
value: 17.740000000000002 | |
- type: ndcg_at_10 | |
value: 26.838 | |
- type: ndcg_at_100 | |
value: 31.985000000000003 | |
- type: ndcg_at_1000 | |
value: 35.289 | |
- type: ndcg_at_3 | |
value: 22.384 | |
- type: ndcg_at_5 | |
value: 24.726 | |
- type: precision_at_1 | |
value: 17.740000000000002 | |
- type: precision_at_10 | |
value: 4.35 | |
- type: precision_at_100 | |
value: 0.753 | |
- type: precision_at_1000 | |
value: 0.108 | |
- type: precision_at_3 | |
value: 9.754999999999999 | |
- type: precision_at_5 | |
value: 7.164 | |
- type: recall_at_1 | |
value: 15.964 | |
- type: recall_at_10 | |
value: 37.705 | |
- type: recall_at_100 | |
value: 61.94499999999999 | |
- type: recall_at_1000 | |
value: 87.646 | |
- type: recall_at_3 | |
value: 25.714 | |
- type: recall_at_5 | |
value: 31.402 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/mathematica | |
name: MTEB CQADupstackMathematicaRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 9.221 | |
- type: map_at_10 | |
value: 14.735000000000001 | |
- type: map_at_100 | |
value: 15.778 | |
- type: map_at_1000 | |
value: 15.9 | |
- type: map_at_3 | |
value: 12.791 | |
- type: map_at_5 | |
value: 13.703999999999999 | |
- type: mrr_at_1 | |
value: 12.438 | |
- type: mrr_at_10 | |
value: 18.353 | |
- type: mrr_at_100 | |
value: 19.285 | |
- type: mrr_at_1000 | |
value: 19.375 | |
- type: mrr_at_3 | |
value: 16.439 | |
- type: mrr_at_5 | |
value: 17.352999999999998 | |
- type: ndcg_at_1 | |
value: 12.438 | |
- type: ndcg_at_10 | |
value: 18.703 | |
- type: ndcg_at_100 | |
value: 24.104999999999997 | |
- type: ndcg_at_1000 | |
value: 27.366 | |
- type: ndcg_at_3 | |
value: 15.055 | |
- type: ndcg_at_5 | |
value: 16.42 | |
- type: precision_at_1 | |
value: 12.438 | |
- type: precision_at_10 | |
value: 3.818 | |
- type: precision_at_100 | |
value: 0.77 | |
- type: precision_at_1000 | |
value: 0.11800000000000001 | |
- type: precision_at_3 | |
value: 7.753 | |
- type: precision_at_5 | |
value: 5.622 | |
- type: recall_at_1 | |
value: 9.221 | |
- type: recall_at_10 | |
value: 27.461999999999996 | |
- type: recall_at_100 | |
value: 51.909000000000006 | |
- type: recall_at_1000 | |
value: 75.56 | |
- type: recall_at_3 | |
value: 17.046 | |
- type: recall_at_5 | |
value: 20.766000000000002 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/physics | |
name: MTEB CQADupstackPhysicsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 22.828 | |
- type: map_at_10 | |
value: 33.166000000000004 | |
- type: map_at_100 | |
value: 34.618 | |
- type: map_at_1000 | |
value: 34.744 | |
- type: map_at_3 | |
value: 29.737000000000002 | |
- type: map_at_5 | |
value: 31.541000000000004 | |
- type: mrr_at_1 | |
value: 29.548000000000002 | |
- type: mrr_at_10 | |
value: 38.582 | |
- type: mrr_at_100 | |
value: 39.527 | |
- type: mrr_at_1000 | |
value: 39.577 | |
- type: mrr_at_3 | |
value: 35.884 | |
- type: mrr_at_5 | |
value: 37.413999999999994 | |
- type: ndcg_at_1 | |
value: 29.548000000000002 | |
- type: ndcg_at_10 | |
value: 39.397 | |
- type: ndcg_at_100 | |
value: 45.584 | |
- type: ndcg_at_1000 | |
value: 47.823 | |
- type: ndcg_at_3 | |
value: 33.717000000000006 | |
- type: ndcg_at_5 | |
value: 36.223 | |
- type: precision_at_1 | |
value: 29.548000000000002 | |
- type: precision_at_10 | |
value: 7.767 | |
- type: precision_at_100 | |
value: 1.2959999999999998 | |
- type: precision_at_1000 | |
value: 0.17099999999999999 | |
- type: precision_at_3 | |
value: 16.747 | |
- type: precision_at_5 | |
value: 12.203999999999999 | |
- type: recall_at_1 | |
value: 22.828 | |
- type: recall_at_10 | |
value: 52.583999999999996 | |
- type: recall_at_100 | |
value: 79.06400000000001 | |
- type: recall_at_1000 | |
value: 93.59100000000001 | |
- type: recall_at_3 | |
value: 36.671 | |
- type: recall_at_5 | |
value: 43.22 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/programmers | |
name: MTEB CQADupstackProgrammersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 21.366 | |
- type: map_at_10 | |
value: 30.214000000000002 | |
- type: map_at_100 | |
value: 31.647 | |
- type: map_at_1000 | |
value: 31.763 | |
- type: map_at_3 | |
value: 27.234 | |
- type: map_at_5 | |
value: 28.801 | |
- type: mrr_at_1 | |
value: 26.256 | |
- type: mrr_at_10 | |
value: 35.299 | |
- type: mrr_at_100 | |
value: 36.284 | |
- type: mrr_at_1000 | |
value: 36.342 | |
- type: mrr_at_3 | |
value: 32.572 | |
- type: mrr_at_5 | |
value: 34.050999999999995 | |
- type: ndcg_at_1 | |
value: 26.256 | |
- type: ndcg_at_10 | |
value: 35.899 | |
- type: ndcg_at_100 | |
value: 41.983 | |
- type: ndcg_at_1000 | |
value: 44.481 | |
- type: ndcg_at_3 | |
value: 30.665 | |
- type: ndcg_at_5 | |
value: 32.879999999999995 | |
- type: precision_at_1 | |
value: 26.256 | |
- type: precision_at_10 | |
value: 6.804 | |
- type: precision_at_100 | |
value: 1.187 | |
- type: precision_at_1000 | |
value: 0.16 | |
- type: precision_at_3 | |
value: 14.84 | |
- type: precision_at_5 | |
value: 10.708 | |
- type: recall_at_1 | |
value: 21.366 | |
- type: recall_at_10 | |
value: 47.878 | |
- type: recall_at_100 | |
value: 73.245 | |
- type: recall_at_1000 | |
value: 90.623 | |
- type: recall_at_3 | |
value: 33.341 | |
- type: recall_at_5 | |
value: 39.198 | |
- task: | |
type: Retrieval | |
dataset: | |
type: mteb/cqadupstack | |
name: MTEB CQADupstackRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 19.477166666666665 | |
- type: map_at_10 | |
value: 27.431416666666664 | |
- type: map_at_100 | |
value: 28.656000000000002 | |
- type: map_at_1000 | |
value: 28.787583333333338 | |
- type: map_at_3 | |
value: 24.85175 | |
- type: map_at_5 | |
value: 26.270166666666668 | |
- type: mrr_at_1 | |
value: 24.06841666666667 | |
- type: mrr_at_10 | |
value: 31.620000000000005 | |
- type: mrr_at_100 | |
value: 32.52283333333333 | |
- type: mrr_at_1000 | |
value: 32.59441666666667 | |
- type: mrr_at_3 | |
value: 29.328666666666663 | |
- type: mrr_at_5 | |
value: 30.620416666666667 | |
- type: ndcg_at_1 | |
value: 24.06841666666667 | |
- type: ndcg_at_10 | |
value: 32.404583333333335 | |
- type: ndcg_at_100 | |
value: 37.779500000000006 | |
- type: ndcg_at_1000 | |
value: 40.511583333333334 | |
- type: ndcg_at_3 | |
value: 27.994166666666665 | |
- type: ndcg_at_5 | |
value: 30.021749999999997 | |
- type: precision_at_1 | |
value: 24.06841666666667 | |
- type: precision_at_10 | |
value: 6.03725 | |
- type: precision_at_100 | |
value: 1.0500833333333337 | |
- type: precision_at_1000 | |
value: 0.14875000000000002 | |
- type: precision_at_3 | |
value: 13.419583333333335 | |
- type: precision_at_5 | |
value: 9.700666666666665 | |
- type: recall_at_1 | |
value: 19.477166666666665 | |
- type: recall_at_10 | |
value: 42.99441666666667 | |
- type: recall_at_100 | |
value: 66.787 | |
- type: recall_at_1000 | |
value: 86.18825000000001 | |
- type: recall_at_3 | |
value: 30.46366666666667 | |
- type: recall_at_5 | |
value: 35.83141666666667 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/stats | |
name: MTEB CQADupstackStatsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 16.246 | |
- type: map_at_10 | |
value: 22.127 | |
- type: map_at_100 | |
value: 23.006 | |
- type: map_at_1000 | |
value: 23.125 | |
- type: map_at_3 | |
value: 20.308999999999997 | |
- type: map_at_5 | |
value: 21.139 | |
- type: mrr_at_1 | |
value: 19.631999999999998 | |
- type: mrr_at_10 | |
value: 24.884999999999998 | |
- type: mrr_at_100 | |
value: 25.704 | |
- type: mrr_at_1000 | |
value: 25.793 | |
- type: mrr_at_3 | |
value: 23.083000000000002 | |
- type: mrr_at_5 | |
value: 23.942 | |
- type: ndcg_at_1 | |
value: 19.631999999999998 | |
- type: ndcg_at_10 | |
value: 25.862000000000002 | |
- type: ndcg_at_100 | |
value: 30.436000000000003 | |
- type: ndcg_at_1000 | |
value: 33.638 | |
- type: ndcg_at_3 | |
value: 22.431 | |
- type: ndcg_at_5 | |
value: 23.677 | |
- type: precision_at_1 | |
value: 19.631999999999998 | |
- type: precision_at_10 | |
value: 4.417 | |
- type: precision_at_100 | |
value: 0.7270000000000001 | |
- type: precision_at_1000 | |
value: 0.109 | |
- type: precision_at_3 | |
value: 10.327 | |
- type: precision_at_5 | |
value: 7.147 | |
- type: recall_at_1 | |
value: 16.246 | |
- type: recall_at_10 | |
value: 34.869 | |
- type: recall_at_100 | |
value: 56.221 | |
- type: recall_at_1000 | |
value: 80.449 | |
- type: recall_at_3 | |
value: 24.83 | |
- type: recall_at_5 | |
value: 28.142 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/tex | |
name: MTEB CQADupstackTexRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 9.798 | |
- type: map_at_10 | |
value: 14.695 | |
- type: map_at_100 | |
value: 15.590000000000002 | |
- type: map_at_1000 | |
value: 15.726999999999999 | |
- type: map_at_3 | |
value: 13.004999999999999 | |
- type: map_at_5 | |
value: 13.861 | |
- type: mrr_at_1 | |
value: 12.939 | |
- type: mrr_at_10 | |
value: 18.218 | |
- type: mrr_at_100 | |
value: 18.998 | |
- type: mrr_at_1000 | |
value: 19.093 | |
- type: mrr_at_3 | |
value: 16.454 | |
- type: mrr_at_5 | |
value: 17.354 | |
- type: ndcg_at_1 | |
value: 12.939 | |
- type: ndcg_at_10 | |
value: 18.278 | |
- type: ndcg_at_100 | |
value: 22.709 | |
- type: ndcg_at_1000 | |
value: 26.064 | |
- type: ndcg_at_3 | |
value: 15.204 | |
- type: ndcg_at_5 | |
value: 16.416 | |
- type: precision_at_1 | |
value: 12.939 | |
- type: precision_at_10 | |
value: 3.768 | |
- type: precision_at_100 | |
value: 0.724 | |
- type: precision_at_1000 | |
value: 0.11800000000000001 | |
- type: precision_at_3 | |
value: 7.707999999999999 | |
- type: precision_at_5 | |
value: 5.733 | |
- type: recall_at_1 | |
value: 9.798 | |
- type: recall_at_10 | |
value: 25.562 | |
- type: recall_at_100 | |
value: 45.678999999999995 | |
- type: recall_at_1000 | |
value: 69.963 | |
- type: recall_at_3 | |
value: 16.705000000000002 | |
- type: recall_at_5 | |
value: 19.969 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/unix | |
name: MTEB CQADupstackUnixRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 19.1 | |
- type: map_at_10 | |
value: 27.034999999999997 | |
- type: map_at_100 | |
value: 28.396 | |
- type: map_at_1000 | |
value: 28.518 | |
- type: map_at_3 | |
value: 24.363 | |
- type: map_at_5 | |
value: 25.826999999999998 | |
- type: mrr_at_1 | |
value: 23.694000000000003 | |
- type: mrr_at_10 | |
value: 31.724999999999998 | |
- type: mrr_at_100 | |
value: 32.743 | |
- type: mrr_at_1000 | |
value: 32.82 | |
- type: mrr_at_3 | |
value: 29.275000000000002 | |
- type: mrr_at_5 | |
value: 30.684 | |
- type: ndcg_at_1 | |
value: 23.694000000000003 | |
- type: ndcg_at_10 | |
value: 32.366 | |
- type: ndcg_at_100 | |
value: 38.241 | |
- type: ndcg_at_1000 | |
value: 40.973 | |
- type: ndcg_at_3 | |
value: 27.661 | |
- type: ndcg_at_5 | |
value: 29.782999999999998 | |
- type: precision_at_1 | |
value: 23.694000000000003 | |
- type: precision_at_10 | |
value: 5.951 | |
- type: precision_at_100 | |
value: 1.0070000000000001 | |
- type: precision_at_1000 | |
value: 0.135 | |
- type: precision_at_3 | |
value: 13.34 | |
- type: precision_at_5 | |
value: 9.533999999999999 | |
- type: recall_at_1 | |
value: 19.1 | |
- type: recall_at_10 | |
value: 44.032 | |
- type: recall_at_100 | |
value: 69.186 | |
- type: recall_at_1000 | |
value: 88.562 | |
- type: recall_at_3 | |
value: 30.712 | |
- type: recall_at_5 | |
value: 36.372 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/webmasters | |
name: MTEB CQADupstackWebmastersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 20.671 | |
- type: map_at_10 | |
value: 28.583 | |
- type: map_at_100 | |
value: 30.098999999999997 | |
- type: map_at_1000 | |
value: 30.364 | |
- type: map_at_3 | |
value: 25.825 | |
- type: map_at_5 | |
value: 27.500999999999998 | |
- type: mrr_at_1 | |
value: 25.889 | |
- type: mrr_at_10 | |
value: 33.617999999999995 | |
- type: mrr_at_100 | |
value: 34.687 | |
- type: mrr_at_1000 | |
value: 34.774 | |
- type: mrr_at_3 | |
value: 31.191999999999997 | |
- type: mrr_at_5 | |
value: 32.675 | |
- type: ndcg_at_1 | |
value: 25.889 | |
- type: ndcg_at_10 | |
value: 34.056999999999995 | |
- type: ndcg_at_100 | |
value: 40.142 | |
- type: ndcg_at_1000 | |
value: 43.614000000000004 | |
- type: ndcg_at_3 | |
value: 29.688 | |
- type: ndcg_at_5 | |
value: 32.057 | |
- type: precision_at_1 | |
value: 25.889 | |
- type: precision_at_10 | |
value: 6.7 | |
- type: precision_at_100 | |
value: 1.417 | |
- type: precision_at_1000 | |
value: 0.241 | |
- type: precision_at_3 | |
value: 14.360999999999999 | |
- type: precision_at_5 | |
value: 10.711 | |
- type: recall_at_1 | |
value: 20.671 | |
- type: recall_at_10 | |
value: 43.97 | |
- type: recall_at_100 | |
value: 71.83699999999999 | |
- type: recall_at_1000 | |
value: 94.42399999999999 | |
- type: recall_at_3 | |
value: 31.0 | |
- type: recall_at_5 | |
value: 37.489 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/wordpress | |
name: MTEB CQADupstackWordpressRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 13.66 | |
- type: map_at_10 | |
value: 18.798000000000002 | |
- type: map_at_100 | |
value: 19.75 | |
- type: map_at_1000 | |
value: 19.851 | |
- type: map_at_3 | |
value: 16.874 | |
- type: map_at_5 | |
value: 18.136 | |
- type: mrr_at_1 | |
value: 14.972 | |
- type: mrr_at_10 | |
value: 20.565 | |
- type: mrr_at_100 | |
value: 21.488 | |
- type: mrr_at_1000 | |
value: 21.567 | |
- type: mrr_at_3 | |
value: 18.669 | |
- type: mrr_at_5 | |
value: 19.861 | |
- type: ndcg_at_1 | |
value: 14.972 | |
- type: ndcg_at_10 | |
value: 22.128999999999998 | |
- type: ndcg_at_100 | |
value: 27.028000000000002 | |
- type: ndcg_at_1000 | |
value: 29.887000000000004 | |
- type: ndcg_at_3 | |
value: 18.365000000000002 | |
- type: ndcg_at_5 | |
value: 20.48 | |
- type: precision_at_1 | |
value: 14.972 | |
- type: precision_at_10 | |
value: 3.549 | |
- type: precision_at_100 | |
value: 0.632 | |
- type: precision_at_1000 | |
value: 0.093 | |
- type: precision_at_3 | |
value: 7.887 | |
- type: precision_at_5 | |
value: 5.840999999999999 | |
- type: recall_at_1 | |
value: 13.66 | |
- type: recall_at_10 | |
value: 30.801000000000002 | |
- type: recall_at_100 | |
value: 53.626 | |
- type: recall_at_1000 | |
value: 75.634 | |
- type: recall_at_3 | |
value: 20.807000000000002 | |
- type: recall_at_5 | |
value: 25.86 | |
- task: | |
type: Retrieval | |
dataset: | |
type: climate-fever | |
name: MTEB ClimateFEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 8.622 | |
- type: map_at_10 | |
value: 16.042 | |
- type: map_at_100 | |
value: 18.023 | |
- type: map_at_1000 | |
value: 18.228 | |
- type: map_at_3 | |
value: 12.995999999999999 | |
- type: map_at_5 | |
value: 14.424000000000001 | |
- type: mrr_at_1 | |
value: 18.892999999999997 | |
- type: mrr_at_10 | |
value: 30.575000000000003 | |
- type: mrr_at_100 | |
value: 31.814999999999998 | |
- type: mrr_at_1000 | |
value: 31.856 | |
- type: mrr_at_3 | |
value: 26.851000000000003 | |
- type: mrr_at_5 | |
value: 29.021 | |
- type: ndcg_at_1 | |
value: 18.892999999999997 | |
- type: ndcg_at_10 | |
value: 23.575 | |
- type: ndcg_at_100 | |
value: 31.713 | |
- type: ndcg_at_1000 | |
value: 35.465 | |
- type: ndcg_at_3 | |
value: 18.167 | |
- type: ndcg_at_5 | |
value: 20.071 | |
- type: precision_at_1 | |
value: 18.892999999999997 | |
- type: precision_at_10 | |
value: 7.883 | |
- type: precision_at_100 | |
value: 1.652 | |
- type: precision_at_1000 | |
value: 0.23500000000000001 | |
- type: precision_at_3 | |
value: 13.898 | |
- type: precision_at_5 | |
value: 11.14 | |
- type: recall_at_1 | |
value: 8.622 | |
- type: recall_at_10 | |
value: 30.044999999999998 | |
- type: recall_at_100 | |
value: 58.072 | |
- type: recall_at_1000 | |
value: 79.226 | |
- type: recall_at_3 | |
value: 17.21 | |
- type: recall_at_5 | |
value: 22.249 | |
- task: | |
type: Retrieval | |
dataset: | |
type: dbpedia-entity | |
name: MTEB DBPedia | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 4.845 | |
- type: map_at_10 | |
value: 12.352 | |
- type: map_at_100 | |
value: 17.423 | |
- type: map_at_1000 | |
value: 18.529 | |
- type: map_at_3 | |
value: 8.505 | |
- type: map_at_5 | |
value: 10.213 | |
- type: mrr_at_1 | |
value: 41.75 | |
- type: mrr_at_10 | |
value: 54.6 | |
- type: mrr_at_100 | |
value: 55.345 | |
- type: mrr_at_1000 | |
value: 55.374 | |
- type: mrr_at_3 | |
value: 52.37500000000001 | |
- type: mrr_at_5 | |
value: 53.87499999999999 | |
- type: ndcg_at_1 | |
value: 31.25 | |
- type: ndcg_at_10 | |
value: 26.779999999999998 | |
- type: ndcg_at_100 | |
value: 31.929000000000002 | |
- type: ndcg_at_1000 | |
value: 39.290000000000006 | |
- type: ndcg_at_3 | |
value: 28.746 | |
- type: ndcg_at_5 | |
value: 27.334999999999997 | |
- type: precision_at_1 | |
value: 41.75 | |
- type: precision_at_10 | |
value: 22.55 | |
- type: precision_at_100 | |
value: 7.242 | |
- type: precision_at_1000 | |
value: 1.439 | |
- type: precision_at_3 | |
value: 33.833 | |
- type: precision_at_5 | |
value: 28.65 | |
- type: recall_at_1 | |
value: 4.845 | |
- type: recall_at_10 | |
value: 18.664 | |
- type: recall_at_100 | |
value: 41.085 | |
- type: recall_at_1000 | |
value: 65.242 | |
- type: recall_at_3 | |
value: 10.572 | |
- type: recall_at_5 | |
value: 13.961000000000002 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/emotion | |
name: MTEB EmotionClassification | |
config: default | |
split: test | |
revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 | |
metrics: | |
- type: accuracy | |
value: 47.08 | |
- type: f1 | |
value: 42.843345856303756 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fever | |
name: MTEB FEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 33.743 | |
- type: map_at_10 | |
value: 46.521 | |
- type: map_at_100 | |
value: 47.235 | |
- type: map_at_1000 | |
value: 47.272 | |
- type: map_at_3 | |
value: 43.252 | |
- type: map_at_5 | |
value: 45.267 | |
- type: mrr_at_1 | |
value: 36.484 | |
- type: mrr_at_10 | |
value: 49.406 | |
- type: mrr_at_100 | |
value: 50.03300000000001 | |
- type: mrr_at_1000 | |
value: 50.058 | |
- type: mrr_at_3 | |
value: 46.195 | |
- type: mrr_at_5 | |
value: 48.193999999999996 | |
- type: ndcg_at_1 | |
value: 36.484 | |
- type: ndcg_at_10 | |
value: 53.42 | |
- type: ndcg_at_100 | |
value: 56.69499999999999 | |
- type: ndcg_at_1000 | |
value: 57.623999999999995 | |
- type: ndcg_at_3 | |
value: 47.010999999999996 | |
- type: ndcg_at_5 | |
value: 50.524 | |
- type: precision_at_1 | |
value: 36.484 | |
- type: precision_at_10 | |
value: 7.925 | |
- type: precision_at_100 | |
value: 0.975 | |
- type: precision_at_1000 | |
value: 0.107 | |
- type: precision_at_3 | |
value: 19.967 | |
- type: precision_at_5 | |
value: 13.87 | |
- type: recall_at_1 | |
value: 33.743 | |
- type: recall_at_10 | |
value: 71.988 | |
- type: recall_at_100 | |
value: 86.60799999999999 | |
- type: recall_at_1000 | |
value: 93.54 | |
- type: recall_at_3 | |
value: 54.855 | |
- type: recall_at_5 | |
value: 63.341 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fiqa | |
name: MTEB FiQA2018 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 13.003 | |
- type: map_at_10 | |
value: 21.766 | |
- type: map_at_100 | |
value: 23.618 | |
- type: map_at_1000 | |
value: 23.832 | |
- type: map_at_3 | |
value: 18.282999999999998 | |
- type: map_at_5 | |
value: 20.267 | |
- type: mrr_at_1 | |
value: 26.851999999999997 | |
- type: mrr_at_10 | |
value: 34.658 | |
- type: mrr_at_100 | |
value: 35.729 | |
- type: mrr_at_1000 | |
value: 35.785 | |
- type: mrr_at_3 | |
value: 31.686999999999998 | |
- type: mrr_at_5 | |
value: 33.315 | |
- type: ndcg_at_1 | |
value: 26.851999999999997 | |
- type: ndcg_at_10 | |
value: 28.563 | |
- type: ndcg_at_100 | |
value: 36.374 | |
- type: ndcg_at_1000 | |
value: 40.306999999999995 | |
- type: ndcg_at_3 | |
value: 24.224 | |
- type: ndcg_at_5 | |
value: 25.939 | |
- type: precision_at_1 | |
value: 26.851999999999997 | |
- type: precision_at_10 | |
value: 8.193999999999999 | |
- type: precision_at_100 | |
value: 1.616 | |
- type: precision_at_1000 | |
value: 0.232 | |
- type: precision_at_3 | |
value: 16.255 | |
- type: precision_at_5 | |
value: 12.469 | |
- type: recall_at_1 | |
value: 13.003 | |
- type: recall_at_10 | |
value: 35.689 | |
- type: recall_at_100 | |
value: 65.762 | |
- type: recall_at_1000 | |
value: 89.546 | |
- type: recall_at_3 | |
value: 21.820999999999998 | |
- type: recall_at_5 | |
value: 28.097 | |
- task: | |
type: Retrieval | |
dataset: | |
type: hotpotqa | |
name: MTEB HotpotQA | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 29.541 | |
- type: map_at_10 | |
value: 43.088 | |
- type: map_at_100 | |
value: 44.252 | |
- type: map_at_1000 | |
value: 44.345 | |
- type: map_at_3 | |
value: 39.79 | |
- type: map_at_5 | |
value: 41.687000000000005 | |
- type: mrr_at_1 | |
value: 59.082 | |
- type: mrr_at_10 | |
value: 67.27300000000001 | |
- type: mrr_at_100 | |
value: 67.708 | |
- type: mrr_at_1000 | |
value: 67.731 | |
- type: mrr_at_3 | |
value: 65.526 | |
- type: mrr_at_5 | |
value: 66.589 | |
- type: ndcg_at_1 | |
value: 59.082 | |
- type: ndcg_at_10 | |
value: 52.372 | |
- type: ndcg_at_100 | |
value: 56.725 | |
- type: ndcg_at_1000 | |
value: 58.665 | |
- type: ndcg_at_3 | |
value: 47.129 | |
- type: ndcg_at_5 | |
value: 49.808 | |
- type: precision_at_1 | |
value: 59.082 | |
- type: precision_at_10 | |
value: 11.275 | |
- type: precision_at_100 | |
value: 1.469 | |
- type: precision_at_1000 | |
value: 0.173 | |
- type: precision_at_3 | |
value: 29.773 | |
- type: precision_at_5 | |
value: 19.980999999999998 | |
- type: recall_at_1 | |
value: 29.541 | |
- type: recall_at_10 | |
value: 56.374 | |
- type: recall_at_100 | |
value: 73.42999999999999 | |
- type: recall_at_1000 | |
value: 86.28 | |
- type: recall_at_3 | |
value: 44.659 | |
- type: recall_at_5 | |
value: 49.952999999999996 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/imdb | |
name: MTEB ImdbClassification | |
config: default | |
split: test | |
revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 | |
metrics: | |
- type: accuracy | |
value: 75.1904 | |
- type: ap | |
value: 69.80555086826531 | |
- type: f1 | |
value: 74.93725389065787 | |
- task: | |
type: Retrieval | |
dataset: | |
type: msmarco | |
name: MTEB MSMARCO | |
config: default | |
split: dev | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 7.085 | |
- type: map_at_10 | |
value: 13.344000000000001 | |
- type: map_at_100 | |
value: 14.501 | |
- type: map_at_1000 | |
value: 14.605 | |
- type: map_at_3 | |
value: 10.758 | |
- type: map_at_5 | |
value: 12.162 | |
- type: mrr_at_1 | |
value: 7.278 | |
- type: mrr_at_10 | |
value: 13.607 | |
- type: mrr_at_100 | |
value: 14.761 | |
- type: mrr_at_1000 | |
value: 14.860000000000001 | |
- type: mrr_at_3 | |
value: 11.003 | |
- type: mrr_at_5 | |
value: 12.421 | |
- type: ndcg_at_1 | |
value: 7.278 | |
- type: ndcg_at_10 | |
value: 17.473 | |
- type: ndcg_at_100 | |
value: 23.721 | |
- type: ndcg_at_1000 | |
value: 26.69 | |
- type: ndcg_at_3 | |
value: 12.078 | |
- type: ndcg_at_5 | |
value: 14.62 | |
- type: precision_at_1 | |
value: 7.278 | |
- type: precision_at_10 | |
value: 3.175 | |
- type: precision_at_100 | |
value: 0.639 | |
- type: precision_at_1000 | |
value: 0.09 | |
- type: precision_at_3 | |
value: 5.382 | |
- type: precision_at_5 | |
value: 4.519 | |
- type: recall_at_1 | |
value: 7.085 | |
- type: recall_at_10 | |
value: 30.549 | |
- type: recall_at_100 | |
value: 60.919999999999995 | |
- type: recall_at_1000 | |
value: 84.372 | |
- type: recall_at_3 | |
value: 15.675 | |
- type: recall_at_5 | |
value: 21.818 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_domain | |
name: MTEB MTOPDomainClassification (en) | |
config: en | |
split: test | |
revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf | |
metrics: | |
- type: accuracy | |
value: 94.46876424988601 | |
- type: f1 | |
value: 94.23159241922738 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_intent | |
name: MTEB MTOPIntentClassification (en) | |
config: en | |
split: test | |
revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | |
metrics: | |
- type: accuracy | |
value: 81.0875512995896 | |
- type: f1 | |
value: 61.674961674414 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_intent | |
name: MTEB MassiveIntentClassification (en) | |
config: en | |
split: test | |
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | |
metrics: | |
- type: accuracy | |
value: 75.01344989912575 | |
- type: f1 | |
value: 71.7942527839921 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_scenario | |
name: MTEB MassiveScenarioClassification (en) | |
config: en | |
split: test | |
revision: 7d571f92784cd94a019292a1f45445077d0ef634 | |
metrics: | |
- type: accuracy | |
value: 79.15601882985877 | |
- type: f1 | |
value: 78.82502954601195 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-p2p | |
name: MTEB MedrxivClusteringP2P | |
config: default | |
split: test | |
revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 | |
metrics: | |
- type: v_measure | |
value: 31.468806971345227 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-s2s | |
name: MTEB MedrxivClusteringS2S | |
config: default | |
split: test | |
revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 | |
metrics: | |
- type: v_measure | |
value: 27.874332804382256 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/mind_small | |
name: MTEB MindSmallReranking | |
config: default | |
split: test | |
revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 | |
metrics: | |
- type: map | |
value: 30.099340785595842 | |
- type: mrr | |
value: 31.077367694660257 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nfcorpus | |
name: MTEB NFCorpus | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 3.9050000000000002 | |
- type: map_at_10 | |
value: 8.931000000000001 | |
- type: map_at_100 | |
value: 11.246 | |
- type: map_at_1000 | |
value: 12.579 | |
- type: map_at_3 | |
value: 6.544 | |
- type: map_at_5 | |
value: 7.854 | |
- type: mrr_at_1 | |
value: 33.745999999999995 | |
- type: mrr_at_10 | |
value: 44.734 | |
- type: mrr_at_100 | |
value: 45.486 | |
- type: mrr_at_1000 | |
value: 45.534 | |
- type: mrr_at_3 | |
value: 42.157 | |
- type: mrr_at_5 | |
value: 43.813 | |
- type: ndcg_at_1 | |
value: 31.734 | |
- type: ndcg_at_10 | |
value: 26.284999999999997 | |
- type: ndcg_at_100 | |
value: 25.211 | |
- type: ndcg_at_1000 | |
value: 34.974 | |
- type: ndcg_at_3 | |
value: 29.918 | |
- type: ndcg_at_5 | |
value: 29.066 | |
- type: precision_at_1 | |
value: 33.745999999999995 | |
- type: precision_at_10 | |
value: 19.628 | |
- type: precision_at_100 | |
value: 6.476999999999999 | |
- type: precision_at_1000 | |
value: 1.976 | |
- type: precision_at_3 | |
value: 28.793000000000003 | |
- type: precision_at_5 | |
value: 25.759 | |
- type: recall_at_1 | |
value: 3.9050000000000002 | |
- type: recall_at_10 | |
value: 13.375 | |
- type: recall_at_100 | |
value: 28.453 | |
- type: recall_at_1000 | |
value: 61.67399999999999 | |
- type: recall_at_3 | |
value: 7.774 | |
- type: recall_at_5 | |
value: 10.754 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nq | |
name: MTEB NQ | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 18.33 | |
- type: map_at_10 | |
value: 30.44 | |
- type: map_at_100 | |
value: 31.848 | |
- type: map_at_1000 | |
value: 31.906000000000002 | |
- type: map_at_3 | |
value: 26.143 | |
- type: map_at_5 | |
value: 28.583 | |
- type: mrr_at_1 | |
value: 21.031 | |
- type: mrr_at_10 | |
value: 33.028 | |
- type: mrr_at_100 | |
value: 34.166000000000004 | |
- type: mrr_at_1000 | |
value: 34.208 | |
- type: mrr_at_3 | |
value: 29.089 | |
- type: mrr_at_5 | |
value: 31.362000000000002 | |
- type: ndcg_at_1 | |
value: 21.031 | |
- type: ndcg_at_10 | |
value: 37.65 | |
- type: ndcg_at_100 | |
value: 43.945 | |
- type: ndcg_at_1000 | |
value: 45.338 | |
- type: ndcg_at_3 | |
value: 29.256999999999998 | |
- type: ndcg_at_5 | |
value: 33.453 | |
- type: precision_at_1 | |
value: 21.031 | |
- type: precision_at_10 | |
value: 6.8309999999999995 | |
- type: precision_at_100 | |
value: 1.035 | |
- type: precision_at_1000 | |
value: 0.117 | |
- type: precision_at_3 | |
value: 13.818 | |
- type: precision_at_5 | |
value: 10.649000000000001 | |
- type: recall_at_1 | |
value: 18.33 | |
- type: recall_at_10 | |
value: 57.330999999999996 | |
- type: recall_at_100 | |
value: 85.284 | |
- type: recall_at_1000 | |
value: 95.676 | |
- type: recall_at_3 | |
value: 35.356 | |
- type: recall_at_5 | |
value: 45.073 | |
- task: | |
type: Retrieval | |
dataset: | |
type: quora | |
name: MTEB QuoraRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 66.373 | |
- type: map_at_10 | |
value: 80.233 | |
- type: map_at_100 | |
value: 80.973 | |
- type: map_at_1000 | |
value: 80.99499999999999 | |
- type: map_at_3 | |
value: 77.127 | |
- type: map_at_5 | |
value: 79.056 | |
- type: mrr_at_1 | |
value: 76.55 | |
- type: mrr_at_10 | |
value: 83.813 | |
- type: mrr_at_100 | |
value: 83.96900000000001 | |
- type: mrr_at_1000 | |
value: 83.97200000000001 | |
- type: mrr_at_3 | |
value: 82.547 | |
- type: mrr_at_5 | |
value: 83.38600000000001 | |
- type: ndcg_at_1 | |
value: 76.53999999999999 | |
- type: ndcg_at_10 | |
value: 84.638 | |
- type: ndcg_at_100 | |
value: 86.28099999999999 | |
- type: ndcg_at_1000 | |
value: 86.459 | |
- type: ndcg_at_3 | |
value: 81.19 | |
- type: ndcg_at_5 | |
value: 83.057 | |
- type: precision_at_1 | |
value: 76.53999999999999 | |
- type: precision_at_10 | |
value: 12.928999999999998 | |
- type: precision_at_100 | |
value: 1.514 | |
- type: precision_at_1000 | |
value: 0.156 | |
- type: precision_at_3 | |
value: 35.503 | |
- type: precision_at_5 | |
value: 23.512 | |
- type: recall_at_1 | |
value: 66.373 | |
- type: recall_at_10 | |
value: 93.273 | |
- type: recall_at_100 | |
value: 99.031 | |
- type: recall_at_1000 | |
value: 99.91799999999999 | |
- type: recall_at_3 | |
value: 83.55799999999999 | |
- type: recall_at_5 | |
value: 88.644 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering | |
name: MTEB RedditClustering | |
config: default | |
split: test | |
revision: 24640382cdbf8abc73003fb0fa6d111a705499eb | |
metrics: | |
- type: v_measure | |
value: 43.67174666339103 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering-p2p | |
name: MTEB RedditClusteringP2P | |
config: default | |
split: test | |
revision: 282350215ef01743dc01b456c7f5241fa8937f16 | |
metrics: | |
- type: v_measure | |
value: 61.66838659211271 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scidocs | |
name: MTEB SCIDOCS | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 2.318 | |
- type: map_at_10 | |
value: 5.938000000000001 | |
- type: map_at_100 | |
value: 7.582 | |
- type: map_at_1000 | |
value: 7.936 | |
- type: map_at_3 | |
value: 4.208 | |
- type: map_at_5 | |
value: 5.098 | |
- type: mrr_at_1 | |
value: 11.4 | |
- type: mrr_at_10 | |
value: 17.655 | |
- type: mrr_at_100 | |
value: 19.088 | |
- type: mrr_at_1000 | |
value: 19.203 | |
- type: mrr_at_3 | |
value: 15.25 | |
- type: mrr_at_5 | |
value: 16.535 | |
- type: ndcg_at_1 | |
value: 11.4 | |
- type: ndcg_at_10 | |
value: 10.388 | |
- type: ndcg_at_100 | |
value: 18.165 | |
- type: ndcg_at_1000 | |
value: 24.842 | |
- type: ndcg_at_3 | |
value: 9.414 | |
- type: ndcg_at_5 | |
value: 8.453 | |
- type: precision_at_1 | |
value: 11.4 | |
- type: precision_at_10 | |
value: 5.54 | |
- type: precision_at_100 | |
value: 1.71 | |
- type: precision_at_1000 | |
value: 0.33 | |
- type: precision_at_3 | |
value: 8.866999999999999 | |
- type: precision_at_5 | |
value: 7.580000000000001 | |
- type: recall_at_1 | |
value: 2.318 | |
- type: recall_at_10 | |
value: 11.267000000000001 | |
- type: recall_at_100 | |
value: 34.743 | |
- type: recall_at_1000 | |
value: 67.07300000000001 | |
- type: recall_at_3 | |
value: 5.408 | |
- type: recall_at_5 | |
value: 7.713 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sickr-sts | |
name: MTEB SICK-R | |
config: default | |
split: test | |
revision: a6ea5a8cab320b040a23452cc28066d9beae2cee | |
metrics: | |
- type: cos_sim_spearman | |
value: 72.15850185456762 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts12-sts | |
name: MTEB STS12 | |
config: default | |
split: test | |
revision: a0d554a64d88156834ff5ae9920b964011b16384 | |
metrics: | |
- type: cos_sim_spearman | |
value: 61.59518395985063 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts13-sts | |
name: MTEB STS13 | |
config: default | |
split: test | |
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca | |
metrics: | |
- type: cos_sim_spearman | |
value: 79.71131323749228 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts14-sts | |
name: MTEB STS14 | |
config: default | |
split: test | |
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 | |
metrics: | |
- type: cos_sim_spearman | |
value: 72.10974664733891 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts15-sts | |
name: MTEB STS15 | |
config: default | |
split: test | |
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 | |
metrics: | |
- type: cos_sim_spearman | |
value: 82.17899407125657 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts16-sts | |
name: MTEB STS16 | |
config: default | |
split: test | |
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 | |
metrics: | |
- type: cos_sim_spearman | |
value: 79.41138579273438 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts17-crosslingual-sts | |
name: MTEB STS17 (en-en) | |
config: en-en | |
split: test | |
revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d | |
metrics: | |
- type: cos_sim_spearman | |
value: 85.44343473477939 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts22-crosslingual-sts | |
name: MTEB STS22 (en) | |
config: en | |
split: test | |
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 | |
metrics: | |
- type: cos_sim_spearman | |
value: 63.90264271389905 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/stsbenchmark-sts | |
name: MTEB STSBenchmark | |
config: default | |
split: test | |
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 | |
metrics: | |
- type: cos_sim_spearman | |
value: 77.44151296326804 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/scidocs-reranking | |
name: MTEB SciDocsRR | |
config: default | |
split: test | |
revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab | |
metrics: | |
- type: map | |
value: 76.27597486396654 | |
- type: mrr | |
value: 93.28127119793788 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scifact | |
name: MTEB SciFact | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 49.594 | |
- type: map_at_10 | |
value: 60.951 | |
- type: map_at_100 | |
value: 61.68599999999999 | |
- type: map_at_1000 | |
value: 61.712 | |
- type: map_at_3 | |
value: 57.946 | |
- type: map_at_5 | |
value: 59.89 | |
- type: mrr_at_1 | |
value: 52.666999999999994 | |
- type: mrr_at_10 | |
value: 62.724000000000004 | |
- type: mrr_at_100 | |
value: 63.269 | |
- type: mrr_at_1000 | |
value: 63.291 | |
- type: mrr_at_3 | |
value: 60.167 | |
- type: mrr_at_5 | |
value: 61.95 | |
- type: ndcg_at_1 | |
value: 52.666999999999994 | |
- type: ndcg_at_10 | |
value: 66.35600000000001 | |
- type: ndcg_at_100 | |
value: 69.463 | |
- type: ndcg_at_1000 | |
value: 70.111 | |
- type: ndcg_at_3 | |
value: 60.901 | |
- type: ndcg_at_5 | |
value: 64.054 | |
- type: precision_at_1 | |
value: 52.666999999999994 | |
- type: precision_at_10 | |
value: 9.0 | |
- type: precision_at_100 | |
value: 1.073 | |
- type: precision_at_1000 | |
value: 0.11299999999999999 | |
- type: precision_at_3 | |
value: 24.221999999999998 | |
- type: precision_at_5 | |
value: 16.333000000000002 | |
- type: recall_at_1 | |
value: 49.594 | |
- type: recall_at_10 | |
value: 81.256 | |
- type: recall_at_100 | |
value: 94.989 | |
- type: recall_at_1000 | |
value: 100.0 | |
- type: recall_at_3 | |
value: 66.706 | |
- type: recall_at_5 | |
value: 74.411 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/sprintduplicatequestions-pairclassification | |
name: MTEB SprintDuplicateQuestions | |
config: default | |
split: test | |
revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 99.65049504950495 | |
- type: cos_sim_ap | |
value: 88.1421623503371 | |
- type: cos_sim_f1 | |
value: 81.44072036018008 | |
- type: cos_sim_precision | |
value: 81.48148148148148 | |
- type: cos_sim_recall | |
value: 81.39999999999999 | |
- type: dot_accuracy | |
value: 99.37623762376238 | |
- type: dot_ap | |
value: 69.87152032240303 | |
- type: dot_f1 | |
value: 65.64885496183206 | |
- type: dot_precision | |
value: 72.18225419664267 | |
- type: dot_recall | |
value: 60.199999999999996 | |
- type: euclidean_accuracy | |
value: 99.63069306930693 | |
- type: euclidean_ap | |
value: 86.13858297902517 | |
- type: euclidean_f1 | |
value: 79.87679671457904 | |
- type: euclidean_precision | |
value: 82.0675105485232 | |
- type: euclidean_recall | |
value: 77.8 | |
- type: manhattan_accuracy | |
value: 99.63168316831683 | |
- type: manhattan_ap | |
value: 86.31976532265482 | |
- type: manhattan_f1 | |
value: 80.10204081632654 | |
- type: manhattan_precision | |
value: 81.77083333333334 | |
- type: manhattan_recall | |
value: 78.5 | |
- type: max_accuracy | |
value: 99.65049504950495 | |
- type: max_ap | |
value: 88.1421623503371 | |
- type: max_f1 | |
value: 81.44072036018008 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering | |
name: MTEB StackExchangeClustering | |
config: default | |
split: test | |
revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 | |
metrics: | |
- type: v_measure | |
value: 68.19604139959692 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering-p2p | |
name: MTEB StackExchangeClusteringP2P | |
config: default | |
split: test | |
revision: 815ca46b2622cec33ccafc3735d572c266efdb44 | |
metrics: | |
- type: v_measure | |
value: 36.3569584557381 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/stackoverflowdupquestions-reranking | |
name: MTEB StackOverflowDupQuestions | |
config: default | |
split: test | |
revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 | |
metrics: | |
- type: map | |
value: 48.82174503355024 | |
- type: mrr | |
value: 49.610933388506915 | |
- task: | |
type: Summarization | |
dataset: | |
type: mteb/summeval | |
name: MTEB SummEval | |
config: default | |
split: test | |
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c | |
metrics: | |
- type: cos_sim_pearson | |
value: 30.805895993742798 | |
- type: cos_sim_spearman | |
value: 31.445431226826738 | |
- type: dot_pearson | |
value: 24.441585432516867 | |
- type: dot_spearman | |
value: 25.468117334810188 | |
- task: | |
type: Retrieval | |
dataset: | |
type: trec-covid | |
name: MTEB TRECCOVID | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 0.2 | |
- type: map_at_10 | |
value: 1.431 | |
- type: map_at_100 | |
value: 7.138999999999999 | |
- type: map_at_1000 | |
value: 17.933 | |
- type: map_at_3 | |
value: 0.551 | |
- type: map_at_5 | |
value: 0.7979999999999999 | |
- type: mrr_at_1 | |
value: 76.0 | |
- type: mrr_at_10 | |
value: 85.167 | |
- type: mrr_at_100 | |
value: 85.21300000000001 | |
- type: mrr_at_1000 | |
value: 85.21300000000001 | |
- type: mrr_at_3 | |
value: 84.667 | |
- type: mrr_at_5 | |
value: 85.167 | |
- type: ndcg_at_1 | |
value: 72.0 | |
- type: ndcg_at_10 | |
value: 63.343 | |
- type: ndcg_at_100 | |
value: 45.739999999999995 | |
- type: ndcg_at_1000 | |
value: 41.875 | |
- type: ndcg_at_3 | |
value: 68.162 | |
- type: ndcg_at_5 | |
value: 65.666 | |
- type: precision_at_1 | |
value: 76.0 | |
- type: precision_at_10 | |
value: 66.4 | |
- type: precision_at_100 | |
value: 46.800000000000004 | |
- type: precision_at_1000 | |
value: 18.996 | |
- type: precision_at_3 | |
value: 72.667 | |
- type: precision_at_5 | |
value: 68.4 | |
- type: recall_at_1 | |
value: 0.2 | |
- type: recall_at_10 | |
value: 1.712 | |
- type: recall_at_100 | |
value: 10.896 | |
- type: recall_at_1000 | |
value: 40.115 | |
- type: recall_at_3 | |
value: 0.594 | |
- type: recall_at_5 | |
value: 0.889 | |
- task: | |
type: Retrieval | |
dataset: | |
type: webis-touche2020 | |
name: MTEB Touche2020 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 1.0619999999999998 | |
- type: map_at_10 | |
value: 5.611 | |
- type: map_at_100 | |
value: 8.841000000000001 | |
- type: map_at_1000 | |
value: 10.154 | |
- type: map_at_3 | |
value: 2.7720000000000002 | |
- type: map_at_5 | |
value: 4.181 | |
- type: mrr_at_1 | |
value: 14.285999999999998 | |
- type: mrr_at_10 | |
value: 26.249 | |
- type: mrr_at_100 | |
value: 28.046 | |
- type: mrr_at_1000 | |
value: 28.083000000000002 | |
- type: mrr_at_3 | |
value: 21.769 | |
- type: mrr_at_5 | |
value: 24.524 | |
- type: ndcg_at_1 | |
value: 11.224 | |
- type: ndcg_at_10 | |
value: 12.817 | |
- type: ndcg_at_100 | |
value: 23.183999999999997 | |
- type: ndcg_at_1000 | |
value: 35.099000000000004 | |
- type: ndcg_at_3 | |
value: 11.215 | |
- type: ndcg_at_5 | |
value: 12.016 | |
- type: precision_at_1 | |
value: 14.285999999999998 | |
- type: precision_at_10 | |
value: 12.653 | |
- type: precision_at_100 | |
value: 5.306 | |
- type: precision_at_1000 | |
value: 1.294 | |
- type: precision_at_3 | |
value: 13.605 | |
- type: precision_at_5 | |
value: 13.877999999999998 | |
- type: recall_at_1 | |
value: 1.0619999999999998 | |
- type: recall_at_10 | |
value: 10.377 | |
- type: recall_at_100 | |
value: 34.77 | |
- type: recall_at_1000 | |
value: 70.875 | |
- type: recall_at_3 | |
value: 3.688 | |
- type: recall_at_5 | |
value: 6.2509999999999994 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/toxic_conversations_50k | |
name: MTEB ToxicConversationsClassification | |
config: default | |
split: test | |
revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c | |
metrics: | |
- type: accuracy | |
value: 71.8488 | |
- type: ap | |
value: 15.590122317097372 | |
- type: f1 | |
value: 55.86108396102662 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/tweet_sentiment_extraction | |
name: MTEB TweetSentimentExtractionClassification | |
config: default | |
split: test | |
revision: d604517c81ca91fe16a244d1248fc021f9ecee7a | |
metrics: | |
- type: accuracy | |
value: 57.61460101867573 | |
- type: f1 | |
value: 57.8678726826158 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/twentynewsgroups-clustering | |
name: MTEB TwentyNewsgroupsClustering | |
config: default | |
split: test | |
revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 | |
metrics: | |
- type: v_measure | |
value: 32.01459876897588 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twittersemeval2015-pairclassification | |
name: MTEB TwitterSemEval2015 | |
config: default | |
split: test | |
revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 84.1032365738809 | |
- type: cos_sim_ap | |
value: 66.60137415520323 | |
- type: cos_sim_f1 | |
value: 62.12845010615712 | |
- type: cos_sim_precision | |
value: 62.493326214628944 | |
- type: cos_sim_recall | |
value: 61.76781002638523 | |
- type: dot_accuracy | |
value: 81.85015199380103 | |
- type: dot_ap | |
value: 58.854644211365084 | |
- type: dot_f1 | |
value: 56.15180082185158 | |
- type: dot_precision | |
value: 51.806422836752894 | |
- type: dot_recall | |
value: 61.2928759894459 | |
- type: euclidean_accuracy | |
value: 83.6681170650295 | |
- type: euclidean_ap | |
value: 64.93555585305603 | |
- type: euclidean_f1 | |
value: 61.02775195857125 | |
- type: euclidean_precision | |
value: 61.42742582197273 | |
- type: euclidean_recall | |
value: 60.633245382585756 | |
- type: manhattan_accuracy | |
value: 83.73368301841808 | |
- type: manhattan_ap | |
value: 65.45422483039611 | |
- type: manhattan_f1 | |
value: 61.58552806597499 | |
- type: manhattan_precision | |
value: 62.09763948497854 | |
- type: manhattan_recall | |
value: 61.08179419525066 | |
- type: max_accuracy | |
value: 84.1032365738809 | |
- type: max_ap | |
value: 66.60137415520323 | |
- type: max_f1 | |
value: 62.12845010615712 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twitterurlcorpus-pairclassification | |
name: MTEB TwitterURLCorpus | |
config: default | |
split: test | |
revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf | |
metrics: | |
- type: cos_sim_accuracy | |
value: 86.36628245430201 | |
- type: cos_sim_ap | |
value: 79.29963896460292 | |
- type: cos_sim_f1 | |
value: 72.63895990066467 | |
- type: cos_sim_precision | |
value: 69.09128803668196 | |
- type: cos_sim_recall | |
value: 76.57068062827224 | |
- type: dot_accuracy | |
value: 84.65091007878294 | |
- type: dot_ap | |
value: 75.04883449222972 | |
- type: dot_f1 | |
value: 69.18569117382708 | |
- type: dot_precision | |
value: 64.89512376070682 | |
- type: dot_recall | |
value: 74.08376963350786 | |
- type: euclidean_accuracy | |
value: 85.88116583226608 | |
- type: euclidean_ap | |
value: 78.42687640324908 | |
- type: euclidean_f1 | |
value: 71.74350111107192 | |
- type: euclidean_precision | |
value: 66.19800820152314 | |
- type: euclidean_recall | |
value: 78.3030489682784 | |
- type: manhattan_accuracy | |
value: 86.27508052935926 | |
- type: manhattan_ap | |
value: 79.29581298930101 | |
- type: manhattan_f1 | |
value: 72.51838235294117 | |
- type: manhattan_precision | |
value: 67.03921568627452 | |
- type: manhattan_recall | |
value: 78.97289805974745 | |
- type: max_accuracy | |
value: 86.36628245430201 | |
- type: max_ap | |
value: 79.29963896460292 | |
- type: max_f1 | |
value: 72.63895990066467 | |
> LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of 3 simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
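
To make the third step more concrete, here is a minimal sketch of a SimCSE-style in-batch contrastive loss in plain Python. It is an illustration only, not the LLM2Vec training code: the function names, the toy vectors, and the temperature of 0.05 are assumptions chosen for the example. The idea is that two embeddings of the same sentence (for instance, obtained with different dropout masks) should be closer to each other than to the embeddings of the other sentences in the batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchors, positives, temperature=0.05):
    """Mean in-batch contrastive loss (hypothetical helper).

    positives[i] is the positive for anchors[i]; every other entry of
    `positives` acts as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Negative log-softmax of the matching pair, computed stably.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: two "views" of the same two sentences (assumed values).
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = contrastive_loss(view_a, view_b)         # correct pairing
shuffled = contrastive_loss(view_a, view_b[::-1])  # positives swapped

# The correctly paired batch should produce the lower loss.
print(aligned < shuffled)
```

Minimizing this quantity pulls the two views of each sentence together and pushes them away from the rest of the batch, which is the behavior the contrastive step relies on.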
## Installation
```bash
pip install llm2vec
```
## Usage
```python
from llm2vec import LLM2Vec

import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from peft import PeftModel

# Loading base Llama 3 model, along with custom code that enables bidirectional connections in decoder-only LLMs. MNTP LoRA weights are merged into the base model.
tokenizer = AutoTokenizer.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp"
)
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
model = PeftModel.from_pretrained(
    model,
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
)
model = model.merge_and_unload()  # This can take several minutes on CPU

# Loading unsupervised SimCSE model. This loads the trained LoRA weights on top of the MNTP model. Hence the final weights are: base model + MNTP (LoRA) + SimCSE (LoRA).
model = PeftModel.from_pretrained(
    model, "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse"
)

# Wrapper for encoding and pooling operations
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

# Encoding queries using instructions
instruction = (
    "Given a web search query, retrieve relevant passages that answer the query:"
)
queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)

# Encoding documents. Instructions are not required for documents.
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]
d_reps = l2v.encode(documents)

# Compute cosine similarity (rows are queries, columns are documents)
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))

print(cos_sim)
"""
tensor([[0.6522, 0.1891],
        [0.1162, 0.3457]])
"""
```
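
The similarity matrix is usually turned into a ranking of documents per query. The following is a small sketch of that last step using only the standard library; the values are copied from the example output above, and `rank_documents` is a hypothetical helper introduced for illustration, not part of the `llm2vec` package.

```python
# Similarity matrix copied from the example output above:
# rows are queries, columns are documents.
cos_sim = [
    [0.6522, 0.1891],
    [0.1162, 0.3457],
]


def rank_documents(similarity_row):
    """Return document indices sorted from most to least similar."""
    return sorted(
        range(len(similarity_row)),
        key=lambda doc_index: similarity_row[doc_index],
        reverse=True,
    )


rankings = [rank_documents(row) for row in cos_sim]
for query_index, ranking in enumerate(rankings):
    print(f"query {query_index}: best document = {ranking[0]}, ranking = {ranking}")
```

With these values, the protein query ranks the first document highest and the "summit define" query ranks the second document highest, which matches the intended query/document pairs.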
## Questions
If you have any questions about the code, feel free to email Parishad (`[email protected]`) and Vaibhav (`[email protected]`).