---
pipeline_tag: sentence-similarity
tags:
- feature-extraction
license: mit
language:
- fr
- en
model-index:
- name: Solon-embeddings-base-0.1
  results:
  - task:
      type: sentence-similarity
      name: Passage Retrieval
    dataset:
      type: unicamp-dl/mmarco
      name: mMARCO-fr
      config: french
      split: validation
    metrics:
    - type: recall_at_500
      name: Recall@500
      value: 90.9
    - type: recall_at_100
      name: Recall@100
      value: 80.6
    - type: recall_at_10
      name: Recall@10
      value: 52.5
    - type: map_at_10
      name: MAP@10
      value: 27.4
    - type: ndcg_at_10
      name: nDCG@10
      value: 33.5
    - type: mrr_at_10
      name: MRR@10
      value: 27.9
---
# Solon Embeddings — Base 0.1

SOTA open-source French embedding model.

**Instructions:**

Prepend "query : " to each *query* before retrieval to improve retrieval performance (see the usage sketch below).

No prefix is needed for *passages*.
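As an illustration of the query prefix, here is a minimal retrieval sketch. It assumes the checkpoint loads with the `sentence-transformers` library and scores passages with cosine similarity; the example sentences are placeholders.

```python
# Minimal sketch: embed one query and a few passages, then score passages
# by cosine similarity. Assumes the checkpoint loads with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("OrdalieTech/Solon-embeddings-base-0.1")

# Only queries get the "query : " prefix; passages are encoded as-is.
query = "query : Quelle est la capitale de la France ?"
passages = [
    "Paris est la capitale de la France.",
    "Le Mont-Blanc est le plus haut sommet des Alpes.",
]

query_emb = model.encode(query, normalize_embeddings=True)
passage_embs = model.encode(passages, normalize_embeddings=True)

# Cosine similarity between the query and each passage (higher = more relevant).
scores = util.cos_sim(query_emb, passage_embs)
print(scores)
```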
Benchmark results (mean score over the 9 French benchmarks detailed below):

| Model | Mean Score |
| --- | --- |
| **OrdalieTech/Solon-embeddings-large-0.1** | 0.7490 |
| cohere/embed-multilingual-v3 | 0.7402 |
| **OrdalieTech/Solon-embeddings-base-0.1** | 0.7306 |
| openai/ada-002 | 0.7290 |
| cohere/embed-multilingual-light-v3 | 0.6945 |
| antoinelouis/biencoder-camembert-base-mmarcoFR | 0.6826 |
| dangvantuan/sentence-camembert-large | 0.6756 |
| voyage/voyage-01 | 0.6753 |
| intfloat/multilingual-e5-large | 0.6660 |
| intfloat/multilingual-e5-base | 0.6597 |
| Sbert/paraphrase-multilingual-mpnet-base-v2 | 0.5975 |
| dangvantuan/sentence-camembert-base | 0.5456 |
| EuropeanParliament/eubert_embedding_v1 | 0.5063 |

These results were obtained on 9 French benchmarks covering a variety of text-similarity tasks (classification, reranking, STS):

- AmazonReviewsClassification (MTEB)
- MassiveIntentClassification (MTEB)
- MassiveScenarioClassification (MTEB)
- MTOPDomainClassification (MTEB)
- MTOPIntentClassification (MTEB)
- STS22 (MTEB)
- MiraclFRRerank (Miracl)
- OrdalieFRSTS (Ordalie)
- OrdalieFRReranking (Ordalie)

We created OrdalieFRSTS and OrdalieFRReranking to strengthen the benchmarking of French STS and reranking.

(The evaluation scripts are available at github.com/OrdalieTech/mteb.)
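The snippet below is a rough sketch of how such an evaluation might be launched with the `mteb` package (1.x-style API). The task selection, language filter, and output folder are assumptions for illustration, not a verified reproduction of the script in the fork above; the Ordalie-specific tasks would only be available once that fork is installed.

```python
# Rough sketch of running a subset of the French benchmarks with the mteb package
# (1.x-style API). The Ordalie-specific tasks are assumed to be registered by the
# github.com/OrdalieTech/mteb fork; only standard MTEB task names are listed here.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("OrdalieTech/Solon-embeddings-base-0.1")

tasks = [
    "AmazonReviewsClassification",
    "MassiveIntentClassification",
    "MassiveScenarioClassification",
    "MTOPDomainClassification",
    "MTOPIntentClassification",
    "STS22",
]

# Restrict each task to its French split and write one result file per task.
evaluation = MTEB(tasks=tasks, task_langs=["fr"])
evaluation.run(model, output_folder="results/solon-embeddings-base-0.1")
```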