---
pipeline_tag: sentence-similarity
tags:
  - feature-extraction
license: mit
language:
  - fr
  - en
model-index:
  - name: Solon-embeddings-base-0.1
    results:
      - task:
          type: sentence-similarity
          name: Passage Retrieval
        dataset:
          type: unicamp-dl/mmarco
          name: mMARCO-fr
          config: french
          split: validation
        metrics:
          - type: recall_at_500
            name: Recall@500
            value: 90.9
          - type: recall_at_100
            name: Recall@100
            value: 80.6
          - type: recall_at_10
            name: Recall@10
            value: 52.5
          - type: map_at_10
            name: MAP@10
            value: 27.4
          - type: ndcg_at_10
            name: nDCG@10
            value: 33.5
          - type: mrr_at_10
            name: MRR@10
            value: 27.9
---

# Solon Embeddings — Base 0.1

SOTA open-source French embedding model.

Instructions:
Prepend "query : " to each query to improve retrieval performance, as in the sketch below.
Passages need no prefix.
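
A minimal usage sketch, assuming the model loads directly with sentence-transformers; the example strings are illustrative:

```python
# Minimal sketch, assuming sentence-transformers can load this model
# from the Hub; the French example strings are illustrative.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("OrdalieTech/Solon-embeddings-base-0.1")

# Queries take the "query : " prefix; passages are encoded as-is.
queries = ["query : Quelle est la capitale de la France ?"]
passages = ["Paris est la capitale de la France."]

query_emb = model.encode(queries)
passage_emb = model.encode(passages)

# Cosine similarity gives the retrieval score for each query/passage pair.
print(cos_sim(query_emb, passage_emb))
```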

| Model | Mean Score |
|:--|--:|
| OrdalieTech/Solon-embeddings-large-0.1 | 0.7490 |
| cohere/embed-multilingual-v3 | 0.7402 |
| OrdalieTech/Solon-embeddings-base-0.1 | 0.7306 |
| openai/ada-002 | 0.7290 |
| cohere/embed-multilingual-light-v3 | 0.6945 |
| antoinelouis/biencoder-camembert-base-mmarcoFR | 0.6826 |
| dangvantuan/sentence-camembert-large | 0.6756 |
| voyage/voyage-01 | 0.6753 |
| intfloat/multilingual-e5-large | 0.6660 |
| intfloat/multilingual-e5-base | 0.6597 |
| Sbert/paraphrase-multilingual-mpnet-base-v2 | 0.5975 |
| dangvantuan/sentence-camembert-base | 0.5456 |
| EuropeanParliament/eubert_embedding_v1 | 0.5063 |

These results were obtained across 9 French benchmarks covering a variety of text-similarity tasks (classification, reranking, STS):

- AmazonReviewsClassification (MTEB)
- MassiveIntentClassification (MTEB)
- MassiveScenarioClassification (MTEB)
- MTOPDomainClassification (MTEB)
- MTOPIntentClassification (MTEB)
- STS22 (MTEB)
- MiraclFRRerank (Miracl)
- OrdalieFRSTS (Ordalie)
- OrdalieFRReranking (Ordalie)

We created OrdalieFRSTS and OrdalieFRReranking to strengthen benchmark coverage for French STS and reranking.

(The evaluation script is available at [github.com/OrdalieTech/mteb](https://github.com/OrdalieTech/mteb).)
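
As a hedged sketch of how one of the listed benchmarks could be run, assuming the OrdalieTech fork keeps the upstream `mteb` interface; the task selection and output folder here are illustrative:

```python
# Sketch only: assumes the OrdalieTech/mteb fork exposes the upstream
# MTEB interface and that the model loads via sentence-transformers.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("OrdalieTech/Solon-embeddings-base-0.1")

# STS22 (restricted to French) is one of the nine benchmarks listed above.
evaluation = MTEB(tasks=["STS22"], task_langs=["fr"])
evaluation.run(model, output_folder="results/solon-base")  # illustrative path
```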