---
base_model: aubmindlab/bert-base-arabertv02
datasets:
- akhooli/arabic-triplets-1m-curated-sims-len
language:
- ar
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- transformers.js
- transformers
- sentence-similarity
- feature-extraction
- dataset_size:75000
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
- mteb
model-index:
- name: Omartificial-Intelligence-Space/Arabert-matro-v4
  results:
  - dataset:
      config: ar
      name: MTEB MintakaRetrieval (ar)
      revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
      split: test
      type: mintaka/mmteb-mintaka
    metrics:
    - type: main_score
      value: 20.059
    - type: map_at_1
      value: 11.439
    - type: map_at_3
      value: 14.927
    - type: map_at_5
      value: 15.984
    - type: map_at_10
      value: 16.82
    - type: ndcg_at_1
      value: 11.439
    - type: ndcg_at_3
      value: 16.081
    - type: ndcg_at_5
      value: 18.007
    - type: ndcg_at_10
      value: 20.059
    - type: recall_at_1
      value: 11.439
    - type: recall_at_3
      value: 19.428
    - type: recall_at_5
      value: 24.149
    - type: recall_at_10
      value: 30.549
    - type: precision_at_1
      value: 11.439
    - type: precision_at_3
      value: 6.476
    - type: precision_at_5
      value: 4.83
    - type: precision_at_10
      value: 3.055
    - type: mrr_at_1
      value: 11.4389
    - type: mrr_at_3
      value: 14.9266
    - type: mrr_at_5
      value: 15.9843
    - type: mrr_at_10
      value: 16.8196
    task:
      type: Retrieval
  - dataset:
      config: ar
      name: MTEB MIRACLRetrievalHardNegatives (ar)
      revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb
      split: dev
      type: miracl/mmteb-miracl-hardnegatives
    metrics:
    - type: main_score
      value: 62.616
    - type: map_at_1
      value: 35.994
    - type: map_at_3
      value: 48.586
    - type: map_at_5
      value: 51.919
    - type: map_at_10
      value: 54.46
    - type: ndcg_at_1
      value: 53.6
    - type: ndcg_at_3
      value: 55.553
    - type: ndcg_at_5
      value: 58.681
    - type: ndcg_at_10
      value: 62.616
    - type: recall_at_1
      value: 35.994
    - type: recall_at_3
      value: 55.932
    - type: recall_at_5
      value: 64.78
    - type: recall_at_10
      value: 75.128
    - type: precision_at_1
      value: 53.6
    - type: precision_at_3
      value: 31.9
    - type: precision_at_5
      value: 23.12
    - type: precision_at_10
      value: 13.94
    - type: mrr_at_1
      value: 53.6
    - type: mrr_at_3
      value: 62.5
    - type: mrr_at_5
      value: 64.105
    - type: mrr_at_10
      value: 64.9363
    task:
      type: Retrieval
  - dataset:
      config: ar
      name: MTEB MLQARetrieval (ar)
      revision: 397ed406c1a7902140303e7faf60fff35b58d285
      split: validation
      type: mlqa/mmteb-mlqa
    metrics:
    - type: main_score
      value: 67.56
    - type: map_at_1
      value: 54.352
    - type: map_at_3
      value: 60.993
    - type: map_at_5
      value: 62.173
    - type: map_at_10
      value: 63.11
    - type: ndcg_at_1
      value: 54.352
    - type: ndcg_at_3
      value: 63.162
    - type: ndcg_at_5
      value: 65.301
    - type: ndcg_at_10
      value: 67.56
    - type: recall_at_1
      value: 54.352
    - type: recall_at_3
      value: 69.439
    - type: recall_at_5
      value: 74.662
    - type: recall_at_10
      value: 81.625
    - type: precision_at_1
      value: 54.352
    - type: precision_at_3
      value: 23.146
    - type: precision_at_5
      value: 14.932
    - type: precision_at_10
      value: 8.162
    - type: mrr_at_1
      value: 54.352
    - type: mrr_at_3
      value: 60.9929
    - type: mrr_at_5
      value: 62.1728
    - type: mrr_at_10
      value: 63.1095
    task:
      type: Retrieval
  - dataset:
      config: default
      name: MTEB SadeemQuestionRetrieval (ar)
      revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9
      split: test
      type: sadeem/mmteb-sadeem
    metrics:
    - type: main_score
      value: 64.662
    - type: map_at_1
      value: 29.584
    - type: map_at_3
      value: 53.75
    - type: map_at_5
      value: 54.643
    - type: map_at_10
      value: 54.943
    - type: ndcg_at_1
      value: 29.584
    - type: ndcg_at_3
      value: 62.35
    - type: ndcg_at_5
      value: 63.943
    - type: ndcg_at_10
      value: 64.662
    - type: recall_at_1
      value: 29.584
    - type: recall_at_3
      value: 87.458
    - type: recall_at_5
      value: 91.288
    - type: recall_at_10
      value: 93.49
    - type: precision_at_1
      value: 29.584
    - type: precision_at_3
      value: 29.153
    - type: precision_at_5
      value: 18.258
    - type: precision_at_10
      value: 9.349
    - type: mrr_at_1
      value: 26.9507
    - type: mrr_at_3
      value: 52.0105
    - type: mrr_at_5
      value: 52.9344
    - type: mrr_at_10
      value: 53.2895
    task:
      type: Retrieval
  - dataset:
      config: ar-ar
      name: MTEB STS17 (ar-ar)
      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
      split: test
      type: mteb/sts17-crosslingual-sts
    metrics:
    - type: cosine_pearson
      value: 84.66883392015258
    - type: cosine_spearman
      value: 85.30520907141938
    - type: euclidean_pearson
      value: 82.04306779342852
    - type: euclidean_spearman
      value: 84.58744201847996
    - type: main_score
      value: 85.30520907141938
    - type: manhattan_pearson
      value: 82.08829357724328
    - type: manhattan_spearman
      value: 84.49254541383544
    task:
      type: STS
license: apache-2.0
---

# Arabic-Triplet-Matryoshka-V2-Model

- This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02).
- It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
- The model was trained on 1M samples from the [akhooli/arabic-triplets-1m-curated-sims-len](https://huggingface.co/datasets/akhooli/arabic-triplets-1m-curated-sims-len) dataset.
- It was trained for 3 epochs with MatryoshkaLoss, reaching a final training loss of 0.718.

## Citation

If you use the Arabic Matryoshka Embeddings Model, please cite it as follows:

```bibtex
@misc{nacar2024enhancingsemanticsimilarityunderstanding,
      title={Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning},
      author={Omer Nacar and Anis Koubaa},
      year={2024},
      eprint={2407.21139},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.21139},
}
```
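
## Usage

The card lists semantic similarity and semantic search as uses but gives no code. Below is a minimal Python sketch, assuming `sentence-transformers` 2.7 or later (the release that added the `truncate_dim` argument) and assuming the Hub ID equals the model-index name `Omartificial-Intelligence-Space/Arabert-matro-v4`; change `MODEL_ID` if the model is published under another repository name. The helper names `encode`, `truncate_and_normalize`, and `cosine_similarity` are hypothetical. The card does not say which dimensions MatryoshkaLoss was trained with, so treat any size below 768 as an assumption to validate on your own data.

```python
import math
from typing import List, Optional, Sequence

# Assumption: Hub ID copied from the model-index `name` in the front matter;
# change it if the model is published under a different repository name.
MODEL_ID = "Omartificial-Intelligence-Space/Arabert-matro-v4"


def truncate_and_normalize(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` coordinates (Matryoshka prefix) and rescale to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be between 1 and {len(embedding)}, got {dim}")
    head = [float(x) for x in embedding[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (independent of their scale)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def encode(sentences: List[str], dim: Optional[int] = None):
    """Hypothetical helper: embed sentences, optionally keeping only the first `dim` dims.

    Requires `pip install sentence-transformers` (>= 2.7 for `truncate_dim`) and network
    access on first use to download the weights. `dim=None` keeps all 768 dimensions.
    """
    # Imported lazily so the pure-Python helpers above work without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID, truncate_dim=dim)
    return model.encode(sentences)  # NumPy array of shape (len(sentences), dim or 768)
```

For example, `a, b = encode(["ما هي عاصمة فرنسا؟", "باريس هي عاصمة فرنسا."], dim=256)` followed by `cosine_similarity(a, b)` scores the pair with 256-dimensional embeddings. If you have already stored full 768-dimensional vectors, `truncate_and_normalize` applies the same prefix truncation by hand, so smaller indexes can be built without re-encoding.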