sergeyzh commited on
Commit
055975b
·
verified ·
1 Parent(s): 754d90e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +36 -1
README.md CHANGED
@@ -22,7 +22,7 @@ base_model: cointegrated/LaBSE-en-ru
22
 
23
  ---
24
 
25
- Модель BERT для расчетов эмбедингов предложений на русском языке. Модель основана на [cointegrated/LaBSE-en-ru](https://huggingface.co/cointegrated/LaBSE-en-ru) - имеет аналогичные размеры контекста (512), ембединга (768) и быстродействие.
26
 
27
 
28
  ## Использование:
@@ -60,3 +60,38 @@ print(util.dot_score(embeddings, embeddings))
60
  | cointegrated/LaBSE-en-ru | 0.794 | 0.659 | 0.431 | 0.761 | 0.946 | 0.766 | 0.789 | 0.769 | 0.340 | 0.414 |
61
 
62
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
22
 
23
  ---
24
 
25
+ Модель BERT для расчетов эмбеддингов предложений на русском языке. Модель основана на [cointegrated/LaBSE-en-ru](https://huggingface.co/cointegrated/LaBSE-en-ru) - имеет аналогичные размеры контекста (512), ембеддинга (768) и быстродействие.
26
 
27
 
28
  ## Использование:
 
60
  | cointegrated/LaBSE-en-ru | 0.794 | 0.659 | 0.431 | 0.761 | 0.946 | 0.766 | 0.789 | 0.769 | 0.340 | 0.414 |
61
 
62
 
63
+ Оценки модели на бенчмарке [ruMTEB](https://habr.com/ru/companies/sberdevices/articles/831150/):
64
+
65
+ |Model Name | Metric | sbert_large_ mt_nlu_ru | sbert_large_ nlu_ru | [LaBSE-ru-sts](https://huggingface.co/sergeyzh/LaBSE-ru-sts) | LaBSE-ru-turbo | multilingual-e5-small | multilingual-e5-base | multilingual-e5-large |
66
+ |:----------------------------------|:--------------------|-----------------------:|--------------------:|----------------:|------------------:|----------------------:|---------------------:|----------------------:|
67
+ |CEDRClassification | Accuracy | 0.368 | 0.358 | 0.418 | 0.451 | 0.401 | 0.423 | **0.448** |
68
+ |GeoreviewClassification | Accuracy | 0.397 | 0.400 | 0.406 | 0.438 | 0.447 | 0.461 | **0.497** |
69
+ |GeoreviewClusteringP2P | V-measure | 0.584 | 0.590 | 0.626 | **0.644** | 0.586 | 0.545 | 0.605 |
70
+ |HeadlineClassification | Accuracy | 0.772 | **0.793** | 0.633 | 0.688 | 0.732 | 0.757 | 0.758 |
71
+ |InappropriatenessClassification | Accuracy | **0.646** | 0.625 | 0.599 | 0.615 | 0.592 | 0.588 | 0.616 |
72
+ |KinopoiskClassification | Accuracy | 0.503 | 0.495 | 0.496 | 0.521 | 0.500 | 0.509 | **0.566** |
73
+ |RiaNewsRetrieval | NDCG@10 | 0.214 | 0.111 | 0.651 | 0.694 | 0.700 | 0.702 | **0.807** |
74
+ |RuBQReranking | MAP@10 | 0.561 | 0.468 | 0.688 | 0.687 | 0.715 | 0.720 | **0.756** |
75
+ |RuBQRetrieval | NDCG@10 | 0.298 | 0.124 | 0.622 | 0.657 | 0.685 | 0.696 | **0.741** |
76
+ |RuReviewsClassification | Accuracy | 0.589 | 0.583 | 0.599 | 0.632 | 0.612 | 0.630 | **0.653** |
77
+ |RuSTSBenchmarkSTS | Pearson correlation | 0.712 | 0.588 | 0.788 | 0.822 | 0.781 | 0.796 | **0.831** |
78
+ |RuSciBenchGRNTIClassification | Accuracy | 0.542 | 0.539 | 0.529 | 0.569 | 0.550 | 0.563 | **0.582** |
79
+ |RuSciBenchGRNTIClusteringP2P | V-measure | **0.522** | 0.504 | 0.486 | 0.517 | 0.511 | 0.516 | 0.520 |
80
+ |RuSciBenchOECDClassification | Accuracy | 0.438 | 0.430 | 0.406 | 0.440 | 0.427 | 0.423 | **0.445** |
81
+ |RuSciBenchOECDClusteringP2P | V-measure | **0.473** | 0.464 | 0.426 | 0.452 | 0.443 | 0.448 | 0.450 |
82
+ |SensitiveTopicsClassification | Accuracy | **0.285** | 0.280 | 0.262 | 0.272 | 0.228 | 0.234 | 0.257 |
83
+ |TERRaClassification | Average Precision | 0.520 | 0.502 | **0.587** | 0.585 | 0.551 | 0.550 | 0.584 |
84
+
85
+ |Model Name | Metric | sbert_large_ mt_nlu_ru | sbert_large_ nlu_ru | [LaBSE-ru-sts](https://huggingface.co/sergeyzh/LaBSE-ru-sts) | LaBSE-ru-turbo | multilingual-e5-small | multilingual-e5-base | multilingual-e5-large |
86
+ |:----------------------------------|:--------------------|-----------------------:|--------------------:|----------------:|------------------:|----------------------:|----------------------:|---------------------:|
87
+ |Classification | Accuracy | 0.554 | 0.552 | 0.524 | 0.558 | 0.551 | 0.561 | **0.588** |
88
+ |Clustering | V-measure | 0.526 | 0.519 | 0.513 | **0.538** | 0.513 | 0.503 | 0.525 |
89
+ |MultiLabelClassification | Accuracy | 0.326 | 0.319 | 0.340 | **0.361** | 0.314 | 0.329 | 0.353 |
90
+ |PairClassification | Average Precision | 0.520 | 0.502 | 0.587 | **0.585** | 0.551 | 0.550 | 0.584 |
91
+ |Reranking | MAP@10 | 0.561 | 0.468 | 0.688 | 0.687 | 0.715 | 0.720 | **0.756** |
92
+ |Retrieval | NDCG@10 | 0.256 | 0.118 | 0.637 | 0.675 | 0.697 | 0.699 | **0.774** |
93
+ |STS | Pearson correlation | 0.712 | 0.588 | 0.788 | 0.822 | 0.781 | 0.796 | **0.831** |
94
+ |Average | Average | 0.494 | 0.438 | 0.582 | 0.604 | 0.588 | 0.594 | **0.630** |
95
+
96
+
97
+