---
license: mit
language:
- zh
pipeline_tag: sentence-similarity
---

## Model List

The evaluation datasets are all in Chinese, and the same backbone model, **RoBERTa base**, is used for every method. In addition, because some datasets have small test sets, which can introduce large variance into the evaluation, we evaluate on the train, valid, and test splits combined, and report the **weighted average (w-avg)** as the final score.

| Model | STS-B (w-avg) | ATEC | BQ | LCQMC | PAWSX | Avg. |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| BERT-Whitening | 65.27 | - | - | - | - | - |
| SimBERT | 70.01 | - | - | - | - | - |
| SBERT-Whitening | 71.75 | - | - | - | - | - |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 78.61 | - | - | - | - | - |
| [hellonlp/simcse-base-zh](https://huggingface.co/hellonlp/simcse-roberta-base-zh) | 80.96 | - | - | - | - | - |
| [hellonlp/promcse-base-zh-v1.0](https://huggingface.co/hellonlp/promcse-bert-base-zh) | 81.57 | - | - | - | - | - |
| [hellonlp/promcse-base-zh-v1.1](https://huggingface.co/hellonlp/promcse-bert-base-zh) | **82.02** | - | - | - | - | - |
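The weighted average above can be sketched as follows. This is a generic illustration, not code from the `promcse` package; the per-split scores and split sizes are hypothetical placeholders:

```python
def weighted_average(scores, sizes):
    """Average per-split scores, weighting each split by its number of examples."""
    total = sum(sizes)
    return sum(score * size for score, size in zip(scores, sizes)) / total

# Hypothetical per-split scores and sizes, for illustration only.
splits = {"train": (80.0, 5000), "valid": (82.0, 1000), "test": (81.0, 1000)}
scores = [score for score, _ in splits.values()]
sizes = [size for _, size in splits.values()]
print(round(weighted_average(scores, sizes), 2))  # 80.43
```

Splits with more examples pull the average toward their score, which dampens the variance from small test sets.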

## Uses

To use the tool, first install the `promcse` package from [PyPI](https://pypi.org/project/promcse/):

```bash
pip install promcse
```

After installing the package, you can load our model with two lines of code:

```python
from promcse import PromCSE

model = PromCSE("hellonlp/promcse-bert-base-zh", "cls", 10)
```

Then you can use our model to encode sentences into embeddings:

```python
embeddings = model.encode("武汉是一个美丽的城市。")
print(embeddings.shape)
# torch.Size([768])
```
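If you want to compare embeddings yourself, the standard cosine-similarity formula can be reproduced directly. This is a minimal sketch, not part of the `promcse` API, shown on toy 3-dimensional vectors (real embeddings are 768-dimensional):

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Identical vectors have cosine similarity 1.0; orthogonal vectors have 0.0.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # 1.0
```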

To compute the cosine similarities between two groups of sentences:

```python
sentences_a = ['你好吗']
sentences_b = ['你怎么样', '我吃了一个苹果', '你过的好吗', '你还好吗', '你',
               '你好不好', '你好不好呢', '我不开心', '我好开心啊', '你吃饭了吗',
               '你好吗', '你现在好吗', '你好个鬼']
similarities = model.similarity(sentences_a, sentences_b)
print(similarities)
# [(1.0, '你好吗'),
#  (0.9029, '你好不好'),
#  (0.8945, '你好不好呢'),
#  (0.8478, '你还好吗'),
#  (0.7746, '你现在好吗'),
#  (0.7607, '你过的好吗'),
#  (0.7399, '你怎么样'),
#  (0.5967, '你'),
#  (0.5395, '你好个鬼'),
#  (0.5262, '你吃饭了吗'),
#  (0.3608, '我好开心啊'),
#  (0.2308, '我不开心'),
#  (0.0626, '我吃了一个苹果')]
```
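The output above appears to be a list of `(score, sentence)` pairs sorted by descending score, so downstream filtering is straightforward. A sketch using pairs hardcoded from the output above (in practice you would filter the result of `model.similarity` directly):

```python
# (score, sentence) pairs copied from the sample output above.
similarities = [(1.0, '你好吗'), (0.9029, '你好不好'), (0.8478, '你还好吗'),
                (0.2308, '我不开心'), (0.0626, '我吃了一个苹果')]

# Keep only candidates whose similarity meets a chosen threshold.
threshold = 0.8
matches = [sentence for score, sentence in similarities if score >= threshold]
print(matches)  # ['你好吗', '你好不好', '你还好吗']
```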