---
license: mit
language:
- zh
pipeline_tag: sentence-similarity
---


## Model List
The evaluation datasets are in Chinese, and we used the same backbone language model, **RoBERTa-base**, for every method. In addition, because some datasets have small test sets, which can lead to large deviations in the evaluation scores, the evaluation here pools the train, valid, and test splits, and the final result is a **weighted average (w-avg)** across them.
| Model | STS-B(w-avg) | ATEC | BQ | LCQMC | PAWSX | Avg. |
|:-----------------------:|:------------:|:----:|:---:|:-----:|:-----:|:----:|
| BERT-Whitening | 65.27 | - | - | - | - | - |
| SimBERT | 70.01 | - | - | - | - | - |
| SBERT-Whitening | 71.75 | - | - | - | - | - |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 78.61 | - | - | - | - | - |
| [hellonlp/simcse-base-zh](https://huggingface.co/hellonlp/simcse-roberta-base-zh) | 80.96 | - | - | - | - | - |
| [hellonlp/promcse-base-zh-v1.0](https://huggingface.co/hellonlp/promcse-bert-base-zh) | **81.57** | - | - | - | - | - |
| [hellonlp/promcse-base-zh-v1.1](https://huggingface.co/hellonlp/promcse-bert-base-zh) | **82.02** | - | - | - | - | - |
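
The **w-avg** score is a split-size-weighted average: each split's score counts in proportion to its number of examples. A minimal sketch of that weighting — the scores and split sizes below are made-up numbers for illustration, not the real dataset statistics:

```python
def weighted_avg(scores, sizes):
    """Average per-split scores, weighting each split by its number of examples."""
    total = sum(sizes)
    return sum(score * n for score, n in zip(scores, sizes)) / total

# Hypothetical per-split scores and sizes (train, valid, test) -- illustrative only
split_scores = [0.82, 0.80, 0.78]
split_sizes = [5000, 1500, 2000]
print(round(weighted_avg(split_scores, split_sizes), 4))
# 0.8071
```

Weighting by split size keeps a tiny test split from dominating the combined score.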


## Uses
To use the model, first install the `promcse` package from [PyPI](https://pypi.org/project/promcse/):
```bash
pip install promcse
```

After installing the package, you can load our model with two lines of code:
```python
from promcse import PromCSE
model = PromCSE("hellonlp/promcse-bert-base-zh", "cls", 10)
```

Then you can use the model to encode sentences into embeddings:
```python
embeddings = model.encode("武汉是一个美丽的城市。")  # "Wuhan is a beautiful city."
print(embeddings.shape)
# torch.Size([768])
```
40
+
41
+ Compute the cosine similarities between two groups of sentences
42
+ ```python
43
+ sentences_a = ['你好吗']
44
+ sentences_b = ['你怎么样','我吃了一个苹果','你过的好吗','你还好吗','你',
45
+ '你好不好','你好不好呢','我不开心','我好开心啊', '你吃饭了吗',
46
+ '你好吗','你现在好吗','你好个鬼']
47
+ similarities = model.similarity(sentences_a, sentences_b)
48
+ print(similarities)
49
+ # [(1.0, '你好吗'),
50
+ # (0.9029, '你好不好'),
51
+ # (0.8945, '你好不好呢'),
52
+ # (0.8478, '你还好吗'),
53
+ # (0.7746, '你现在好吗'),
54
+ # (0.7607, '你过的好吗'),
55
+ # (0.7399, '你怎么样'),
56
+ # (0.5967, '你'),
57
+ # (0.5395, '你好个鬼'),
58
+ # (0.5262, '你吃饭了吗'),
59
+ # (0.3608, '我好开心啊'),
60
+ # (0.2308, '我不开心'),
61
+ # (0.0626, '我吃了一个苹果')]
62
+ ```
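
The sorted (score, sentence) pairs above suggest that the similarity call ranks each candidate by the cosine similarity between sentence embeddings. A minimal, dependency-free sketch of that ranking logic — the toy 3-dimensional vectors stand in for real `model.encode` outputs, and the helper names are hypothetical, not part of the `promcse` API:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(query_vec, candidate_vecs, labels):
    """Score every candidate against the query, then sort in descending order."""
    scored = [(round(cosine_similarity(query_vec, v), 4), label)
              for v, label in zip(candidate_vecs, labels)]
    return sorted(scored, reverse=True)

# Toy embeddings standing in for model.encode(...) outputs
q = [1.0, 0.0, 1.0]
cands = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
print(rank_by_similarity(q, cands, ['same', 'orthogonal', 'partial']))
# [(1.0, 'same'), (0.5, 'partial'), (0.0, 'orthogonal')]
```

Because cosine similarity depends only on the angle between vectors, the identical vector scores 1.0 and the orthogonal one scores 0.0, matching how the closest paraphrase tops the list above.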