---
license: mit
language:
- zh
pipeline_tag: sentence-similarity
---

## Model List

The evaluation datasets are all in Chinese, and the same backbone model, **RoBERTa base**, is used for every method. In addition, because some datasets have small test sets, which can introduce large variance into the evaluation, we evaluate on the train, valid, and test splits combined, and report the **weighted average (w-avg)** as the final score.

| Model | STS-B (w-avg) | ATEC | BQ | LCQMC | PAWSX | Avg. |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| BERT-Whitening | 65.27 | - | - | - | - | - |
| SimBERT | 70.01 | - | - | - | - | - |
| SBERT-Whitening | 71.75 | - | - | - | - | - |
| [BAAI/bge-base-zh](https://huggingface.co/BAAI/bge-base-zh) | 78.61 | - | - | - | - | - |
| [hellonlp/simcse-base-zh](https://huggingface.co/hellonlp/simcse-roberta-base-zh) | 80.96 | - | - | - | - | - |
| [hellonlp/promcse-base-zh-v1.0](https://huggingface.co/hellonlp/promcse-bert-base-zh) | 81.57 | - | - | - | - | - |
| [hellonlp/promcse-base-zh-v1.1](https://huggingface.co/hellonlp/promcse-bert-base-zh) | **82.02** | - | - | - | - | - |
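The weighted average above can be sketched as follows. This is a generic illustration, not code from the `promcse` package; the per-split scores and split sizes are hypothetical placeholders:

```python
def weighted_average(scores, sizes):
    """Average per-split scores, weighting each split by its number of examples."""
    total = sum(sizes)
    return sum(score * size for score, size in zip(scores, sizes)) / total

# Hypothetical per-split scores and sizes, for illustration only.
splits = {"train": (80.0, 5000), "valid": (82.0, 1000), "test": (81.0, 1000)}
scores = [score for score, _ in splits.values()]
sizes = [size for _, size in splits.values()]
print(round(weighted_average(scores, sizes), 2))  # 80.43
```

Splits with more examples pull the average toward their score, which dampens the variance from small test sets.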

## Uses

To use the tool, first install the `promcse` package from [PyPI](https://pypi.org/project/promcse/):

```bash
pip install promcse
```

After installing the package, you can load our model with two lines of code:

```python
from promcse import PromCSE

model = PromCSE("hellonlp/promcse-bert-base-zh", "cls", 10)
```

Then you can use our model to encode sentences into embeddings:

```python
embeddings = model.encode("武汉是一个美丽的城市。")
print(embeddings.shape)
# torch.Size([768])
```
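If you want to compare embeddings yourself, the standard cosine-similarity formula can be reproduced directly. This is a minimal sketch, not part of the `promcse` API, shown on toy 3-dimensional vectors (real embeddings are 768-dimensional):

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Identical vectors have cosine similarity 1.0; orthogonal vectors have 0.0.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # 1.0
```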

To compute the cosine similarities between two groups of sentences:

```python
sentences_a = ['你好吗']
sentences_b = ['你怎么样', '我吃了一个苹果', '你过的好吗', '你还好吗', '你',
               '你好不好', '你好不好呢', '我不开心', '我好开心啊', '你吃饭了吗',
               '你好吗', '你现在好吗', '你好个鬼']
similarities = model.similarity(sentences_a, sentences_b)
print(similarities)
# [(1.0, '你好吗'),
#  (0.9029, '你好不好'),
#  (0.8945, '你好不好呢'),
#  (0.8478, '你还好吗'),
#  (0.7746, '你现在好吗'),
#  (0.7607, '你过的好吗'),
#  (0.7399, '你怎么样'),
#  (0.5967, '你'),
#  (0.5395, '你好个鬼'),
#  (0.5262, '你吃饭了吗'),
#  (0.3608, '我好开心啊'),
#  (0.2308, '我不开心'),
#  (0.0626, '我吃了一个苹果')]
```
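The output above appears to be a list of `(score, sentence)` pairs sorted by descending score, so downstream filtering is straightforward. A sketch using pairs hardcoded from the output above (in practice you would filter the result of `model.similarity` directly):

```python
# (score, sentence) pairs copied from the sample output above.
similarities = [(1.0, '你好吗'), (0.9029, '你好不好'), (0.8478, '你还好吗'),
                (0.2308, '我不开心'), (0.0626, '我吃了一个苹果')]

# Keep only candidates whose similarity meets a chosen threshold.
threshold = 0.8
matches = [sentence for score, sentence in similarities if score >= threshold]
print(matches)  # ['你好吗', '你好不好', '你还好吗']
```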