sbert #5
by TANHL - opened

README.md CHANGED
````diff
@@ -8,31 +8,18 @@ tags:
 - transformers
 license: apache-2.0
 widget:
-
-
- -
- -
- -
+  source_sentence: "那个人很开心"
+  sentences:
+    - 那个人非常开心
+    - 那只猫很开心
+    - 那个人在吃东西
 ---
 
 # Chinese Sentence BERT
 
 ## Model description
 
-This is the sentence embedding model pre-trained by [UER-py](https://github.com/dbiir/UER-py/), which is introduced in [this paper](https://arxiv.org/abs/1909.05658).
-
-## How to use
-
-You can use this model to extract sentence embeddings for sentence similarity task. We use cosine distance to calculate the embedding similarity here:
-
-```python
->>> from sentence_transformers import SentenceTransformer
->>> model = SentenceTransformer('uer/sbert-base-chinese-nli')
->>> sentences = ['那个人很开心', '那个人非常开心']
->>> sentence_embeddings = model.encode(sentences)
->>> from sklearn.metrics.pairwise import paired_cosine_distances
->>> cosine_score = 1 - paired_cosine_distances([sentence_embeddings[0]],[sentence_embeddings[1]])
-```
+This is the sentence embedding model pre-trained by [UER-py](https://github.com/dbiir/UER-py/), which is introduced in [this paper](https://arxiv.org/abs/1909.05658).
 
 ## Training data
 
@@ -68,7 +55,6 @@ python3 scripts/convert_sbert_from_uer_to_huggingface.py --input_model_path mode
 journal={arXiv preprint arXiv:1908.10084},
 year={2019}
 }
-
 @article{zhao2019uer,
 title={UER: An Open-Source Toolkit for Pre-training Models},
 author={Zhao, Zhe and Chen, Hui and Zhang, Jinbin and Zhao, Xin and Liu, Tao and Lu, Wei and Chen, Xi and Deng, Haotang and Ju, Qi and Du, Xiaoyong},
@@ -76,11 +62,4 @@ python3 scripts/convert_sbert_from_uer_to_huggingface.py --input_model_path mode
 pages={241},
 year={2019}
 }
-
-@article{zhao2023tencentpretrain,
-title={TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities},
-author={Zhao, Zhe and Li, Yudong and Hou, Cheng and Zhao, Jing and others},
-journal={ACL 2023},
-pages={217},
-year={2023}
 ```
````
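The removed "How to use" snippet and the new widget metadata describe the same check: encode a source sentence together with candidate sentences, then rank the candidates by cosine similarity. Below is a minimal sketch of that check, assuming `sentence-transformers` and `scikit-learn` are installed; only the model id and the sentences come from this diff, and the variable names and printed ranking are illustrative, not part of the model card.

```python
# Sketch: score the widget's source sentence against its candidates,
# using the same 1 - cosine_distance convention as the removed snippet.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import paired_cosine_distances

model = SentenceTransformer('uer/sbert-base-chinese-nli')

source = '那个人很开心'  # "That person is happy"
candidates = [
    '那个人非常开心',  # "That person is extremely happy"
    '那只猫很开心',    # "That cat is happy"
    '那个人在吃东西',  # "That person is eating"
]

# Encode source and candidates in one batch.
embeddings = model.encode([source] + candidates)

# Pair the source embedding with each candidate embedding and convert
# cosine distance to cosine similarity.
scores = 1 - paired_cosine_distances(
    [embeddings[0]] * len(candidates), embeddings[1:]
)

# Print candidates from most to least similar.
for sentence, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f'{score:.4f}  {sentence}')
```

The paraphrase 那个人非常开心 ("that person is extremely happy") should rank first, which is what the widget is meant to demonstrate; `sentence_transformers.util.cos_sim` would give the same ordering without the scikit-learn dependency.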