---
language: zh
tags:
- sbert
datasets:
- dialogue
---

# Data
The training data consists of similar sentence pairs from e-commerce dialogue, about 500,000 pairs in total.

## Model
The model was created with [sentence-transformers](https://www.sbert.net/index.html); its architecture is a bi-encoder.

### Usage
```python
>>> from sentence_transformers import SentenceTransformer, util
>>> model = SentenceTransformer("tuhailong/bi_encoder_roberta-wwm-ext", device="cuda:1")
>>> model.max_seq_length = 32
>>> sentences = ["今天天气不错", "今天心情不错"]
>>> embeddings1 = model.encode([sentences[0]], convert_to_tensor=True)
>>> embeddings2 = model.encode([sentences[1]], convert_to_tensor=True)
>>> scores = util.cos_sim(embeddings1, embeddings2).cpu().numpy()
>>> print(scores)
```

### PS
Because a pooling layer and a dense layer are added after the base model, the model files are organized into subfolders. This repository therefore contains the additional files "1_Pooling-config.json", "2_Dense-config.json" and "2_Dense-pytorch_model.bin". After downloading these files, rename them to "1_Pooling/config.json", "2_Dense/config.json" and "2_Dense/pytorch_model.bin".
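
The renaming step above can be sketched as a few shell commands. This is only a sketch: the `touch` lines are placeholders standing in for the actual downloaded files, so the `mv` commands have something to operate on.

```shell
# Placeholders for the three downloaded files (replace with real downloads).
touch 1_Pooling-config.json 2_Dense-config.json 2_Dense-pytorch_model.bin

# Recreate the folder layout sentence-transformers expects.
mkdir -p 1_Pooling 2_Dense
mv 1_Pooling-config.json 1_Pooling/config.json
mv 2_Dense-config.json 2_Dense/config.json
mv 2_Dense-pytorch_model.bin 2_Dense/pytorch_model.bin
```

After this, the model directory can be loaded directly with `SentenceTransformer(...)` as shown in the Usage section.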