---
language: zh
tags:
- sbert
datasets:
- dialogue
---

# Data
The training data consists of similar sentence pairs drawn from e-commerce dialogue, roughly 500,000 (50w) pairs.

## Model
The model was built with [sentence-transformers](https://www.sbert.net/index.html); the architecture is a bi-encoder.

### Usage
```python
>>> from sentence_transformers import SentenceTransformer, util
>>> model = SentenceTransformer("tuhailong/bi_encoder_roberta-wwm-ext", device="cuda:1")
>>> model.max_seq_length = 32
>>> sentences = ["今天天气不错", "今天心情不错"]
>>> embeddings1 = model.encode([sentences[0]], convert_to_tensor=True)
>>> embeddings2 = model.encode([sentences[1]], convert_to_tensor=True)
>>> scores = util.cos_sim(embeddings1, embeddings2).cpu().numpy()
>>> print(scores)
```

#### Code
Training code: https://github.com/TTurn/bi-encoder

##### PS
Because a pooling layer and a dense layer are added after the base model, the model files are organized into subfolders. This repository therefore contains the additional files "1_Pooling-config.json", "2_Dense-config.json", and "2_Dense-pytorch_model.bin". After downloading them, rename them to "1_Pooling/config.json", "2_Dense/config.json", and "2_Dense/pytorch_model.bin".
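The renaming step above can be scripted. The sketch below is a minimal, hedged example: it assumes the three flat files sit in the current directory (placeholder files stand in for the real downloads here, so the snippet runs standalone) and moves each one into the subfolder layout that sentence-transformers expects.

```python
import os

flat_to_nested = {
    "1_Pooling-config.json": "1_Pooling/config.json",
    "2_Dense-config.json": "2_Dense/config.json",
    "2_Dense-pytorch_model.bin": "2_Dense/pytorch_model.bin",
}

# Placeholders for demonstration only; in practice these are the downloaded files.
for flat in flat_to_nested:
    open(flat, "w").close()

# Create each target folder and move the flat file into it.
for flat, nested in flat_to_nested.items():
    os.makedirs(os.path.dirname(nested), exist_ok=True)
    os.replace(flat, nested)
```

After this, the model directory matches what `SentenceTransformer(...)` expects to load.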