---
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- SbertDistil
license: apache-2.0
datasets:
- wikimedia/wikipedia
- SiberiaSoft/SiberianPersonaChat-2
language:
- ru
- en
metrics:
- mse
library_name: transformers
---

# FractalGPT/SbertDistil


This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
It is a fast and small model for measuring the similarity between sentences; in the future we plan to make it even smaller and faster. [Project](https://github.com/FractalGPT/ModelEmbedderDistillation)


## Usage (Sentence-Transformers)

Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

* [Run the example in Colab](https://colab.research.google.com/drive/1m3fyh632htPs9UiEu4_AkQfrUtjDqIQq)

  
```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
```

```python
model = SentenceTransformer('FractalGPT/SbertDistil')

def cos(x, y):
    # Cosine similarity between two embedding vectors.
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
```

```python
# Semantically similar questions in Russian and English.
text_1 = "Кто такой большой кот?"
text_2 = "Who is kitty?"
a = model.encode(text_1)
b = model.encode(text_2)
cos(a, b)
```

```
>>> 0.8072159157330788
```
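
Because the model produces plain dense vectors, the same embeddings can also drive a small semantic-search setup, as the description above mentions. The snippet below is only an illustrative sketch reusing the `model` object from above; the corpus sentences and query are made up for the example.

```python
from sentence_transformers import util

corpus = [
    "Кошка сидит на окне.",
    "The weather is sunny today.",
    "Программа вычисляет эмбеддинги предложений.",
]
corpus_embeddings = model.encode(corpus)

query_embedding = model.encode("A cat is sitting on the window.")

# Rank corpus entries by cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])
```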

## Training

* The original weights were taken from [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2).
* Training was conducted in two stages (a minimal sketch of the distillation objective is shown after the list below):
1. In the first stage, the model was trained on Wikipedia texts (4 million texts) for three epochs.
   <img src="https://github.com/FractalGPT/ModelEmbedderDistillation/blob/main/DistilSBERT/Train/1_st_en.JPG?raw=true" width=700 />
2. In the second stage, training was conducted on Wikipedia and a dialog dataset for one epoch.
   <img src="https://github.com/FractalGPT/ModelEmbedderDistillation/blob/main/DistilSBERT/Train/2_st_en.JPG?raw=true" width=700 />
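
The training code itself is not part of this card; the sketch below only illustrates the general idea of embedding distillation with an MSE objective, as suggested by the `mse` metric and the linked project. The teacher model name, the example sentences, and all hyperparameters here are assumptions, not values from the actual training runs.

```python
# Hypothetical distillation sketch; teacher name and hyperparameters are assumptions.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

teacher = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")  # assumed teacher (384-dim output)
student = SentenceTransformer("FractalGPT/SbertDistil")

# Stand-ins for the Wikipedia / dialog training texts.
sentences = ["Пример предложения.", "An example sentence."]

# The student is trained to reproduce the teacher's sentence embeddings with an MSE loss,
# so the teacher's output dimensionality (384 here) must match the student's.
train_examples = [InputExample(texts=[s], label=teacher.encode(s)) for s in sentences]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MSELoss(model=student)

student.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```

In this setup the student simply regresses onto the teacher's sentence embeddings, which is why the distillation quality is reported with the MSE metric.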

## Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 312, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Dense({'in_features': 312, 'out_features': 384, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
```
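
For illustration only, an equivalent (untrained) stack can be assembled from the standard sentence-transformers building blocks; the published checkpoint itself should be loaded with `SentenceTransformer('FractalGPT/SbertDistil')` as shown above.

```python
import torch
from sentence_transformers import SentenceTransformer, models

# Rebuild the same three-module architecture from scratch (weights here are untrained).
word_embedding_model = models.Transformer("cointegrated/rubert-tiny2", max_seq_length=512)
pooling = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 312 for rubert-tiny2
    pooling_mode_mean_tokens=True,
)
dense = models.Dense(
    in_features=pooling.get_sentence_embedding_dimension(),  # 312
    out_features=384,
    activation_function=torch.nn.Identity(),
)
model = SentenceTransformer(modules=[word_embedding_model, pooling, dense])
```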