---
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- SbertDistil
license: apache-2.0
datasets:
- wikimedia/wikipedia
- SiberiaSoft/SiberianPersonaChat-2
language:
- ru
- en
metrics:
- mse
library_name: transformers
---
# FractalGPT/SbertDistil
This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
This is a fast, small model for estimating the semantic similarity between sentences; we plan to make it even smaller and faster in the future. [Project](https://github.com/FractalGPT/ModelEmbedderDistillation)
## Usage (Sentence-Transformers)
Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
* [Run the example in Colab](https://colab.research.google.com/drive/1m3fyh632htPs9UiEu4_AkQfrUtjDqIQq)
```
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
import numpy as np
from sentence_transformers import SentenceTransformer
```
```python
model = SentenceTransformer('FractalGPT/SbertDistil')

def cos(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
```
```python
text_1 = "Кто такой большой кот?"
text_2 = "Who is kitty?"
a = model.encode(text_1)
b = model.encode(text_2)
cos(a, b)
```
```
>>> 0.8072159157330788
```
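The same embeddings work for simple cross-lingual semantic search, one of the use cases mentioned above. A minimal sketch using `sentence_transformers.util.cos_sim` (the corpus sentences below are made-up illustrations, not from the training data):
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('FractalGPT/SbertDistil')

corpus = [
    "Кошка сидит на окне.",          # "The cat is sitting on the window."
    "The weather is sunny today.",
    "Программирование на Python.",   # "Programming in Python."
]
corpus_embeddings = model.encode(corpus)

query = "Where is the cat?"
query_embedding = model.encode(query)

# Rank corpus sentences by cosine similarity to the query
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
for sentence, score in sorted(zip(corpus, scores), key=lambda p: float(p[1]), reverse=True):
    print(f"{float(score):.3f}  {sentence}")
```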
## Training
* The original weights were taken from [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2).
* Training was conducted in two stages:
1. In the first stage, the model was trained on Wikipedia texts (4 million texts) for three epochs.
<img src="https://github.com/FractalGPT/ModelEmbedderDistillation/blob/main/DistilSBERT/Train/1_st_en.JPG?raw=true" width=700 />
2. In the second stage, training was continued on Wikipedia plus a dialogue dataset for one epoch.
<img src="https://github.com/FractalGPT/ModelEmbedderDistillation/blob/main/DistilSBERT/Train/2_st_en.JPG?raw=true" width=700 />
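The distillation code lives in the [project repository](https://github.com/FractalGPT/ModelEmbedderDistillation); this card does not spell out the objective, but the MSE metric listed above suggests a standard embedding-distillation setup in which the student learns to reproduce a teacher's sentence embeddings. A rough sketch under that assumption (the teacher model, learning rate, and loop details here are illustrative, not the authors' exact code):
```python
import torch
from torch import nn
from sentence_transformers import SentenceTransformer, models

# Hypothetical teacher: any strong multilingual encoder with 384-dim output
teacher = SentenceTransformer('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')

# Student: rubert-tiny2 body + mean pooling + Dense 312 -> 384 (matches the architecture below)
word = models.Transformer('cointegrated/rubert-tiny2', max_seq_length=512)
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode_mean_tokens=True)
dense = models.Dense(in_features=312, out_features=384, activation_function=nn.Identity())
student = SentenceTransformer(modules=[word, pool, dense])

mse_loss = nn.MSELoss()
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

def distillation_step(batch_texts):
    # Push the student's sentence embeddings toward the teacher's (MSE loss)
    with torch.no_grad():
        target = teacher.encode(batch_texts, convert_to_tensor=True)
    features = student.tokenize(batch_texts)
    embeddings = student(features)['sentence_embedding']
    loss = mse_loss(embeddings, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```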
## Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 312, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
(2): Dense({'in_features': 312, 'out_features': 384, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
``` |
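A quick way to confirm the dimensions above from Python (maximum sequence length of 512, 312-dimensional pooled BERT output projected to 384 by the Dense head):
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('FractalGPT/SbertDistil')
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 384

embeddings = model.encode(["пример", "example"])
print(embeddings.shape)                          # (2, 384)
```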