
	---
	tags:
	- sentence-transformers
	- feature-extraction
	- sentence-similarity
	- SbertDistil
	license: apache-2.0
	datasets:
	- wikimedia/wikipedia
	- SiberiaSoft/SiberianPersonaChat-2
	language:
	- ru
	- en
	metrics:
	- mse
	library_name: transformers
	---

	# FractalGPT/SbertDistil


	This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
	It is a small, fast model for measuring the semantic proximity between sentences. We plan to make it smaller and faster in future releases. [Project](https://github.com/FractalGPT/ModelEmbedderDistillation)


	## Usage (Sentence-Transformers)

	Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

	* [Run example in Colab](https://colab.research.google.com/drive/1m3fyh632htPs9UiEu4_AkQfrUtjDqIQq)


	```
	pip install -U sentence-transformers
	```

	Then you can use the model like this:

	```python
	import numpy as np
	from sentence_transformers import SentenceTransformer
	```

	```python
	model = SentenceTransformer('FractalGPT/SbertDistil')

	def cos(x, y):
	    # Cosine similarity between two 1-D embedding vectors
	    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
	```

	```python
	text_1 = "Кто такой большой кот?"
	text_2 = "Who is kitty?"
	a = model.encode(text_1)
	b = model.encode(text_2)
	cos(a, b)
	```

	```
	>>> 0.8072159157330788
	```
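For more than two sentences, the same cosine measure can be computed for a whole batch at once. The block below is a minimal sketch of a semantic search helper, not part of the model's API: the names `cosine_similarity_matrix`, `top_k` and `search` are hypothetical, and `search` only assumes that `model.encode` returns NumPy arrays, as `SentenceTransformer.encode` does by default.

```python
import numpy as np


def cosine_similarity_matrix(queries, corpus):
    """Cosine similarity between every row of `queries` and every row of `corpus`."""
    q = np.asarray(queries, dtype=np.float64)
    c = np.asarray(corpus, dtype=np.float64)
    # L2-normalise the rows so that a plain dot product equals cosine similarity.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    return q @ c.T


def top_k(query_embedding, corpus_embeddings, k=3):
    """Return (index, score) pairs for the k corpus rows closest to the query."""
    query = np.asarray(query_embedding, dtype=np.float64)[None, :]
    scores = cosine_similarity_matrix(query, corpus_embeddings)[0]
    order = np.argsort(-scores)[:k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in order]


def search(model, query, corpus, k=3):
    """Encode the texts with `model` (hypothetical helper) and rank the corpus."""
    corpus_embeddings = model.encode(corpus)  # shape: (len(corpus), dim)
    query_embedding = model.encode(query)     # shape: (dim,)
    hits = top_k(query_embedding, corpus_embeddings, k)
    return [(corpus[i], score) for i, score in hits]
```

With the model loaded as above, `search(model, "Who is kitty?", ["Кто такой большой кот?", "Какая сегодня погода?"], k=1)` would return the closest corpus sentence together with its score.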

	## Training

	* The initial weights were taken from [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2).
	* Training was conducted in two stages:
	1. In the first stage, the model was trained on Wikipedia texts (4 million texts) for three epochs.
	<img src="https://github.com/FractalGPT/ModelEmbedderDistillation/blob/main/DistilSBERT/Train/1_st_en.JPG?raw=true" width=700 />
	2. In the second stage, training was conducted on Wikipedia and a dialog dataset for one epoch.
	<img src="https://github.com/FractalGPT/ModelEmbedderDistillation/blob/main/DistilSBERT/Train/2_st_en.JPG?raw=true" width=700 />
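The card does not state the training loss. Since the metadata lists `mse` as the metric and the linked project is about embedder distillation, the sketch below assumes the student is trained to minimise the mean squared error between its embeddings and those of a teacher model. The function name `mse_distillation_loss` and the `STAGES` structure are hypothetical illustrations, not code from the project.

```python
import numpy as np

# Two-stage schedule as described above (the structure itself is an assumption).
STAGES = [
    {"data": ["wikipedia"], "epochs": 3},
    {"data": ["wikipedia", "dialog"], "epochs": 1},
]


def mse_distillation_loss(student_embeddings, teacher_embeddings):
    """Mean squared error between student and teacher sentence embeddings."""
    s = np.asarray(student_embeddings, dtype=np.float64)
    t = np.asarray(teacher_embeddings, dtype=np.float64)
    if s.shape != t.shape:
        raise ValueError("student and teacher embeddings must have the same shape")
    # Average the squared differences over all sentences and dimensions.
    return float(np.mean((s - t) ** 2))
```

Under this assumption, the student's output dimension has to match the teacher's, which would be consistent with the final 312-to-384 dense layer in the architecture.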

	## Full Model Architecture
	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
	  (1): Pooling({'word_embedding_dimension': 312, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
	  (2): Dense({'in_features': 312, 'out_features': 384, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
	)
	```
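The last two modules can be illustrated with a small NumPy sketch: mean pooling over the non-padding tokens of the 312-dimensional BERT output, followed by a linear projection to 384 dimensions with no activation. This is an illustration of the data flow only. The weights here are random stand-ins, and the helper names `mean_pool` and `dense_projection` are hypothetical.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over real (non-padding) tokens.

    token_embeddings: (batch, seq_len, 312), attention_mask: (batch, seq_len) of 0/1.
    """
    mask = np.asarray(attention_mask, dtype=np.float64)[:, :, None]
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def dense_projection(pooled, weight, bias):
    """Linear layer with identity activation: (batch, 312) -> (batch, 384)."""
    return pooled @ weight.T + bias


rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 312))            # stand-in for BertModel output
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
weight = rng.normal(size=(384, 312))             # random stand-in for Dense weights
bias = np.zeros(384)

sentence_embeddings = dense_projection(mean_pool(tokens, mask), weight, bias)
print(sentence_embeddings.shape)  # (2, 384)
```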