Kirill Gelvan committed
Commit 505f6f7
Parent(s): a0f2756
add some descriptions
README.md
CHANGED
---

### Description

DialoGPT trained on the Russian language and fine-tuned on my Telegram chat.

This model was created by [sberbank-ai](https://hf.co/sberbank-ai) and trained on Russian forums (see [Grossmend's model](https://hf.co/Grossmend/rudialogpt3_medium_based_on_gpt2)). You can find information about how it was trained on [habr](https://habr.com/ru/company/icl_services/blog/548244/) (in Russian). I created a **simple pipeline** and **fine-tuned** that model on my own **exported Telegram chat** (~30 MB of JSON). It is in fact very easy to export your data from Telegram and fine-tune a model on it, so I made a **colab tutorial** for it: link
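The training data comes from Telegram's built-in chat export (Settings → Export chat history → JSON), which produces a `result.json` file. As a rough sketch of the preprocessing step — the field names below match Telegram's export format, but the extraction logic is my illustration, not the author's exact pipeline:

```python
import json

def load_telegram_messages(export: dict) -> list[str]:
    """Extract plain-text messages from a Telegram chat export dict."""
    texts = []
    for msg in export.get("messages", []):
        # Telegram stores "text" as a string, or as a list of strings
        # and entity dicts when the message contains formatting/links.
        text = msg.get("text", "")
        if isinstance(text, list):
            text = "".join(
                part if isinstance(part, str) else part.get("text", "")
                for part in text
            )
        if text:
            texts.append(text)
    return texts

# Tiny stand-in for an exported chat (a real export is one large JSON file)
export = {
    "messages": [
        {"text": "привет!"},
        {"text": ["check ", {"type": "link", "text": "this"}]},
        {"text": ""},
    ]
}
print(load_telegram_messages(export))  # → ['привет!', 'check this']
```

The flattened strings can then be paired into context/reply examples for fine-tuning.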
### How to use

```python
def get_length_param(text: str, tokenizer) -> str:
    tokens_count = len(tokenizer.encode(text))
    if tokens_count <= 15: