Kirill Gelvan committed 0b9e0db ("add emoji and some code"), parent 505f6f7

README.md
---
language: ru
tags:
- conversational
---

### 📝 Description

DialoGPT trained on the Russian language and fine-tuned on my Telegram chat.

This model was created by [sberbank-ai](https://hf.co/sberbank-ai) and trained on Russian forums (see [Grossmend's model](https://hf.co/Grossmend/rudialogpt3_medium_based_on_gpt2)). You can find info about how it was trained on [habr](https://habr.com/ru/company/icl_services/blog/548244/) (in Russian). I have created a **simple pipeline** and **fine-tuned** that model on my own **exported Telegram chat** (~30 MB of JSON). It is in fact very easy to get the data from Telegram and fine-tune a model, so I made a **colab tutorial** for it: link

⚠️ Due to the specifics of the data, the Hosted Inference API may not work properly ⚠️
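
Getting the data out of Telegram really is the easy part. As a rough illustration (a sketch, not this card's actual pipeline), here is how a Telegram Desktop JSON export (`result.json`) might be flattened into (speaker, text) turns; the field layout follows Telegram's standard export format, while the filtering choices are assumptions:

```python
import json

# Read a Telegram Desktop export (Settings -> Advanced -> Export Telegram data -> JSON).
# Assumed layout: {"messages": [{"type": "message", "from": "...", "text": "..."}, ...]}
with open("result.json", encoding="utf-8") as f:
    chat = json.load(f)

turns = []
for msg in chat["messages"]:
    # Keep only plain text messages: skip service events and media,
    # and skip formatted messages whose "text" is a list of fragments.
    if msg.get("type") != "message" or not isinstance(msg.get("text"), str):
        continue
    if msg["text"].strip():
        turns.append((msg["from"], msg["text"]))

print(f"Collected {len(turns)} text turns")
```

Turns like these are what the `get_user_param` and `get_length_param` helpers below are meant to label before fine-tuning.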

### ❓ How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download model and tokenizer
checkpoint = "Kirili4ik/ruDialoGpt3-medium-finetuned-telegram"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model.eval()


# util function to get expected len after tokenizing
def get_length_param(text: str, tokenizer) -> str:
    tokens_count = len(tokenizer.encode(text))
    if tokens_count <= 15:
        len_param = '1'
    elif tokens_count <= 50:
        len_param = '2'
    elif tokens_count <= 256:
        len_param = '3'
    else:
        len_param = '-'
    return len_param


# util function to get next person number (1/0) for Machine or Human in the dialogue
def get_user_param(text: dict, machine_name_in_chat: str) -> str:
    if text['from'] == machine_name_in_chat:
        return '1'  # machine
    else:
        return '0'  # human
```
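
The snippet above stops at the helper functions, so here is a hedged sketch of one chat turn that continues from it. The `|{user}|{len}|` control prefix follows the convention of the base Grossmend/ruDialoGpt3 model (user turn tagged `|0|...|`, bot turn primed with `|1|...|`); the sampling parameters are illustrative defaults, not tuned recommendations:

```python
# Build one exchange: the user's phrase with its control prefix, then prime the bot's turn.
user_text = "Привет! Как дела?"
user_ids = tokenizer.encode(
    f"|0|{get_length_param(user_text, tokenizer)}|" + user_text + tokenizer.eos_token,
    return_tensors="pt",
)
bot_prefix_ids = tokenizer.encode("|1|2|", return_tensors="pt")  # '2' asks for a medium-length reply
input_ids = torch.cat([user_ids, bot_prefix_ids], dim=-1)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=512,
        do_sample=True,
        top_k=50,
        top_p=0.9,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 tokenizers have no pad token
    )

# Decode only the tokens generated after the prompt
reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(reply)
```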