dardem committed
Commit ab1138c
1 Parent(s): 763d29d

Update README.md

Files changed (1): README.md (+25 -4)
README.md CHANGED
@@ -6,16 +6,37 @@ tags:
 - text-generation-inference
 datasets:
 - s-nlp/ru_paradetox
+base_model:
+- ai-forever/ruT5-base
 ---
-This is the detoxification baseline model trained on the [train](https://github.com/skoltech-nlp/russe_detox_2022/blob/main/data/input/train.tsv) part of "RUSSE 2022: Russian Text Detoxification Based on Parallel Corpora" competition. The source sentences are Russian toxic messages from Odnoklassniki, Pikabu, and Twitter platforms. The base model is [ruT5](https://huggingface.co/sberbank-ai/ruT5-base) provided from Sber.
+This is the detoxification baseline model trained on the [train](https://github.com/skoltech-nlp/russe_detox_2022/blob/main/data/input/train.tsv) part of the "RUSSE 2022: Russian Text Detoxification Based on Parallel Corpora" competition. The source sentences are Russian toxic messages from the Odnoklassniki, Pikabu, and Twitter platforms. The base model is [ruT5](https://huggingface.co/ai-forever/ruT5-base).
 
 **How to use**
 ```python
 from transformers import T5ForConditionalGeneration, AutoTokenizer
 
-base_model_name = 'sberbank-ai/ruT5-base'
-model_name = 'SkolkovoInstitute/ruT5-base-detox'
+base_model_name = 'ai-forever/ruT5-base'
+model_name = 's-nlp/ruT5-base-detox'
 
 tokenizer = AutoTokenizer.from_pretrained(base_model_name)
 model = T5ForConditionalGeneration.from_pretrained(model_name)
-```
+
+input_ids = tokenizer.encode('Это полная хуйня!', return_tensors='pt')
+output_ids = model.generate(input_ids, max_length=50, num_return_sequences=1)
+output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+print(output_text)
+# Это полный бред!
+```
+
+## Citation
+
+```
+@article{dementievarusse,
+  title={RUSSE-2022: Findings of the First Russian Detoxification Shared Task Based on Parallel Corpora},
+  author={Dementieva, Daryna and Logacheva, Varvara and Nikishina, Irina and Fenogenova, Alena and Dale, David and Krotova, Irina and Semenov, Nikita and Shavrina, Tatiana and Panchenko, Alexander}
+}
+```
+
+**License**
+
+This model is licensed under the OpenRAIL++ License, which supports the development of various technologies, both industrial and academic, that serve the public good.
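
For convenience, here is the usage snippet from the updated README assembled into one self-contained script, with descriptive comments and rough English translations of the Russian example strings (a minimal sketch; it assumes `transformers` and a PyTorch backend are installed):

```python
from transformers import T5ForConditionalGeneration, AutoTokenizer

base_model_name = 'ai-forever/ruT5-base'  # tokenizer comes from the base ruT5 checkpoint
model_name = 's-nlp/ruT5-base-detox'      # fine-tuned detoxification checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Toxic input sentence; roughly "This is complete bullshit!"
input_ids = tokenizer.encode('Это полная хуйня!', return_tensors='pt')

# Generate a single detoxified paraphrase (up to 50 tokens)
output_ids = model.generate(input_ids, max_length=50, num_return_sequences=1)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(output_text)
# Expected output per the README: 'Это полный бред!' (roughly "This is complete nonsense!")
```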