This is the baseline detoxification model trained on the training split of the "RUSSE 2022: Russian Text Detoxification Based on Parallel Corpora" competition data. The source sentences are toxic Russian messages from the Odnoklassniki, Pikabu, and Twitter platforms. The base model is ruT5 (ai-forever/ruT5-base).
How to use
from transformers import T5ForConditionalGeneration, AutoTokenizer

# The tokenizer is loaded from the base model; the fine-tuned weights come from the detox checkpoint.
base_model_name = 'ai-forever/ruT5-base'
model_name = 's-nlp/ruT5-base-detox'
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Encode one toxic sentence, generate a single rewrite, and decode it back to text.
input_ids = tokenizer.encode('Это полная хуйня!', return_tensors='pt')
output_ids = model.generate(input_ids, max_length=50, num_return_sequences=1)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text)
# Это полный бред!  ("This is complete nonsense!")
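The snippet above rewrites one sentence at a time. For several sentences, a batched variant along the following lines may be more convenient. This is a minimal sketch rather than part of the original model card: the helper names `load_detox_model` and `detoxify_batch`, the batch size, the input length cap, and the use of beam search (`num_beams=4`) are illustrative assumptions, not settings reported for this model.

```python
from typing import Iterable, List


def load_detox_model(model_name: str = "s-nlp/ruT5-base-detox",
                     base_model_name: str = "ai-forever/ruT5-base"):
    """Load tokenizer and model as in the snippet above (downloads weights)."""
    # Imported lazily so the batching helper can be used without transformers.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    model.eval()  # inference only
    return tokenizer, model


def detoxify_batch(texts: Iterable[str], tokenizer, model,
                   batch_size: int = 8,         # assumed value
                   max_input_length: int = 256,  # assumed value
                   max_length: int = 50,         # same as the snippet above
                   num_beams: int = 4            # assumed value
                   ) -> List[str]:
    """Rewrite each input sentence, processing `batch_size` sentences at a time."""
    texts = list(texts)
    results: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest sentence in the batch; the attention mask
        # returned by the tokenizer tells the model to ignore padding.
        enc = tokenizer(batch, return_tensors="pt", padding=True,
                        truncation=True, max_length=max_input_length)
        output_ids = model.generate(**enc, max_length=max_length,
                                    num_beams=num_beams)
        results.extend(tokenizer.batch_decode(output_ids,
                                              skip_special_tokens=True))
    return results


# Usage (requires network access to fetch the checkpoints):
#   tokenizer, model = load_detox_model()
#   for rewrite in detoxify_batch(list_of_sentences, tokenizer, model):
#       print(rewrite)
```

Passing the tokenizer and model in as arguments keeps the loading step separate, so the same loaded model can be reused across calls.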
Citation
@article{dementievarusse,
title={RUSSE-2022: Findings of the First Russian Detoxification Shared Task Based on Parallel Corpora},
author={Dementieva, Daryna and Logacheva, Varvara and Nikishina, Irina and Fenogenova, Alena and Dale, David and Krotova, Irina and Semenov, Nikita and Shavrina, Tatiana and Panchenko, Alexander}
}
License
This model is licensed under the OpenRAIL++ License, which supports the development of technologies, both industrial and academic, that serve the public good.