RuBERTConv Toxic Classifier

Model description

Based on the rubert-base-cased-conversational model.

Intended uses & limitations

How to use

Colab: link

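from transformers import pipeline

model_name = "IlyaGusev/rubertconv_toxic_clf"
# Load the classifier and its matching tokenizer on the PyTorch backend
pipe = pipeline("text-classification", model=model_name, tokenizer=model_name, framework="pt")

# Example input (Russian): "You're an idiot from the internet"
text = "Ты придурок из интернета"
# Returns a list with one {"label": ..., "score": ...} dict per input text
print(pipe([text]))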

Training data

Datasets:

Augmentations:

  • Replace ё with е
  • Remove or add "?" and "!"
  • Fix CAPS
  • Concatenate a toxic text with a non-toxic text
  • Concatenate two non-toxic texts
  • Add toxic words from a vocabulary
  • Add typos
  • Mask toxic words with "*", "@", or "$"
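The augmentations listed above could look roughly like the following minimal Python sketch. It is not the author's training code: the vocabulary TOXIC_WORDS, all function names, the masking scheme (keep the first letter), the adjacent-swap typo model, and the rule that a concatenation is toxic if either part is toxic are assumptions for illustration. "Fix CAPS" is left out because the card does not say exactly what it does.

```python
import random

# Hypothetical toxic vocabulary; the card does not publish the real one
TOXIC_WORDS = {"придурок", "идиот"}
MASK_SYMBOLS = "*@$"


def replace_yo(text: str) -> str:
    """ё -> е, for both lowercase and uppercase letters."""
    return text.replace("ё", "е").replace("Ё", "Е")


def toggle_punct(text: str, rng: random.Random) -> str:
    """Strip trailing '?'/'!' if present, otherwise append one of them."""
    stripped = text.rstrip("?!")
    if stripped != text:
        return stripped
    return text + rng.choice("?!")


def add_typo(text: str, rng: random.Random) -> str:
    """Assumed typo model: swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def mask_toxic(text: str, rng: random.Random, vocab=TOXIC_WORDS) -> str:
    """Keep the first letter of each vocabulary word and mask the rest with *, @ or $."""
    out = []
    for token in text.split():
        core = token.strip(".,!?")  # ignore surrounding punctuation when matching
        if core.lower() in vocab:
            masked = core[0] + "".join(rng.choice(MASK_SYMBOLS) for _ in core[1:])
            token = token.replace(core, masked, 1)
        out.append(token)
    return " ".join(out)


def add_toxic_word(text: str, rng: random.Random, vocab=TOXIC_WORDS) -> tuple[str, int]:
    """Append a random vocabulary word; the result is labeled toxic (1)."""
    return f"{text} {rng.choice(sorted(vocab))}", 1


def concat_pair(a: tuple[str, int], b: tuple[str, int]) -> tuple[str, int]:
    """Join two (text, label) samples; assumed label: toxic if either part is toxic."""
    (text_a, label_a), (text_b, label_b) = a, b
    return f"{text_a} {text_b}", max(label_a, label_b)


if __name__ == "__main__":
    rng = random.Random(0)
    print(replace_yo("Ещё раз спасибо"))
    print(toggle_punct("Спасибо!", rng))
    print(mask_toxic("Ты придурок!", rng))
    print(add_typo("спасибо", rng))
    print(concat_pair(("Ты придурок", 1), ("Спасибо", 0)))
```

Passing a seeded random.Random into each function keeps the augmented copies reproducible across runs.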

Training procedure

TBA
