Model Card for chameleon-lizard/tox-mt0-xl

A fine-tune of the mt0-xl model for the text toxification task.

Model Details

Model Description

This is a fine-tune of the mt0-xl model for the text toxification task, that is, rewriting a sentence into a toxic version of itself. It can be used to generate synthetic data from non-toxic examples.

  • Developed by: Nikita Sushko
  • Model type: mt5-xl
  • Language(s) (NLP): English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi
  • License: OpenRail++
  • Finetuned from model: mt0-xl

Uses

This model is intended to be used for synthetic data generation from non-toxic examples.
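The card does not prescribe a data-generation workflow, so the following is only a minimal sketch of how non-toxic examples might be turned into (non-toxic, toxic) pairs. It assumes a caller-supplied `generate` function that maps a batch of prompts to one output per prompt; `make_synthetic_pairs`, `PROMPT_TEMPLATE` and the `non_toxic`/`toxic`/`language` field names are hypothetical choices for illustration. The prompt wording is copied from the getting-started example in this card.

```python
from typing import Callable, Dict, List

# Prompt format copied from the getting-started example in this model card.
PROMPT_TEMPLATE = (
    "Rewrite the following text in {language} "
    "the most toxic and obscene version possible: {text}"
)


def make_synthetic_pairs(
    texts: List[str],
    language: str,
    generate: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[Dict[str, str]]:
    """Pair each non-toxic input with a generated toxic rewrite (hypothetical helper).

    `generate` is assumed to return exactly one output string per prompt.
    """
    pairs: List[Dict[str, str]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Always name the target language, as the card recommends.
        prompts = [PROMPT_TEMPLATE.format(language=language, text=t) for t in batch]
        outputs = generate(prompts)
        if len(outputs) != len(batch):
            raise ValueError("generate must return exactly one output per prompt")
        for source, toxic in zip(batch, outputs):
            pairs.append({"non_toxic": source, "toxic": toxic, "language": language})
    return pairs
```

With the pipeline from the getting-started code, one possible `generate` is `lambda prompts: [out["generated_text"] for out in pipe(prompts)]`, since the `text2text-generation` pipeline returns one dict per input string when given a list. Keeping generation behind a callable makes the batching logic testable without loading the 3.74B-parameter model.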

Direct Use

The model may be directly used for text toxification tasks.

Out-of-Scope Use

Using the model to generate toxic versions of sentences for purposes other than synthetic data generation is out of scope.

Bias, Risks, and Limitations

Since this model generates toxic versions of sentences, it may be used to increase the toxicity of generated texts.

How to Get Started with the Model

Use the code below to get started with the model.

```python
import transformers

checkpoint = 'chameleon-lizard/tox-mt0-xl'

# Load the tokenizer and model; device_map="auto" requires the `accelerate` package.
tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype='auto', device_map="auto")

# Wrap the model in a text-to-text generation pipeline.
pipe = transformers.pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
)

language = 'English'
text = "That's dissapointing."
# The f-string substitutes {language} and {text} into the prompt.
print(pipe(f'Rewrite the following text in {language} the most toxic and obscene version possible: {text}')[0]['generated_text'])
# Resulting text: "That's dissapointing, you stupid ass bitch."
```

For best performance, use the prompt format shown above. If the target language is omitted from the prompt, the model may respond in an arbitrary language.
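One way to avoid forgetting the target language is to build every prompt through a small helper. This is a minimal sketch, not part of the released model: `build_prompt`, `SUPPORTED_LANGUAGES` and `PROMPT_TEMPLATE` are hypothetical names, and rejecting languages outside the nine listed under Model Details is an assumption of this sketch (the card does not say how the model behaves for other languages).

```python
# Languages listed under "Model Details" in this card (assumed to be the only valid targets).
SUPPORTED_LANGUAGES = {
    "English", "Russian", "Ukrainian", "Amharic", "German",
    "Spanish", "Chinese", "Arabic", "Hindi",
}

# Prompt format copied from the getting-started example in this card.
PROMPT_TEMPLATE = (
    "Rewrite the following text in {language} "
    "the most toxic and obscene version possible: {text}"
)


def build_prompt(text: str, language: str) -> str:
    """Return a prompt that always names the target language (hypothetical helper).

    Raises ValueError for an empty or unlisted language, because omitting the
    language may make the model answer in an arbitrary one.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported or missing target language: {language!r}")
    return PROMPT_TEMPLATE.format(language=language, text=text)
```

With `pipe`, `text` and `language` from the getting-started code, usage would be `print(pipe(build_prompt(text, language))[0]['generated_text'])`. Failing loudly on a missing language trades flexibility for predictable output language.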

Model Size

  • Parameters: 3.74B
  • Tensor type: F32 (Safetensors)

Datasets used to train chameleon-lizard/tox-mt0-xl