---
pipeline_type: "text-classification"
widget:
- text: "this is a lovely message"
  example_title: "Example 1"
  multi_class: false
- text: "you are an idiot and you and your family should go back to your country"
  example_title: "Example 2"
  multi_class: false
language:
- en
- nl
- fr
- pt
- it
- es
- de
- da
- pl
- af
datasets:
- jigsaw_toxicity_pred
metrics:
- F1
- Accuracy
---

# citizenlab/distilbert-base-multilingual-cased-toxicity

This is a multilingual DistilBERT sequence classifier fine-tuned on the [JIGSAW Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) dataset.

## How to use it

```python
from transformers import pipeline

model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"
toxicity_classifier = pipeline("text-classification", model=model_path, tokenizer=model_path)

toxicity_classifier("this is a lovely message")
> [{'label': 'not_toxic', 'score': 0.9954179525375366}]

toxicity_classifier("you are an idiot and you and your family should go back to your country")
> [{'label': 'toxic', 'score': 0.9948776960372925}]
```

## Evaluation

### Accuracy

```
Accuracy Score = 0.9425
F1 Score (Micro) = 0.9450549450549449
F1 Score (Macro) = 0.8491432341169309
```
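The card does not include the evaluation script behind these numbers. As a rough illustration only, the three scores are the standard `sklearn` accuracy and micro/macro-averaged F1; the sketch below uses hypothetical gold labels and predictions, since the actual held-out split is not part of this card.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and model predictions for illustration;
# the real evaluation split used for the scores above is not documented here.
y_true = ["toxic", "not_toxic", "toxic", "not_toxic"]
y_pred = ["toxic", "not_toxic", "not_toxic", "not_toxic"]

print("Accuracy Score =", accuracy_score(y_true, y_pred))
print("F1 Score (Micro) =", f1_score(y_true, y_pred, average="micro"))
print("F1 Score (Macro) =", f1_score(y_true, y_pred, average="macro"))
```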
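For scoring many comments at once, the same pipeline can be called on a list. This is a minimal sketch using standard `transformers` pipeline options (`truncation`, `batch_size`) that are not specific to this model; tune the batch size to your hardware.

```python
from transformers import pipeline

model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"
toxicity_classifier = pipeline("text-classification", model=model_path, tokenizer=model_path)

comments = [
    "this is a lovely message",
    "you are an idiot and you and your family should go back to your country",
]

# truncation=True guards against comments longer than the model's maximum input length;
# batch_size=8 is an illustrative value, not a recommendation from this card.
results = toxicity_classifier(comments, truncation=True, batch_size=8)
for comment, result in zip(comments, results):
    print(f"{result['label']:>10} ({result['score']:.3f})  {comment}")
```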