|
---
datasets:
- jigsaw_unintended_bias
language:
- en
---
|
|
|
# Toxicity Classifier with Debiaser
|
|
|
## Model description |
|
|
|
This model is a text classifier that predicts whether a given comment contains biased or toxic language.

It is based on the DistilBERT architecture and fine-tuned on a large labeled dataset of toxic and non-toxic comments.
|
|
|
## Intended Use |
|
|
|
This model is intended to be used to automatically detect biased language in user-generated comments on various online platforms.

It can also be used as a component in a larger pipeline for text classification, sentiment analysis, or bias detection tasks.
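
The snippet below loads the classifier with the `transformers` library and runs it on a sample comment: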
|
|
|
`````python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("shainaraza/toxity_classify_debiaser")
model = AutoModelForSequenceClassification.from_pretrained("shainaraza/toxity_classify_debiaser")

# Test the model with a sample comment
comment = "you are a dumb person."
inputs = tokenizer(comment, return_tensors="pt")
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=1).item()

print(f"Comment: {comment}")
print(f"Prediction: {'biased' if prediction == 1 else 'not biased'}")
`````
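
For quick experiments, the same checkpoint can also be loaded through the `transformers` `pipeline` helper. This is a minimal sketch, not part of the original card; note that the returned label strings come from the model's config and may show up as generic `LABEL_0`/`LABEL_1`:

`````python
from transformers import pipeline

# Convenience wrapper around the same checkpoint. The label names depend on
# the model config (commonly LABEL_0 = not biased, LABEL_1 = biased).
classifier = pipeline("text-classification", model="shainaraza/toxity_classify_debiaser")
print(classifier("you are a dumb person."))
`````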
|
|
|
## Training data |
|
|
|
The model was trained on a labeled dataset of comments from various online platforms (the Jigsaw Unintended Bias dataset), which were annotated as toxic or non-toxic by human annotators.
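
As a rough sketch of how such data can be loaded with the Hugging Face `datasets` library (an assumption, not the authors' training script; the Hub copy of the Jigsaw dataset requires the Kaggle files to be downloaded manually, and the `data_dir` path below is a placeholder):

`````python
from datasets import load_dataset

# Assumes the Kaggle "Jigsaw Unintended Bias in Toxicity Classification"
# files have already been downloaded and extracted; the path is a placeholder.
ds = load_dataset("jigsaw_unintended_bias", data_dir="path/to/jigsaw_files")
print(ds["train"][0])
`````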
|
|
|
## Evaluation results |
|
|
|
The model was evaluated on a separate test set of comments and achieved the following performance metrics: |
|
|
|
- Accuracy: 0.95 |
|
- F1-score: 0.94 |
|
- ROC-AUC: 0.97 |
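
These correspond to standard binary-classification metrics; one illustrative way to compute them with scikit-learn (not the authors' evaluation script) is:

`````python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(y_true, y_pred, y_score):
    """Compute the metrics reported above for binary toxic/non-toxic labels.

    y_true: gold labels (0/1); y_pred: predicted labels (0/1);
    y_score: predicted probability of the positive (toxic) class.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),  # reported: 0.95
        "f1": f1_score(y_true, y_pred),              # reported: 0.94
        "roc_auc": roc_auc_score(y_true, y_score),   # reported: 0.97
    }
`````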
|
|
|
## Limitations and bias |
|
|
|
This model was trained and tested on English comments from various online platforms, so its performance may be limited when applied to comments from other domains or to text in other languages.
|
|
|
## Conclusion |
|
|
|
The Toxicity Classifier with Debiaser is a practical tool for automatically detecting and flagging potentially biased language in user-generated comments. While its performance has limits and the training data carries potential biases of its own, the model's high accuracy and robustness make it a valuable asset for online platforms looking to improve the quality and inclusivity of their user-generated content.