|
---
datasets:
- jigsaw_unintended_bias
language:
- en
---
|
|
|
# Toxicity Classifier with Debiaser
|
|
|
## Model description |
|
|
|
This model is a text classifier that predicts whether a given comment contains biased or toxic language.

It is based on the DistilBERT architecture and fine-tuned on a large labeled dataset of toxic and non-toxic comments.
|
|
|
## Intended Use |
|
|
|
This model is intended to be used to automatically detect biased language in user-generated comments on various online platforms.

It can also be used as a component in a larger pipeline for text classification, sentiment analysis, or bias detection tasks.
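
The snippet below loads the classifier with the `transformers` library and runs it on a sample comment: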
|
|
|
`````python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("shainaraza/toxity_classify_debiaser")
model = AutoModelForSequenceClassification.from_pretrained("shainaraza/toxity_classify_debiaser")

# Test the model with a sample comment
comment = "you are a dumb person."
inputs = tokenizer(comment, return_tensors="pt")
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=1).item()

print(f"Comment: {comment}")
print(f"Prediction: {'biased' if prediction == 1 else 'not biased'}")
`````
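
For quick experiments, the same checkpoint can also be loaded through the `transformers` `pipeline` helper. This is a minimal sketch, not part of the original card; note that the returned label strings come from the model's config and may show up as generic `LABEL_0`/`LABEL_1`:

`````python
from transformers import pipeline

# Convenience wrapper around the same checkpoint. The label names depend on
# the model config (commonly LABEL_0 = not biased, LABEL_1 = biased).
classifier = pipeline("text-classification", model="shainaraza/toxity_classify_debiaser")
print(classifier("you are a dumb person."))
`````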
|
|
|
## Training data |
|
|
|
The model was trained on a labeled dataset of comments from various online platforms (the Jigsaw Unintended Bias dataset), which were annotated as toxic or non-toxic by human annotators.
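
As a rough sketch of how such data can be loaded with the Hugging Face `datasets` library (an assumption, not the authors' training script; the Hub copy of the Jigsaw dataset requires the Kaggle files to be downloaded manually, and the `data_dir` path below is a placeholder):

`````python
from datasets import load_dataset

# Assumes the Kaggle "Jigsaw Unintended Bias in Toxicity Classification"
# files have already been downloaded and extracted; the path is a placeholder.
ds = load_dataset("jigsaw_unintended_bias", data_dir="path/to/jigsaw_files")
print(ds["train"][0])
`````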
|
|
|
## Evaluation results |
|
|
|
The model was evaluated on a separate test set of comments and achieved the following performance metrics: |
|
|
|
- Accuracy: 0.95 |
|
- F1-score: 0.94 |
|
- ROC-AUC: 0.97 |
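
These correspond to standard binary-classification metrics; one illustrative way to compute them with scikit-learn (not the authors' evaluation script) is:

`````python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(y_true, y_pred, y_score):
    """Compute the metrics reported above for binary toxic/non-toxic labels.

    y_true: gold labels (0/1); y_pred: predicted labels (0/1);
    y_score: predicted probability of the positive (toxic) class.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),  # reported: 0.95
        "f1": f1_score(y_true, y_pred),              # reported: 0.94
        "roc_auc": roc_auc_score(y_true, y_score),   # reported: 0.97
    }
`````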
|
|
|
## Limitations and bias |
|
|
|
This model was trained and tested on English comments from various online platforms, so its performance may be limited when applied to comments from other domains or to text in other languages.
|
|
|
## Conclusion |
|
|
|
The Toxicity Classifier with Debiaser is a practical tool for automatically detecting and flagging potentially biased language in user-generated comments. While its performance has limits and the training data carries potential biases of its own, the model's high accuracy and robustness make it a valuable asset for online platforms looking to improve the quality and inclusivity of their user-generated content.