Magicearth/finetuned_roberta

---
language: en
tags:
- text-classification
- roberta
- custom
datasets:
- google/jigsaw_toxicity_pred
base_model:
- FacebookAI/roberta-base
pipeline_tag: text-classification
---

# RoBERTa-base fine-tuned for toxicity detection in text

The model detects toxicity in a text by predicting, for each of the categories below, the probability that the text belongs to it, yielding one score per category.

Categories: `toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`
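The card does not say how the six scores are produced. Assuming a standard multi-label head (one logit per category, trained with `BCEWithLogitsLoss` as `transformers` does for `problem_type="multi_label_classification"`), each logit is passed through its own sigmoid, so the scores are independent and need not sum to 1. The sketch below shows that mapping; the label order, the `flagged` helper and its 0.5 threshold are hypothetical and should be checked against the model's `config.json` (`id2label`).

```python
import math

# Hypothetical label order: the order in which this card lists the categories.
# Check id2label in the model's config.json before relying on it.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def scores_from_logits(logits):
    """Map one row of six raw logits to independent per-category probabilities.

    A multi-label head applies a sigmoid to each logit separately (not a
    softmax across them), so the resulting scores do not have to sum to 1.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return {label: 1.0 / (1.0 + math.exp(-z)) for label, z in zip(LABELS, logits)}


def flagged(scores, threshold=0.5):
    """Return the categories whose probability reaches a (hypothetical) threshold."""
    return [label for label, p in scores.items() if p >= threshold]


if __name__ == "__main__":
    # Made-up logits, for illustration only.
    example = scores_from_logits([2.0, -3.0, 0.5, -4.0, 1.0, -2.5])
    print(flagged(example))  # ['toxic', 'obscene', 'insult']
```

Per-category thresholds could differ in practice, since rare categories such as `threat` may call for a lower cut-off than `toxic`.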

Fine-tuning ran for 4 epochs on Google's `jigsaw_toxicity_pred` dataset. (The training arguments below set `num_train_epochs=5`.)
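The card does not describe the preprocessing. One common approach, assumed here rather than taken from the author's code, is to stack the dataset's six 0/1 label columns into a float vector per comment, which is the target format a multi-label head with a binary cross-entropy loss expects. The column names below are assumed to match the `google/jigsaw_toxicity_pred` schema (`comment_text` plus one column per category); `to_multilabel_example` is a hypothetical helper.

```python
# Assumed column names of google/jigsaw_toxicity_pred; verify against the loaded dataset.
LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def to_multilabel_example(row):
    """Turn one dataset row into (text, float label vector).

    BCE-with-logits losses expect float targets, so the 0/1 integer columns
    are converted to 0.0/1.0 in the fixed LABEL_COLUMNS order.
    """
    labels = [float(row[col]) for col in LABEL_COLUMNS]
    return row["comment_text"], labels


if __name__ == "__main__":
    # A made-up row with the same shape as the dataset's records.
    row = {"comment_text": "example text", "toxic": 1, "severe_toxic": 0,
           "obscene": 1, "threat": 0, "insult": 0, "identity_hate": 0}
    print(to_multilabel_example(row))  # ('example text', [1.0, 0.0, 1.0, 0.0, 0.0, 0.0])
```

With 🤗 Datasets, the same logic could be wrapped in a function passed to `Dataset.map` that returns a `labels` field alongside the tokenized text.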

# Training parameters
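```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # checkpoints are written here
    evaluation_strategy="epoch",     # evaluate at the end of every epoch
                                     # (renamed eval_strategy in newer transformers releases)
    save_strategy="epoch",           # must match the evaluation strategy for load_best_model_at_end
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    save_total_limit=5,              # keep at most 5 checkpoints on disk
    logging_dir="./logs",
    logging_steps=10,                # log training metrics every 10 optimizer steps
    load_best_model_at_end=True,     # reload the best checkpoint (lowest eval loss by default) at the end
)
```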
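The card does not give the size of the training split, so the number of optimizer steps these arguments imply is left open. The sketch below estimates it for any split size, assuming single-process or data-parallel training, no gradient accumulation, and the default `dataloader_drop_last=False`; `optimizer_steps` is a hypothetical helper, not part of `transformers`.

```python
import math


def optimizer_steps(n_train, per_device_batch=16, n_devices=1, epochs=5,
                    grad_accum=1, logging_steps=10):
    """Estimate the update steps implied by the TrainingArguments above.

    Assumes the last partial batch is kept (dataloader_drop_last=False)
    and that data is split evenly across n_devices.
    """
    batches_per_epoch = math.ceil(n_train / (per_device_batch * n_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "log_entries": total_steps // logging_steps,  # one log line per logging_steps
        "eval_rounds": epochs,                         # evaluation_strategy="epoch"
    }


if __name__ == "__main__":
    # Illustrative split size only, not the dataset's real size.
    print(optimizer_steps(1000))
    # {'steps_per_epoch': 63, 'total_steps': 315, 'log_entries': 31, 'eval_rounds': 5}
```

With the default linear scheduler and no warmup, the learning rate decays from `2e-5` to 0 over `total_steps`.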

# Mean absolute error per category on the training set

| Category | MAE |
|---|---|
| toxic | 0.0271 |
| severe_toxic | 0.0128 |
| obscene | 0.0185 |
| threat | 0.0029 |
| insult | 0.0250 |
| identity_hate | 0.0081 |
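The card does not state exactly how these figures were computed. A minimal sketch, assuming MAE is taken between each predicted probability and the 0/1 label for that category, averaged over examples (`mae_per_category` is a hypothetical helper):

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def mae_per_category(probs, targets):
    """Mean absolute error for each category.

    probs:   list of rows, each holding six predicted probabilities.
    targets: list of rows, each holding six 0/1 labels in the same order.
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and of equal length")
    n = len(probs)
    return {
        label: sum(abs(p[j] - t[j]) for p, t in zip(probs, targets)) / n
        for j, label in enumerate(LABELS)
    }


if __name__ == "__main__":
    # Made-up predictions and labels for two comments.
    probs = [[0.9, 0.1, 0.8, 0.0, 0.6, 0.0],
             [0.1, 0.0, 0.2, 0.0, 0.3, 0.1]]
    targets = [[1, 0, 1, 0, 1, 0],
               [0, 0, 0, 0, 0, 0]]
    print(mae_per_category(probs, targets))
```

Under this metric a constant prediction of 0 scores exactly the category's positive rate, so for rare categories such as `threat` a small MAE should be read against how often that label is 1 in the data.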