|
--- |
|
license: cc-by-sa-4.0 |
|
datasets: |
|
- SuccubusBot/incoherent-text-dataset |
|
language: |
|
- en |
|
- es |
|
- fr |
|
- de |
|
- zh |
|
- ja |
|
- ru |
|
- ar |
|
- hi |
|
metrics: |
|
- accuracy |
|
base_model: |
|
- distilbert/distilbert-base-multilingual-cased |
|
pipeline_tag: text-classification |
|
library_name: transformers |
|
--- |
|
|
|
# DistilBERT Incoherence Classifier (Multilingual) |
|
|
|
This is a fine-tuned multilingual DistilBERT model that classifies text by coherence. It distinguishes coherent text from several types of incoherence, such as grammatical errors, random bytes, random tokens, random words, run-on sentences, and word soup.
|
|
|
## Model Details |
|
|
|
- **Model:** DistilBERT (distilbert-base-multilingual-cased) |
|
- **Task:** Text Classification (Coherence Detection) |
|
- **Fine-tuning:** The model was fine-tuned on a synthetically generated dataset (SuccubusBot/incoherent-text-dataset) covering the incoherence types listed in the classification report below; a setup sketch follows this list.
|
|
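The exact training script is not part of this card. As a rough sketch, assuming a standard `transformers` sequence-classification setup (the label names below are taken from the classification report in this card; everything else is illustrative), the base checkpoint could be prepared for the seven-class task like this:

```py
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Label set taken from the classification report further down this card
labels = [
    "coherent", "grammatical_errors", "random_bytes",
    "random_tokens", "random_words", "run_on", "word_soup",
]
id2label = {i: label for i, label in enumerate(labels)}
label2id = {label: i for i, label in enumerate(labels)}

# Multilingual DistilBERT base model with a freshly initialized 7-way classification head
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-multilingual-cased",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)
```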
|
## Training Metrics |
|
|
|
| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 | |
|
| :---- | :------------ | :------------ | :-------- | :-------- | :-------- | :------- | |
|
| 1 | 0.343600 | 0.303963 | 0.880312 | 0.882746 | 0.880312 | 0.879637 | |
|
| 2 | 0.245200 | 0.286482 | 0.900850 | 0.901156 | 0.900850 | 0.899612 | |
|
| 3 | 0.149700 | 0.313061 | 0.906161 | 0.906049 | 0.906161 | 0.905103 | |
|
|
|
## Evaluation Metrics |
|
|
|
The following metrics were measured on the test set: |
|
|
|
| Metric | Value | |
|
| :---------- | :------- | |
|
| Loss | 0.316272 | |
|
| Accuracy | 0.903329 | |
|
| Precision | 0.903704 | |
|
| Recall | 0.903329 | |
|
| F1-Score | 0.902359 | |
|
|
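The precision, recall, and F1 values above are weighted averages over the seven classes (consistent with the `weighted avg` row of the classification report below). As a minimal sketch of how such metrics can be computed from test-set predictions with scikit-learn (the actual evaluation script is not included here, and `y_true`/`y_pred` are placeholders):

```py
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder integer class ids; in practice these come from the test set and model predictions
y_true = [0, 1, 2, 0, 3]
y_pred = [0, 1, 1, 0, 3]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"Accuracy: {accuracy:.6f}  Precision: {precision:.6f}  Recall: {recall:.6f}  F1: {f1:.6f}")
```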
|
## Classification Report
|
|
|
``` |
|
precision recall f1-score support |
|
|
|
coherent 0.86 0.93 0.90 2051 |
|
grammatical_errors 0.88 0.76 0.81 599 |
|
random_bytes 1.00 1.00 1.00 599 |
|
random_tokens 1.00 1.00 1.00 600 |
|
random_words 0.95 0.93 0.94 600 |
|
run_on 0.85 0.79 0.82 600 |
|
word_soup 0.89 0.83 0.86 599 |
|
|
|
accuracy 0.90 5648 |
|
macro avg 0.92 0.89 0.90 5648 |
|
weighted avg 0.90 0.90 0.90 5648 |
|
``` |
|
|
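The seven class names above are the labels the model predicts. Assuming the label mapping is stored in the model configuration, as is standard for `transformers` sequence-classification checkpoints (worth verifying against the repository files), it can be inspected directly:

```py
from transformers import AutoConfig

config = AutoConfig.from_pretrained("SuccubusBot/distilbert-multilingual-incoherence-classifier")
print(config.id2label)  # e.g. {0: "coherent", 1: "grammatical_errors", ...}
```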
|
## Confusion Matrix |
|
|
|
 |
|
|
|
The confusion matrix above shows the performance of the model on each class. |
|
|
|
## Usage |
|
|
|
This model can be used for text classification, specifically for detecting and categorizing different types of text incoherence. You can use the `inference_example` function provided in the training notebook, or run the snippet below, to test your own text.
|
|
|
```py
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("SuccubusBot/distilbert-multilingual-incoherence-classifier")
model = AutoModelForSequenceClassification.from_pretrained("SuccubusBot/distilbert-multilingual-incoherence-classifier")

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

while True:
    text = input("Enter text (or type 'exit' to quit): ")
    if text.lower() == "exit":
        break

    # Classify the text; top_k=None returns a confidence score for every label
    results = classifier(text, top_k=None)

    # Print each label with its confidence score
    for result in results:
        print(f"Label: {result['label']}, Confidence: {result['score']:.4f}")
```
|
|
|
## Limitations |
|
|
|
The model was trained on a synthetically generated dataset, so care should be taken when applying it to real-world text. Additional real-world data may need to be collected before the model can be meaningfully evaluated or deployed in such settings.