---
base_model: distilbert-base-uncased
language:
- en
library_name: transformers
license: cc-by-sa-4.0
pipeline_tag: text-classification
tags:
- text-classification
datasets:
- SuccubusBot/incoherent-text-dataset
---

# DistilBERT Incoherence Classifier

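This is a DistilBERT model fine-tuned to classify text by coherence. It labels input as `coherent` or as one of six incoherence types: `grammatical_errors`, `random_bytes`, `random_tokens`, `random_words`, `run_on`, or `word_soup`.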

## Model Details

-   **Model:** DistilBERT (distilbert-base-uncased)
-   **Task:** Text Classification (Coherence Detection)
-   **Fine-tuning:** The model was fine-tuned using a custom-generated dataset that features various types of incoherence.
-   **Training Dataset:** The model was trained on the [incoherent-text-dataset](https://huggingface.co/datasets/SuccubusBot/incoherent-text-dataset), hosted on Hugging Face.

## Training Metrics

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall   | F1       |
| :---- | :------------ | :-------------- | :------- | :-------- | :------- | :------- |
| 1     | 0.037500      | 0.071958        | 0.984995 | 0.985002  | 0.984995 | 0.984564 |
| 2     | 0.008900      | 0.068670        | 0.985995 | 0.985973  | 0.985995 | 0.985603 |
| 3     | 0.008500      | 0.058111        | 0.990330 | 0.990260  | 0.990330 | 0.990262 |

## Evaluation Metrics

The following metrics were measured on the test set:

| Metric      | Value    |
| :---------- | :------- |
| Loss        | 0.049511 |
| Accuracy    | 0.991    |
| Precision   | 0.990958 |
| Recall      | 0.991    |
| F1-Score    | 0.990962 |

## Classification Report

```
                    precision    recall  f1-score   support

          coherent       0.99      0.99      0.99      1500
grammatical_errors       0.96      0.94      0.95       250
      random_bytes       1.00      1.00      1.00       250
     random_tokens       1.00      1.00      1.00       250
      random_words       1.00      1.00      1.00       250
            run_on       1.00      0.99      1.00       250
         word_soup       1.00      1.00      1.00       250

          accuracy                           0.99      3000
         macro avg       0.99      0.99      0.99      3000
      weighted avg       0.99      0.99      0.99      3000
```
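
The `macro avg` and `weighted avg` rows can be reproduced from the per-class rows. The sketch below does this for precision and recall in plain Python. It uses the two-decimal values printed in the report, so the results only agree with the report after rounding; the helper names `macro_avg` and `weighted_avg` are illustrative and not part of any library.

```python
# Per-class (precision, recall, support), copied from the report above.
# The inputs are already rounded to two decimals, so the averages are approximate.
report = {
    "coherent": (0.99, 0.99, 1500),
    "grammatical_errors": (0.96, 0.94, 250),
    "random_bytes": (1.00, 1.00, 250),
    "random_tokens": (1.00, 1.00, 250),
    "random_words": (1.00, 1.00, 250),
    "run_on": (1.00, 0.99, 250),
    "word_soup": (1.00, 1.00, 250),
}


def macro_avg(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, supports):
    """Mean weighted by the number of test examples in each class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


precisions = [p for p, _, _ in report.values()]
recalls = [r for _, r, _ in report.values()]
supports = [s for _, _, s in report.values()]

macro_precision = macro_avg(precisions)
macro_recall = macro_avg(recalls)
weighted_precision = weighted_avg(precisions, supports)
weighted_recall = weighted_avg(recalls, supports)

print(f"macro avg     precision={macro_precision:.2f} recall={macro_recall:.2f}")
print(f"weighted avg  precision={weighted_precision:.2f} recall={weighted_recall:.2f}")
```

Because `coherent` makes up half of the 3,000 test examples, the weighted average is dominated by that class, while the macro average gives `grammatical_errors` (the weakest class) the same weight as every other class.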

## Confusion Matrix

![Confusion Matrix](confusion_matrix.png)

The confusion matrix above shows the performance of the model on each class.
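
To build the same kind of matrix for your own predictions, a minimal plain-Python sketch follows. The label list matches the seven classes in the classification report; the `y_true` and `y_pred` values are made-up toy data, not results from this model. In this sketch rows are true labels and columns are predicted labels, which may differ from the convention used in the image.

```python
from collections import Counter

# The seven classes from the classification report.
LABELS = [
    "coherent",
    "grammatical_errors",
    "random_bytes",
    "random_tokens",
    "random_words",
    "run_on",
    "word_soup",
]


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j]: count of examples with true label i predicted as label j."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Toy data for illustration only (not taken from the model's test set).
y_true = ["coherent", "coherent", "grammatical_errors", "run_on", "word_soup"]
y_pred = ["coherent", "grammatical_errors", "grammatical_errors", "run_on", "word_soup"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
for label, row in zip(LABELS, matrix):
    print(f"{label:>18} {row}")
```

Off-diagonal cells are the misclassifications; in the toy data, one `coherent` example is predicted as `grammatical_errors`.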

## Usage

This model can be used for text classification, specifically for detecting and categorizing different types of text incoherence. The training notebook provides an `inference_example` function for testing your own text.
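
If the notebook is not at hand, the model can also be loaded through the standard `transformers` text-classification pipeline. The following is a minimal sketch, not the author's `inference_example` function. It assumes `transformers` and a backend such as PyTorch are installed. The `model_id` argument is a placeholder for this model's Hub repository ID or a local path, which this card does not state, and the helper names `classify` and `top_label` are hypothetical.

```python
def top_label(scores):
    """Pick the highest-scoring entry from a list of {"label", "score"} dicts."""
    return max(scores, key=lambda item: item["score"])


def classify(texts, model_id):
    """Classify texts with a Hugging Face text-classification pipeline.

    `model_id` is the Hub repository ID or local path of this model. It is
    not stated in this card, so the caller must supply it.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # top_k=None requests a score for every class, not only the best one.
    all_scores = classifier(list(texts), top_k=None)
    return [top_label(scores) for scores in all_scores]


# Shape of the per-text output with top_k=None, using made-up scores:
example_scores = [
    {"label": "coherent", "score": 0.02},
    {"label": "word_soup", "score": 0.95},
    {"label": "run_on", "score": 0.03},
]
best = top_label(example_scores)
print(best["label"], best["score"])
```

A call would look like `classify(["some text to check"], model_id=...)` with the real repository ID filled in. The label strings returned depend on the `id2label` mapping saved in the model's config, so they may appear as generic names such as `LABEL_0` if that mapping was not set.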

## Limitations

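The model was trained on a synthetically generated dataset, so the metrics above may not carry over to real-world text. Validate it on representative real-world data, collecting more if necessary, before relying on it in a real-world setting. Among the classes, `grammatical_errors` has the lowest test scores (precision 0.96, recall 0.94).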

## License

CC-BY-SA 4.0