Text Classification
Transformers
Safetensors
distilbert
CortexPE committed on
Commit
47405d2
·
verified ·
1 Parent(s): cd6a6e6

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +67 -0
README.md ADDED
@@ -0,0 +1,67 @@
+
+ # DistilBERT Incoherence Classifier
+
+ This is a fine-tuned DistilBERT model that classifies text by coherence. It labels text as coherent or as one of several types of incoherence.
+
+ ## Model Details
+
+ - **Model:** DistilBERT (distilbert-base-multilingual-cased)
+ - **Task:** Text Classification (Coherence Detection)
+ - **Fine-tuning:** The model was fine-tuned on a custom-generated dataset that contains several types of incoherence.
+ - **Training Dataset:** The model was trained on the [incoherent-text-dataset](https://huggingface.co/datasets/your_huggingface_username/incoherent-text-dataset) dataset, hosted on Hugging Face.
+
+ ## Training Metrics
+
+ | Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
+ | :---- | :------------ | :-------------- | :------- | :-------- | :------- | :------- |
+ | 1 | 0.037500 | 0.071958 | 0.984995 | 0.985002 | 0.984995 | 0.984564 |
+ | 2 | 0.008900 | 0.068670 | 0.985995 | 0.985973 | 0.985995 | 0.985603 |
+ | 3 | 0.008500 | 0.058111 | 0.990330 | 0.990260 | 0.990330 | 0.990262 |
+
+ ## Evaluation Metrics
+
+ The following metrics were measured on the test set:
+
+ | Metric | Value |
+ | :---------- | :------- |
+ | Loss | 0.049511 |
+ | Accuracy | 0.991 |
+ | Precision | 0.990958 |
+ | Recall | 0.991 |
+ | F1-Score | 0.990962 |
+
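In both tables above, Recall matches Accuracy to every reported digit. That is what support-weighted averaging produces, so the sketch below assumes (the README does not say) that precision, recall, and F1 were averaged across classes with weights proportional to class support. The `weighted_metrics` helper and the toy labels are hypothetical, written in plain Python for illustration rather than taken from the training notebook.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1 (illustrative helper)."""
    n = len(y_true)
    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(correct.values()) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, invented for illustration only (not results from this model).
toy_true = ["coherent", "coherent", "run_on", "word_soup"]
toy_pred = ["coherent", "run_on", "run_on", "word_soup"]
toy = weighted_metrics(toy_true, toy_pred)
print(toy)
```

Each class contributes `(support / n) * (correct / support)` to weighted recall, so the sum collapses to `correct / n`, which is accuracy. Precision and F1 do not simplify this way, which is consistent with their slightly different values in the tables.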
+ ## Classification Report
+
+ ```
+                     precision    recall  f1-score   support
+
+           coherent       0.99      0.99      0.99      1500
+ grammatical_errors       0.96      0.94      0.95       250
+       random_bytes       1.00      1.00      1.00       250
+      random_tokens       1.00      1.00      1.00       250
+       random_words       1.00      1.00      1.00       250
+             run_on       1.00      0.99      1.00       250
+          word_soup       1.00      1.00      1.00       250
+
+           accuracy                           0.99      3000
+          macro avg       0.99      0.99      0.99      3000
+       weighted avg       0.99      0.99      0.99      3000
+ ```
+
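The summary rows of the report can be checked by hand from the per-class rows. The sketch below copies the rounded per-class values from the report above and recomputes the macro average (unweighted mean over classes) and the weighted average (mean weighted by support). Because the inputs are already rounded to two decimals, the results only agree with the report to two decimals. The `report`, `macro`, and `weighted` names are illustrative.

```python
# (precision, recall, f1, support) per class, copied from the report above.
report = {
    "coherent":           (0.99, 0.99, 0.99, 1500),
    "grammatical_errors": (0.96, 0.94, 0.95, 250),
    "random_bytes":       (1.00, 1.00, 1.00, 250),
    "random_tokens":      (1.00, 1.00, 1.00, 250),
    "random_words":       (1.00, 1.00, 1.00, 250),
    "run_on":             (1.00, 0.99, 1.00, 250),
    "word_soup":          (1.00, 1.00, 1.00, 250),
}

total_support = sum(row[3] for row in report.values())


def macro(col: int) -> float:
    """Unweighted mean of one metric column over all classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted(col: int) -> float:
    """Mean of one metric column, weighted by each class's support."""
    return sum(row[col] * row[3] for row in report.values()) / total_support


for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:>9}  macro={macro(col):.2f}  weighted={weighted(col):.2f}")
print("total support:", total_support)
```

The `coherent` class makes up half of the 3000 test examples, so it dominates the weighted average, while `grammatical_errors` (the weakest class) pulls the macro average down slightly more than the weighted one.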
+ ## Confusion Matrix
+
+ ![Confusion Matrix](confusion_matrix.png)
+
+ The confusion matrix above shows how the predictions for each true class are distributed across the predicted classes.
+
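For readers who want to rebuild such a matrix from their own predictions, here is a minimal plain-Python sketch. The class names come from the classification report; the row/column orientation (rows are true classes, columns are predicted classes), the `confusion_matrix` helper, and the toy labels are assumptions for illustration, not the code or data behind `confusion_matrix.png`.

```python
from collections import Counter
from typing import List, Sequence

# Class names as listed in the classification report.
LABELS = [
    "coherent", "grammatical_errors", "random_bytes", "random_tokens",
    "random_words", "run_on", "word_soup",
]


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str] = LABELS
) -> List[List[int]]:
    """Count (true, predicted) pairs; rows are true classes, columns are predictions."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Toy labels, invented for illustration only (not results from this model).
toy_true = ["coherent", "coherent", "grammatical_errors", "run_on"]
toy_pred = ["coherent", "grammatical_errors", "grammatical_errors", "run_on"]
matrix = confusion_matrix(toy_true, toy_pred)

for label, row in zip(LABELS, matrix):
    print(f"{label:>18} {row}")
```

Diagonal entries count correct predictions; off-diagonal entries show which classes get confused with each other.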
+ ## Usage
+
+ This model detects and categorizes different types of text incoherence. To test your own text, use the `inference_example` function provided in the notebook.
+
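The notebook's `inference_example` function is not reproduced here. As a rough alternative, the sketch below runs inference through the standard `transformers` text-classification pipeline. `MODEL_ID` is a hypothetical placeholder, not this model's real repository id, and the sketch assumes the model config maps class ids to the label names shown in the classification report (otherwise labels appear as `LABEL_0`, `LABEL_1`, and so on). The `load_classifier`, `flag_incoherent`, and `main` helpers are illustrative names.

```python
from typing import Dict, List

# Hypothetical placeholder: replace with this model's actual Hub repository id.
MODEL_ID = "your-username/distilbert-incoherence-classifier"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first use)."""
    from transformers import pipeline  # imported lazily; only needed for real inference
    return pipeline("text-classification", model=model_id)


def flag_incoherent(predictions: List[Dict], coherent_label: str = "coherent") -> List[bool]:
    """Return True for every prediction whose top label is not the coherent class."""
    return [pred["label"] != coherent_label for pred in predictions]


def main() -> None:
    clf = load_classifier()
    texts = ["The cat sat on the mat.", "mat the on sat cat purple seven"]
    preds = clf(texts)  # one {"label": ..., "score": ...} dict per input text
    for text, pred, bad in zip(texts, preds, flag_incoherent(preds)):
        print(f"{pred['label']:>18} {pred['score']:.3f} incoherent={bad} | {text}")
```

Call `main()` after setting `MODEL_ID` to a real repository id. Since the base model is multilingual but the README does not state which languages the training data covers, treat predictions on non-English text with extra caution.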
+ ## Limitations
+
+ The model was trained on a generated dataset, so take care when evaluating it on real-world text. More data may need to be collected before the model can be evaluated in a real-world setting.
+
+ ## License
+
+ CC-BY-SA 4.0