Text Classification
Transformers
Safetensors
distilbert
CortexPE committed on
Commit 3df9967 · verified · 1 Parent(s): 4ad32b5

Update README.md

Files changed (1)
  1. README.md +70 -32
README.md CHANGED
@@ -1,22 +1,42 @@
-
- # DistilBERT Incoherence Classifier
-
- This is a fine-tuned DistilBERT model for classifying text based on its coherence. It can identify various types of incoherence.
 
  ## Model Details
 
- - **Model:** DistilBERT (distilbert-base-multilingual-cased)
- - **Task:** Text Classification (Coherence Detection)
- - **Fine-tuning:** The model was fine-tuned using a custom-generated dataset that features various types of incoherence.
- - **Training Dataset** The model was trained on the [incoherent-text-dataset](https://huggingface.co/datasets/your_huggingface_username/incoherent-text-dataset) dataset, located on Huggingface.
 
  ## Training Metrics
 
  | Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
- | :---- | :------------ | :-------------- | :------- | :-------- | :----- | :------- |
- | 1 | 0.037500 | 0.071958 | 0.984995 | 0.985002 | 0.984995 | 0.984564 |
- | 2 | 0.008900 | 0.068670 | 0.985995 | 0.985973 | 0.985995 | 0.985603 |
- | 3 | 0.008500 | 0.058111 | 0.990330 | 0.990260 | 0.990330 | 0.990262 |
 
  ## Evaluation Metrics
 
@@ -24,28 +44,28 @@ The following metrics were measured on the test set:
 
  | Metric | Value |
  | :---------- | :------- |
- | Loss | 0.049511 |
- | Accuracy | 0.991 |
- | Precision | 0.990958 |
- | Recall | 0.991 |
- | F1-Score | 0.990962 |
 
  ## Classification Report:
 
  ```
                      precision    recall  f1-score   support
 
-           coherent       0.99      0.99      0.99      1500
- grammatical_errors       0.96      0.94      0.95       250
-       random_bytes       1.00      1.00      1.00       250
-      random_tokens       1.00      1.00      1.00       250
-       random_words       1.00      1.00      1.00       250
-             run_on       1.00      0.99      1.00       250
-          word_soup       1.00      1.00      1.00       250
-
-           accuracy                           0.99      3000
-          macro avg       0.99      0.99      0.99      3000
-       weighted avg       0.99      0.99      0.99      3000
  ```
 
  ## Confusion Matrix
@@ -58,10 +78,28 @@ The confusion matrix above shows the performance of the model on each class.
 
  This model can be used for text classification tasks, specifically for detecting and categorizing different types of text incoherence. You can use the `inference_example` function provided in the notebook to test your own text.
 
- ## Limitations
 
- The model has been trained on a generated dataset, so care must be taken in evaluating it in the real world. More data may need to be collected before evaluating this model in a real-world setting.
 
- ## License
 
- CC-BY-SA 4.0
+ ---
+ license: cc-by-sa-4.0
+ datasets:
+ - SuccubusBot/incoherent-text-dataset
+ language:
+ - en
+ - es
+ - fr
+ - de
+ - zh
+ - ja
+ - ru
+ - ar
+ - hi
+ metrics:
+ - accuracy
+ base_model:
+ - distilbert/distilbert-base-multilingual-cased
+ pipeline_tag: text-classification
+ library_name: transformers
+ ---
+
+ # DistilBERT Incoherence Classifier (Multilingual)
+
+ This is a fine-tuned DistilBERT-multilingual model for classifying text based on its coherence. It can identify various types of incoherence.
 
  ## Model Details
 
+ - **Model:** DistilBERT (distilbert-base-multilingual-cased)
+ - **Task:** Text Classification (Coherence Detection)
+ - **Fine-tuning:** The model was fine-tuned using a synthetically generated dataset that features various types of incoherence
 
  ## Training Metrics
 
  | Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
+ | :---- | :------------ | :------------ | :-------- | :-------- | :-------- | :------- |
+ | 1 | 0.343600 | 0.303963 | 0.880312 | 0.882746 | 0.880312 | 0.879637 |
+ | 2 | 0.245200 | 0.286482 | 0.900850 | 0.901156 | 0.900850 | 0.899612 |
+ | 3 | 0.149700 | 0.313061 | 0.906161 | 0.906049 | 0.906161 | 0.905103 |
 
  ## Evaluation Metrics
 
 
  | Metric | Value |
  | :---------- | :------- |
+ | Loss | 0.316272 |
+ | Accuracy | 0.903329 |
+ | Precision | 0.903704 |
+ | Recall | 0.903329 |
+ | F1-Score | 0.902359 |
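In both the training and evaluation tables, Recall equals Accuracy exactly (for example 0.903329 on the test set). That is what support-weighted averaging yields for single-label multiclass classification, which suggests, though the card does not say so, that the Precision, Recall, and F1-Score columns are weighted averages over the seven classes. Writing $n_c$ for the number of test examples whose true class is $c$, $\mathrm{TP}_c$ for those predicted correctly, and $N = \sum_c n_c$:

$$
\mathrm{Recall}_{\mathrm{weighted}}
= \sum_c \frac{n_c}{N}\,\frac{\mathrm{TP}_c}{n_c}
= \frac{1}{N}\sum_c \mathrm{TP}_c
= \mathrm{Accuracy}
$$

Weighted precision and F1 do not cancel this way, because per-class precision divides by the number of predictions for $c$ rather than by $n_c$, which is why those columns differ slightly from Accuracy.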
 
  ## Classification Report:
 
  ```
                      precision    recall  f1-score   support
 
+           coherent       0.86      0.93      0.90      2051
+ grammatical_errors       0.88      0.76      0.81       599
+       random_bytes       1.00      1.00      1.00       599
+      random_tokens       1.00      1.00      1.00       600
+       random_words       0.95      0.93      0.94       600
+             run_on       0.85      0.79      0.82       600
+          word_soup       0.89      0.83      0.86       599
+
+           accuracy                           0.90      5648
+          macro avg       0.92      0.89      0.90      5648
+       weighted avg       0.90      0.90      0.90      5648
  ```
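As a rough cross-check of the report, the macro average is the unweighted mean of the per-class rows, while the weighted average weights each row by its support. The sketch below recomputes both from the two-decimal values printed in the report, so its results are only accurate to about ±0.005; `REPORT`, `macro_avg`, and `weighted_avg` are illustrative names, not part of the model card.

```python
# Per-class (precision, recall, f1, support) rows transcribed from the
# classification report; the metric values are already rounded to two decimals.
REPORT = {
    "coherent": (0.86, 0.93, 0.90, 2051),
    "grammatical_errors": (0.88, 0.76, 0.81, 599),
    "random_bytes": (1.00, 1.00, 1.00, 599),
    "random_tokens": (1.00, 1.00, 1.00, 600),
    "random_words": (0.95, 0.93, 0.94, 600),
    "run_on": (0.85, 0.79, 0.82, 600),
    "word_soup": (0.89, 0.83, 0.86, 599),
}


def macro_avg(rows, col):
    """Unweighted mean of one metric column across classes."""
    return sum(r[col] for r in rows) / len(rows)


def weighted_avg(rows, col):
    """Mean of one metric column, weighted by each class's support (column 3)."""
    total = sum(r[3] for r in rows)
    return sum(r[col] * r[3] for r in rows) / total


rows = list(REPORT.values())
for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:>9}: macro {macro_avg(rows, col):.2f}, weighted {weighted_avg(rows, col):.2f}")
print("support:", sum(r[3] for r in rows))
```

This reproduces the report's macro avg row (0.92, 0.89, 0.90), its weighted avg row (0.90, 0.90, 0.90), and the total support of 5648. The lower macro recall comes mainly from the grammatical_errors and run_on classes.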
 
  ## Confusion Matrix
 
  This model can be used for text classification tasks, specifically for detecting and categorizing different types of text incoherence. You can use the `inference_example` function provided in the notebook to test your own text.
 
+ ```py
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+
+ tokenizer = AutoTokenizer.from_pretrained("SuccubusBot/distilbert-multilingual-incoherence-classifier")
+ model = AutoModelForSequenceClassification.from_pretrained("SuccubusBot/distilbert-multilingual-incoherence-classifier")
+
+ classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
 
 
+ while True:
+     text = input("Enter text (or type 'exit' to quit): ")
+     if text.lower() == "exit":
+         break
+
+     # Example usage
+     results = classifier(text)
+
+     # Print the results with confidence scores for all labels
+     for result in results:
+         print(f"Label: {result['label']}, Confidence: {result['score']}")
+ ```
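By default, the `text-classification` pipeline returns only the highest-scoring label for each input, so the loop above prints a single label even though its comment mentions all labels. A minimal variant is sketched below, assuming a recent `transformers` release in which passing `top_k=None` at call time returns a score for every label and extra keyword arguments such as `truncation=True` are forwarded to the tokenizer; `format_scores` and `main` are illustrative names, not part of the model card.

```python
def format_scores(results):
    """Format pipeline output (a list of {"label", "score"} dicts), highest score first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [f"Label: {r['label']}, Confidence: {r['score']:.4f}" for r in ranked]


def main():
    # Imported here so format_scores can be reused without transformers installed.
    from transformers import pipeline

    # Passing the repo id lets the pipeline load the matching tokenizer itself.
    classifier = pipeline(
        "text-classification",
        model="SuccubusBot/distilbert-multilingual-incoherence-classifier",
    )
    while True:
        text = input("Enter text (or type 'exit' to quit): ")
        if text.strip().lower() == "exit":
            break
        # top_k=None requests scores for every label; truncation=True keeps
        # inputs longer than the model's maximum length from raising an error.
        results = classifier(text, top_k=None, truncation=True)
        for line in format_scores(results):
            print(line)
```

Calling `main()` starts the same interactive loop as the original snippet, but prints every label the model defines, sorted by confidence.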
+
+ ## Limitations
 
+ The model has been trained on a generated dataset, so care must be taken in evaluating it in the real world. More data may need to be collected before evaluating this model in a real-world setting.
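Following the advice above, one lightweight way to check the model on your own domain is to hand-label a small sample and compare it with the model's predictions class by class. The sketch below is a hypothetical helper, not part of the model card: it assumes `predict` is any callable mapping a string to a label name (for example `lambda t: classifier(t)[0]["label"]` with the default pipeline), and the toy predictor and sample sentences are made up purely for illustration.

```python
from collections import Counter


def spot_check(predict, examples):
    """Compare predictions against hand-assigned labels.

    predict:  callable mapping a string to a label name (hypothetical wrapper).
    examples: iterable of (text, gold_label) pairs.
    Returns (accuracy, confusion), where confusion counts (gold, predicted) pairs.
    """
    confusion = Counter()
    for text, gold in examples:
        confusion[(gold, predict(text))] += 1
    total = sum(confusion.values())
    if total == 0:
        raise ValueError("no examples to evaluate")
    correct = sum(n for (gold, pred), n in confusion.items() if gold == pred)
    return correct / total, confusion


def per_class_recall(confusion):
    """Share of each gold class that was labelled correctly."""
    support, hits = Counter(), Counter()
    for (gold, pred), n in confusion.items():
        support[gold] += n
        if gold == pred:
            hits[gold] += n
    return {label: hits[label] / support[label] for label in support}


# Toy stand-in for the real classifier, for illustration only.
def toy_predict(text):
    return "coherent" if " " in text else "random_bytes"


sample = [
    ("The cat sat on the mat.", "coherent"),
    ("\x8f\x02\xa1", "random_bytes"),
    ("mat the on cat sat the", "word_soup"),
]
acc, confusion = spot_check(toy_predict, sample)
print(f"accuracy: {acc:.2f}")
print(per_class_recall(confusion))
```

A class whose recall on your own sample falls well below the test-set figures in the classification report may indicate that the synthetic training data does not resemble your inputs.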