rsepulvedat committed on
Commit a4717a8 · verified · 1 Parent(s): 1cbc1b7

Update README.md

Files changed (1)
  1. README.md +95 -3
README.md CHANGED
@@ -1,3 +1,95 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - gplsi/SocialTOX
+ language:
+ - es
+ metrics:
+ - f1
+ - precision
+ - recall
+ - confusion_matrix
+ base_model:
+ - meta-llama/Llama-3.1-8B-Instruct
+ library_name: transformers
+ ---
+
+ # 🧠 Toxicity_model_Llama_3.1_8B_Binary – Spanish Toxicity Classifier (Instruction-Tuned)
+
+ ## 📌 Model Description
+
+ This model is an **instruction-tuned version** of `Llama 3.1 8B Instruct`, trained to classify the toxicity of **Spanish-language user comments on news articles**. It distinguishes between two categories:
+
+ - **Non-toxic**
+ - **Toxic**
+
+ The model follows instruction-based prompts and returns a single classification label in response.
+
+ ---
+
+ ## 📂 Training Data
+
+ The model was fine-tuned on the **[SocialTOX dataset](https://huggingface.co/datasets/gplsi/SocialTOX)**, a collection of Spanish-language comments annotated for varying levels of toxicity. These comments come from news platforms and represent real-world scenarios of online discourse. For this model, a binary classifier was developed: the classes *Slightly toxic* and *Toxic* were merged into a single *Toxic* category.
+
+ ---
+
+ ## 💬 Instruction Format
+
+ The model expects input in a structured conversational format like the one below:
+
+ ```json
+ [
+   {
+     "role": "system",
+     "content": "You are an expert in detecting toxicity in comments, and your goal is to classify comments based on their level of toxicity. The comments were made on news articles. The toxicity categories are:\nToxic: Comments that contain derogatory or pejorative language, inappropriate jokes, fearmongering, denial of facts, threats, personal attacks, insults, degradation, or racist or sexist language. Only classify a comment as “toxic” if it contains clear attack language, direct insults, or demeaning references.\nNon-toxic: Neutral or critical comments that do not include Toxic or Slightly toxic elements. Note that negative or critical comments (those with a serious or discontented tone) are Not toxic or Slightly toxic unless they meet the criteria of the categories above.\nPlease write the corresponding category immediately after the word 'answer.' In case of doubt between two labels, choose the one with the lowest or no toxicity level."
+   },
+   {
+     "role": "user",
+     "content": "Text: \"Narco-Bolivarian Communism\""
+   },
+   {
+     "role": "assistant",
+     "content": "Toxic"
+   }
+ ]
+ ```
+
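As a rough illustration of the format above, the following Python sketch assembles the system and user turns for one comment and, optionally, passes them to a `transformers` text-generation pipeline. The helper names `build_messages` and `classify` and the `model_id` argument are hypothetical: the README states neither the repository ID of the fine-tuned checkpoint nor the inference code the authors used, so treat this as one plausible way to query the model.

```python
"""Sketch: build the chat-format prompt described in the README."""

# System prompt copied from the README's instruction-format example.
SYSTEM_PROMPT = (
    "You are an expert in detecting toxicity in comments, and your goal is to "
    "classify comments based on their level of toxicity. The comments were made "
    "on news articles. The toxicity categories are:\n"
    "Toxic: Comments that contain derogatory or pejorative language, "
    "inappropriate jokes, fearmongering, denial of facts, threats, personal "
    "attacks, insults, degradation, or racist or sexist language. Only classify "
    "a comment as “toxic” if it contains clear attack language, direct insults, "
    "or demeaning references.\n"
    "Non-toxic: Neutral or critical comments that do not include Toxic or "
    "Slightly toxic elements. Note that negative or critical comments (those "
    "with a serious or discontented tone) are Not toxic or Slightly toxic "
    "unless they meet the criteria of the categories above.\n"
    "Please write the corresponding category immediately after the word "
    "'answer.' In case of doubt between two labels, choose the one with the "
    "lowest or no toxicity level."
)


def build_messages(comment: str) -> list:
    """Return the system and user turns; the assistant turn is left for the model."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f'Text: "{comment}"'},
    ]


def classify(comment: str, model_id: str, max_new_tokens: int = 8) -> str:
    """Assumed inference path: a text-generation pipeline fed with chat messages.

    `model_id` is a placeholder for the fine-tuned checkpoint (not given in the README).
    The pipeline is rebuilt on every call for brevity; cache it in real use.
    """
    from transformers import pipeline  # lazy import: needs transformers and the weights

    generator = pipeline("text-generation", model=model_id)
    output = generator(
        build_messages(comment), max_new_tokens=max_new_tokens, do_sample=False
    )
    # With chat input, "generated_text" holds the whole conversation;
    # the last turn is the assistant's reply.
    return output[0]["generated_text"][-1]["content"].strip()
```

Calling `build_messages("...")` alone is enough to inspect the prompt; `classify` additionally requires the model weights and suitable hardware.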
+ ## Training hyperparameters
+ - epochs: 3
+ - learning_rate: 1e-5
+ - beta1: 0.9
+ - beta2: 0.95
+ - weight_decay: 0.1
+ - global_batch_size: 4
+ - micro_batch_size: 1
+ - lr_warmup_steps: 100
+ - max_seq_length: 512
+
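To show how the batch-size and warmup values relate, here is a minimal Python sketch that stores the listed hyperparameters in a hypothetical `TrainConfig` dataclass and derives the gradient-accumulation steps. The README does not state the number of devices, the training framework, or the shape of the warmup, so the single-device default and the linear warmup below are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as listed in the README (container name is hypothetical)."""
    epochs: int = 3
    learning_rate: float = 1e-5
    beta1: float = 0.9
    beta2: float = 0.95
    weight_decay: float = 0.1
    global_batch_size: int = 4
    micro_batch_size: int = 1
    lr_warmup_steps: int = 100
    max_seq_length: int = 512


def grad_accum_steps(cfg: TrainConfig, num_devices: int = 1) -> int:
    """Micro-batches accumulated per optimizer step (device count is an assumption)."""
    per_pass = cfg.micro_batch_size * num_devices
    if cfg.global_batch_size % per_pass != 0:
        raise ValueError("global batch size must be a multiple of micro batch x devices")
    return cfg.global_batch_size // per_pass


def warmup_lr(cfg: TrainConfig, step: int) -> float:
    """Assumed linear warmup to the peak rate, held constant afterwards."""
    if step >= cfg.lr_warmup_steps:
        return cfg.learning_rate
    return cfg.learning_rate * step / cfg.lr_warmup_steps
```

Under these assumptions, one device accumulates gradients over 4 micro-batches per optimizer step, and the learning rate reaches 1e-5 at step 100.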
+ ## 📊 Evaluation
+
+ The model was evaluated on a held-out **test set of 968 manually annotated comments**. Below are the confusion matrix and classification metrics:
+
+ ### 🧮 Confusion Matrix (Binary Classification)
+
+ Rows are the annotated (actual) labels and columns are the model's predictions; the row sums (587 and 381) match the support values in the classification report.
+
+ | Actual / Predicted | Non-toxic | Toxic |
+ |--------------------------|----------------------|------------------|
+ | **Non-toxic** | 534 | 53 |
+ | **Toxic** | 136 | 245 |
+
+ ---
+
+ ### 📈 Classification Report
+
+ | Class | Precision | Recall | F1-score | Support |
+ |--------------|-----------|---------|----------|---------|
+ | **Non-toxic**| 0.7970 | 0.9097 | 0.8496 | 587 |
+ | **Toxic** | 0.8222 | 0.6430 | 0.7216 | 381 |
+ | | | | | |
+ | **Accuracy** | | | **0.8048** | **968** |
+ | **Macro avg**| 0.8096 | 0.7764 | 0.7856 | 968 |
+ | **Weighted avg** | 0.8069 | 0.8048 | 0.7993 | 968 |
+
+ **Macro F1-score**: `0.7856`
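
The summary figures can be rechecked directly from the confusion matrix. The sketch below is a plain-Python recomputation, not the authors' evaluation script; it assumes rows are actual labels and columns are predictions, which is consistent with the row sums matching the reported supports. The function names are illustrative.

```python
LABELS = ["Non-toxic", "Toxic"]
# Rows: actual label, columns: predicted label (assumed orientation).
CONFUSION = [
    [534, 53],
    [136, 245],
]


def per_class_metrics(matrix, index):
    """Precision, recall, F1 and support for the class at `index`."""
    true_pos = matrix[index][index]
    actual = sum(matrix[index])                     # row sum = support
    predicted = sum(row[index] for row in matrix)   # column sum
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "support": actual}


def accuracy(matrix):
    """Share of comments on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    scores = [per_class_metrics(matrix, i)["f1"] for i in range(len(matrix))]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for i, label in enumerate(LABELS):
        print(label, per_class_metrics(CONFUSION, i))
    print("accuracy:", round(accuracy(CONFUSION), 4))
    print("macro F1:", round(macro_f1(CONFUSION), 4))
```

Rounded to four decimals, this reproduces the reported accuracy (0.8048) and macro F1-score (0.7856).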