---
license: apache-2.0
metrics:
- accuracy
- f1
pipeline_tag: text-classification
tags:
- stereotype
language:
- it
---
# Stereotype detection at aequa-tech

## Model Description

- **Developed by:** [aequa-tech](https://aequa-tech.com/)
- **Funded by:** [NGI-Search](https://www.ngi.eu/ngi-projects/ngi-search/)
- **Language(s) (NLP):** Italian
- **License:** apache-2.0
- **Finetuned from model:** [AlBERTo](https://huggingface.co/m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alberto)

This model is a fine-tuned version of the Italian [AlBERTo](https://huggingface.co/m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alberto) model for **stereotype detection**.

# Training Details

## Training Data

- [HaSpeeDe 2020](https://live.european-language-grid.eu/catalogue/corpus/7498)
- [Sarcastic Hate Speech dataset](https://github.com/simonasnow/Sarcastic-Hate-Speech)
- Racial stereotypes corpus, available upon request from the authors of [A Multilingual Dataset of Racial Stereotypes in Social Media Conversational Threads](https://aclanthology.org/2023.findings-eacl.51.pdf)
- [Debunker-Assistant corpus](https://github.com/AequaTech/DebunkerAssistant/tree/main/evaluation/training_datasets)

## Training Hyperparameters

- learning_rate: 2e-5
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam

# Evaluation

## Testing Data
The model was tested on the HaSpeeDe 2020 test sets (tweets and news headlines), obtaining the following results:

## Metrics and Results
Tweets:
- macro F1: 0.75
- accuracy: 0.75
- precision of positive class: 0.66
- recall of positive class: 0.94
- F1 of positive class: 0.78

News Headlines:
- macro F1: 0.72
- accuracy: 0.77
- precision of positive class: 0.73
- recall of positive class: 0.52
- F1 of positive class: 0.61

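For reference, the scores above can be recomputed from raw predictions. A minimal pure-Python sketch (using toy labels, not the actual HaSpeeDe data; the helper name `binary_scores` is illustrative):

```python
def binary_scores(y_true, y_pred):
    """Accuracy, precision/recall/F1 of the positive class, and macro F1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1_pos = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Macro F1 averages the F1 of the positive and negative classes
    prec_neg = tn / (tn + fn) if tn + fn else 0.0
    rec_neg = tn / (tn + fp) if tn + fp else 0.0
    f1_neg = 2 * prec_neg * rec_neg / (prec_neg + rec_neg) if prec_neg + rec_neg else 0.0
    return {"accuracy": accuracy, "precision_pos": precision,
            "recall_pos": recall, "f1_pos": f1_pos,
            "macro_f1": (f1_pos + f1_neg) / 2}

# Toy example: 1 = stereotype, 0 = non-stereotype
scores = binary_scores([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 1, 0])
```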
# Framework versions

- Transformers 4.30.2
- PyTorch 2.1.2
- Datasets 2.19.0
- Accelerate 0.30.0

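To reproduce this environment, the versions above can be pinned at install time (a sketch, assuming a pip-based setup):

```shell
# Pin the framework versions listed above
pip install transformers==4.30.2 torch==2.1.2 datasets==2.19.0 accelerate==0.30.0
```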
# How to use this model

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("aequa-tech/stereotype-it", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
classifier("text")  # replace "text" with the Italian text to classify
```