---
license: apache-2.0
metrics:
- accuracy
- f1
pipeline_tag: text-classification
tags:
- stereotype
language:
- it
---
# Stereotype detection at aequa-tech

## Model Description

- **Developed by:** [aequa-tech](https://aequa-tech.com/)
- **Funded by:** [NGI-Search](https://www.ngi.eu/ngi-projects/ngi-search/)
- **Language(s) (NLP):** Italian
- **License:** apache-2.0
- **Finetuned from model:** [AlBERTo](https://huggingface.co/m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alberto)

This model is a fine-tuned version of the Italian [AlBERTo](https://huggingface.co/m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alberto) model for **stereotype detection**.

# Training Details

## Training Data

- [HaSpeeDe 2020](https://live.european-language-grid.eu/catalogue/corpus/7498)
- [Sarcastic Hate Speech dataset](https://github.com/simonasnow/Sarcastic-Hate-Speech)
- Racial stereotypes corpus, available upon request from the authors of [A Multilingual Dataset of Racial Stereotypes in Social Media Conversational Threads](https://aclanthology.org/2023.findings-eacl.51.pdf)
- [Debunker-Assistant corpus](https://github.com/AequaTech/DebunkerAssistant/tree/main/evaluation/training_datasets)

## Training Hyperparameters

- learning_rate: 2e-5
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam
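
As an illustration, the hyperparameters listed above could be passed to the Transformers `Trainer` roughly as follows. This is a minimal sketch and not the authors' training script: it assumes the `Trainer` API was used, and `output_dir` is a hypothetical path. Values that the card does not report (number of epochs, weight decay, scheduler) are left at the library defaults.

```python
from transformers import TrainingArguments

# Hyperparameters reported in this card; everything else is a library default.
training_args = TrainingArguments(
    output_dir="stereotype-it-checkpoints",  # hypothetical path, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    # The card lists "Adam" as the optimizer. The optimizer is left unset here,
    # so Trainer falls back to its default AdamW variant; this is an assumption.
)
```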


# Evaluation

## Testing Data
The model was evaluated on the HaSpeeDe test sets (tweets and news headlines), with the results reported below.

## Metrics and Results
Tweets:
- macro F1: 0.75
- accuracy: 0.75
- precision of positive class: 0.66
- recall of positive class: 0.94
- F1 of positive class: 0.78

News Headlines:
- macro F1: 0.72
- accuracy: 0.77
- precision of positive class: 0.73
- recall of positive class: 0.52
- F1 of positive class: 0.61
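
The positive-class F1 values above are the harmonic mean of the reported precision and recall. The short check below recomputes them from the rounded figures in this card; the helper name `f1_from_precision_recall` is introduced here for illustration only, and small differences are possible because the inputs are already rounded to two decimals.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Positive-class figures as reported in this card
reported = {
    "tweets": {"precision": 0.66, "recall": 0.94, "f1": 0.78},
    "news_headlines": {"precision": 0.73, "recall": 0.52, "f1": 0.61},
}

for name, m in reported.items():
    recomputed = f1_from_precision_recall(m["precision"], m["recall"])
    print(f"{name}: recomputed F1 = {recomputed:.2f}, reported F1 = {m['f1']:.2f}")
```

On the tweets the model favors recall (0.94) over precision (0.66), while on news headlines the balance is reversed, which is why the two positive-class F1 scores differ more than the accuracies do.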

# Framework versions

- Transformers 4.30.2
- PyTorch 2.1.2
- Datasets 2.19.0
- Accelerate 0.30.0

# How to use this model:
```Python
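from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned classifier and the AlBERTo tokenizer
model = AutoModelForSequenceClassification.from_pretrained("aequa-tech/stereotype-it", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
classifier("text")  # replace "text" with the Italian sentence to classify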
```