---
license: apache-2.0
metrics:
- accuracy
- f1
pipeline_tag: text-classification
tags:
- stereotype
language:
- it
---

# Stereotype detection at aequa-tech

## Model Description

- **Developed by:** [aequa-tech](https://aequa-tech.com/)
- **Funded by:** [NGI-Search](https://www.ngi.eu/ngi-projects/ngi-search/)
- **Language(s) (NLP):** Italian
- **License:** apache-2.0
- **Finetuned from model:** [AlBERTo](https://huggingface.co/m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alberto)

This model is a version of the Italian [AlBERTo](https://huggingface.co/m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alberto) model fine-tuned for **stereotype detection**, a binary text-classification task.

# Training Details

## Training Data

- [HaSpeeDe 2020](https://live.european-language-grid.eu/catalogue/corpus/7498)
- [Sarcastic Hate Speech dataset](https://github.com/simonasnow/Sarcastic-Hate-Speech)
- Racial stereotypes corpus, available on request from the authors of [A Multilingual Dataset of Racial Stereotypes in Social Media Conversational Threads](https://aclanthology.org/2023.findings-eacl.51.pdf)
- [Debunker-Assistant corpus](https://github.com/AequaTech/DebunkerAssistant/tree/main/evaluation/training_datasets)

## Training Hyperparameters

- learning_rate: 2e-5
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam

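
As a rough illustration, the values above could be written as keyword arguments for `transformers.TrainingArguments`. This is a sketch under assumptions, not the authors' training script: the card does not say whether the `Trainer` API was used, `output_dir` is a hypothetical path, mapping the two batch sizes onto the `per_device_*` arguments is a guess, and the number of epochs is not reported, so it is left out.

```python
# Hyperparameters reported in this card, written as keyword arguments
# for transformers.TrainingArguments (assumed mapping, not confirmed by the card).
training_kwargs = {
    "output_dir": "stereotype-it-finetune",  # hypothetical path, not from the card
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # card: train_batch_size
    "per_device_eval_batch_size": 16,   # card: eval_batch_size
    "seed": 42,
}


def build_training_arguments(kwargs=training_kwargs):
    """Create TrainingArguments from the reported values (requires transformers)."""
    # Imported lazily so the dictionary can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(**kwargs)


if __name__ == "__main__":
    for name, value in training_kwargs.items():
        print(f"{name} = {value}")
```

The card lists Adam as the optimizer, while `Trainer` uses an AdamW variant by default, so reproducing the exact setup may require constructing the optimizer explicitly.
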
# Evaluation

## Testing Data

The model was evaluated on the HaSpeeDe test sets (tweets and news headlines), with the results reported below.

## Metrics and Results

Tweets:

- macro F1: 0.75
- accuracy: 0.75
- precision of positive class: 0.66
- recall of positive class: 0.94
- F1 of positive class: 0.78

News Headlines:

- macro F1: 0.72
- accuracy: 0.77
- precision of positive class: 0.73
- recall of positive class: 0.52
- F1 of positive class: 0.61

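
The positive-class F1 scores follow from the reported precision and recall through F1 = 2PR / (P + R), and macro F1 over two classes is the mean of the two per-class F1 scores. The short check below recomputes these relations from the rounded figures above. The negative-class F1 it prints is only implied by that arithmetic and is affected by rounding; it is not a figure reported by the authors. The names `f1_score` and `reported` are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded figures copied from the lists above.
reported = {
    "tweets": {"precision": 0.66, "recall": 0.94, "f1_pos": 0.78, "macro_f1": 0.75},
    "news headlines": {"precision": 0.73, "recall": 0.52, "f1_pos": 0.61, "macro_f1": 0.72},
}

for name, m in reported.items():
    # Recompute the positive-class F1 from precision and recall.
    f1_pos = f1_score(m["precision"], m["recall"])
    # With two classes, macro F1 = (F1_pos + F1_neg) / 2, so F1_neg = 2 * macro - F1_pos.
    implied_f1_neg = 2 * m["macro_f1"] - m["f1_pos"]
    print(f"{name}: recomputed F1(pos) = {f1_pos:.2f}, implied F1(neg) = {implied_f1_neg:.2f}")
```

The recomputed positive-class values (0.78 for tweets, 0.61 for news headlines) agree with the reported ones. The high recall and lower precision on tweets, against the reverse pattern on news headlines, suggest that the model over-predicts the positive class on tweets and under-predicts it on headlines.
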
# Framework versions

- Transformers 4.30.2
- PyTorch 2.1.2
- Datasets 2.19.0
- Accelerate 0.30.0

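# How to use this model:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned classifier with its two output labels.
model = AutoModelForSequenceClassification.from_pretrained("aequa-tech/stereotype-it", num_labels=2)
# The tokenizer is loaded from the AlBERTo base checkpoint.
tokenizer = AutoTokenizer.from_pretrained("m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Replace "text" with the Italian text to classify.
classifier("text")
```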
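
The pipeline returns one dictionary per input, with a `label` and a `score`. The helper below is a minimal sketch for turning that output into a boolean decision. It assumes the default label names `LABEL_0`/`LABEL_1` and that `LABEL_1` is the stereotype class; neither is stated in this card, so check `model.config.id2label` before relying on it. `contains_stereotype`, `POSITIVE_LABEL` and the `threshold` parameter are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Union

# Assumption: "LABEL_1" is the positive (stereotype) class; verify with model.config.id2label.
POSITIVE_LABEL = "LABEL_1"


def contains_stereotype(
    predictions: List[Dict[str, Union[str, float]]],
    threshold: float = 0.5,
) -> List[bool]:
    """Map text-classification pipeline output to one boolean per input text."""
    return [
        p["label"] == POSITIVE_LABEL and float(p["score"]) >= threshold
        for p in predictions
    ]


# Hand-written predictions in the pipeline's output format
# (made-up scores, not real model output).
example = [
    {"label": "LABEL_1", "score": 0.91},
    {"label": "LABEL_0", "score": 0.87},
]
print(contains_stereotype(example))  # [True, False]
```

With the `classifier` object from the snippet above, `contains_stereotype(classifier(["primo testo", "secondo testo"]))` would give one decision per text in the batch.
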