---
license: apache-2.0
metrics:
- accuracy
- f1
pipeline_tag: text-classification
tags:
- stereotype
language:
- it
---

# Stereotype detection at aequa-tech

## Model Description

- **Developed by:** [aequa-tech](https://aequa-tech.com/)
- **Funded by:** [NGI-Search](https://www.ngi.eu/ngi-projects/ngi-search/)
- **Language(s) (NLP):** Italian
- **License:** apache-2.0
- **Finetuned from model:** [AlBERTo](https://huggingface.co/m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alberto)

This model is a version of the Italian [AlBERTo](https://huggingface.co/m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alberto) model fine-tuned for **stereotype detection**.

# Training Details

## Training Data

- [HaSpeeDe 2020](https://live.european-language-grid.eu/catalogue/corpus/7498)
- [Sarcastic Hate Speech dataset](https://github.com/simonasnow/Sarcastic-Hate-Speech)
- Racial stereotypes corpus, available upon request from the authors of [A Multilingual Dataset of Racial Stereotypes in Social Media Conversational Threads](https://aclanthology.org/2023.findings-eacl.51.pdf)
- [Debunker-Assistant corpus](https://github.com/AequaTech/DebunkerAssistant/tree/main/evaluation/training_datasets)

## Training Hyperparameters

- learning_rate: 2e-5
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam

# Evaluation

## Testing Data

The model was evaluated on the HaSpeeDe test sets (tweets and news headlines).

## Metrics and Results

Tweets:
- macro F1: 0.75
- accuracy: 0.75
- precision of positive class: 0.66
- recall of positive class: 0.94
- F1 of positive class: 0.78

News Headlines:
- macro F1: 0.72
- accuracy: 0.77
- precision of positive class: 0.73
- recall of positive class: 0.52
- F1 of positive class: 0.61

# Framework versions

- Transformers 4.30.2
- PyTorch 2.1.2
- Datasets 2.19.0
- Accelerate 0.30.0

# How to use this model:

```
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned classifier and the AlBERTo tokenizer
model = AutoModelForSequenceClassification.from_pretrained("aequa-tech/stereotype-it", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0")

# Build a text-classification pipeline and classify a string
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
classifier("text")
```
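
The positive-class F1 scores listed under Metrics and Results can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is a minimal sketch of that check; the helper name `f1_from_precision_recall` is hypothetical and not part of this model's code, and the inputs are the rounded figures from the card, so small rounding differences are possible.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are 0
    return 2 * precision * recall / (precision + recall)


# Rounded positive-class figures as reported in the model card
reported = {
    "Tweets": {"precision": 0.66, "recall": 0.94},
    "News Headlines": {"precision": 0.73, "recall": 0.52},
}

for test_set, scores in reported.items():
    f1 = f1_from_precision_recall(scores["precision"], scores["recall"])
    print(f"{test_set}: positive-class F1 = {f1:.2f}")
```

On the tweets the high recall (0.94) outweighs the lower precision (0.66), while on the news headlines the pattern reverses: precision holds at 0.73 but recall drops to 0.52, which pulls the positive-class F1 down.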