Hate Speech detection in English

bertweet-hate-speech

Repository: https://github.com/pysentimiento/pysentimiento/

Model trained on the SemEval 2019 Task 5: HatEval (Subtask B) corpus for hate speech detection in English. The base model is BERTweet, a RoBERTa model trained on English tweets.

It is a multi-label classifier, meaning each of the following classes is predicted independently:

  • HS: is it hate speech?
  • TR: is it targeted at a specific individual?
  • AG: is it aggressive?
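Because the three labels are independent yes/no questions, a tweet can carry several of them at once. The sketch below illustrates how such multi-label output is typically decoded. It assumes the model produces one raw score (logit) per label in the order HS, TR, AG, and it assumes a 0.5 decision threshold; both are illustrative choices, not the documented behaviour of pysentimiento, and the logits are made up rather than real model output.

```python
import math

# Assumed label order for illustration; check the model's own label mapping.
LABELS = ("HS", "TR", "AG")


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, threshold=0.5):
    """Map one logit per label to a probability and a yes/no decision.

    Each label gets its own sigmoid, so labels do not compete with each
    other the way they would under a softmax over mutually exclusive classes.
    The 0.5 default threshold is an assumption.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    active = [label for label, p in probs.items() if p >= threshold]
    return probs, active


# Hypothetical logits for a single tweet (not real model output).
probs, active = decode_multilabel([2.0, -1.5, 0.3])
print(active)  # ['HS', 'AG']
```

With these made-up scores the tweet would be flagged as hateful and aggressive but not targeted at an individual, which a single-label (softmax) classifier could not express.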

License

pysentimiento is an open-source library for non-commercial use and scientific research purposes only. Please be aware that the models are trained on third-party datasets and are subject to their respective licenses.

  1. TASS Dataset license
  2. SemEval 2017 Dataset license

Citation

If you use this model in your work, please cite the following papers:

@misc{perez2021pysentimiento,
  title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
  author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
  year={2021},
  eprint={2106.09462},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@inproceedings{nguyen2020bertweet,
  title={BERTweet: A pre-trained language model for English Tweets},
  author={Nguyen, Dat Quoc and Vu, Thanh and Nguyen, Anh Tuan},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  pages={9--14},
  year={2020}
}

@inproceedings{basile2019semeval,
  title={Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter},
  author={Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and Pardo, Francisco Manuel Rangel and Rosso, Paolo and Sanguinetti, Manuela},
  booktitle={Proceedings of the 13th international workshop on semantic evaluation},
  pages={54--63},
  year={2019}
}

Enjoy! 🤗
