|
--- |
|
pipeline_tag: text-classification |
|
language: multilingual |
|
license: apache-2.0 |
|
tags: |
|
- "sentiment-analysis" |
|
- "multilingual" |
|
widget: |
|
- text: "I am very happy." |
|
example_title: "English" |
|
- text: "Heute bin ich schlecht drauf." |
|
example_title: "Deutsch" |
|
- text: "Quel cauchemard!" |
|
example_title: "Francais" |
|
- text: "ฉันรักฤดูใบไม้ผลิ" |
|
example_title: "ภาษาไทย" |
|
--- |
|
|
|
# Multi-lingual sentiment prediction trained from COVID19-related tweets |
|
|
|
Repository: [https://github.com/clampert/multilingual-sentiment-analysis/](https://github.com/clampert/multilingual-sentiment-analysis/) |
|
|
|
Model trained on a large-scale (18437530 examples) dataset of |
|
multi-lingual tweets that was collected between March 2020 |
|
and November 2021 using Twitter’s Streaming API with varying |
|
COVID19-related keywords. Labels were auto-general based on |
|
the presence of positive and negative emoticons. For details |
|
on the dataset, see our IEEE BigData 2021 publication. |
|
|
|
Base model is [sentence-transformers/stsb-xlm-r-multilingual](https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual). |
|
It was finetuned for sequence classification with `positive` |
|
and `negative` labels for two epochs (48 hours on 8xP100 GPUs). |
|
|
|
## Citation |
|
|
|
If you use our model your work, please cite: |
|
|
|
``` |
|
@inproceedings{lampert2021overcoming, |
|
title={Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis}, |
|
author={Jasmin Lampert and Christoph H. Lampert}, |
|
booktitle={IEEE International Conference on Big Data (BigData)}, |
|
year={2021}, |
|
note={Special Session: Machine Learning on Big Data}, |
|
} |
|
``` |
|
|
|
Enjoy! |
|
|