---
license: apache-2.0
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  Institution Email: text
  Please specify your academic use case: text
extra_gated_prompt: Our models are intended for academic use only. If you are not affiliated with an academic institution, please provide a rationale for using our models. Please allow us a few business days to manually review subscriptions.
---

[README UNDER CONSTRUCTION]

emBERT is a Hungarian text classification model that labels sentences with one of seven emotions or a neutral state. It uses the [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) tokenizer and was fine-tuned from the [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) base model on a proprietary dataset of sentences from Hungarian online news sites. The fine-tuning sentences were labeled manually by experts in a double-blind setup, and inconsistencies were resolved by hand.

Validation results of the fine-tuning:

| emotion | precision | recall | f1-score |
| ------- | --------- | ------ | -------- |
| 0 - Anger | 0.70 | 0.74 | 0.72 |
| 1 - Disgust | 0.72 | 0.73 | 0.73 |
| 2 - Fear | 0.61 | 0.47 | 0.53 |
| 3 - Happiness | 0.38 | 0.37 | 0.38 |
| 4 - Neutral | 0.65 | 0.62 | 0.63 |
| 5 - Sad | 0.74 | 0.72 | 0.73 |
| 6 - Successful | 0.79 | 0.81 | 0.80 |
| 7 - Trustful | 0.76 | 0.78 | 0.77 |
| weighted avg | 0.73 | 0.74 | 0.73 |

Accuracy reached 74%.

The emotion set is based on [Plutchik 1980](https://doi.org/10.1016/B978-0-12-558701-3.50007-7), with anticipation replaced by a neutral category.

Proper use of the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
```

The model was created by György Márk Kis, Orsolya Ring, and Miklós Sebők of the Center for Social Sciences.
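After running a sentence through the model, the classification head returns one logit per class. A minimal sketch of turning those logits into a label and a confidence score, assuming the head's class order matches the table above (the example logits are hypothetical, not real model output):

```python
import math

# Class order taken from the validation table above
ID2LABEL = ["Anger", "Disgust", "Fear", "Happiness",
            "Neutral", "Sad", "Successful", "Trustful"]

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Hypothetical logits for one sentence
label, prob = decode([0.1, -1.2, 0.3, 2.5, 0.0, -0.4, 1.1, 0.2])
print(label, prob)
```

With the real model, the logits come from `model(**tokenizer(sentence, return_tensors="pt")).logits` for a single sentence.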