---
license: apache-2.0
extra_gated_fields:
Name: text
Country: country
Institution: text
Institution Email: text
Please specify your academic use case: text
extra_gated_prompt: Our models are intended for academic use only. If you are not
affiliated with an academic institution, please provide a rationale for using our
models. Please allow us a few business days to manually review subscriptions.
---
[README UNDER CONSTRUCTION]
emBERT is a Hungarian text classification model that classifies sentences into one of seven emotions or a neutral state (8 classes in total). The model uses the [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) tokenizer and was fine-tuned from the [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) base model on a proprietary database of sentences from Hungarian online news sites. The sentences in the fine-tuning set were labeled manually by experts in a double-blind manner, and inconsistencies were resolved by hand.
The results on the fine-tuning validation set were:

| emotion | precision | recall | f1-score |
| ------- | --------- | ------ | -------- |
|0 - Anger| 0.70 | 0.74 | 0.72 |
|1 - Disgust| 0.72 | 0.73 | 0.73|
|2 - Fear|0.61 | 0.47 | 0.53|
|3 - Happiness|0.38 | 0.37 | 0.38|
|4 - Neutral|0.65 | 0.62 | 0.63|
|5 - Sad|0.74 | 0.72 | 0.73|
|6 - Successful|0.79 |0.81 | 0.80|
|7 - Trustful|0.76 |0.78 |0.77|
|weighted avg|0.73 | 0.74 | 0.73|

Accuracy reached 74%.
The emotion categories are based on [Plutchik 1980](https://doi.org/10.1016/B978-0-12-558701-3.50007-7), with *anticipation* replaced by a neutral class.
Loading the model:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
```
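Once access to the repository is granted, inference can be run as in the following minimal sketch. The label names and their ordering are an assumption based on the table above, not an official `id2label` mapping shipped with the model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed label order, taken from the metrics table above (classes 0-7).
LABELS = ["Anger", "Disgust", "Fear", "Happiness",
          "Neutral", "Sad", "Successful", "Trustful"]

tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
model.eval()

# Hypothetical example sentence: "I am very happy about this news!"
text = "Nagyon örülök ennek a hírnek!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities and pick the most likely emotion.
probs = torch.softmax(logits, dim=-1).squeeze()
pred = int(probs.argmax())
print(LABELS[pred], round(float(probs[pred]), 3))
```

Note that the model is sentence-level: longer documents should be split into sentences before classification, mirroring how the training data was labeled.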
The model was created by György Márk Kis, Orsolya Ring, and Miklós Sebők of the Centre for Social Sciences.