---
license: apache-2.0
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  Institution Email: text
  Please specify your academic use case: text
extra_gated_prompt: Our models are intended for academic use only. If you are not affiliated with an academic institution, please provide a rationale for using our models. Please allow us a few business days to manually review subscriptions.
---

[README UNDER CONSTRUCTION]

emBERT is a Hungarian text classification model that labels sentences with one of seven emotions or a neutral state. It uses the [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) tokenizer and was fine-tuned from the [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) base model on a proprietary dataset of sentences from Hungarian online news sites. The fine-tuning sentences were labeled manually by experts in a double-blind setup, and inconsistencies were resolved by hand.

Validation results of the fine-tuning:

| emotion | precision | recall | f1-score |
| ------- | --------- | ------ | -------- |
| 0 - Anger | 0.70 | 0.74 | 0.72 |
| 1 - Disgust | 0.72 | 0.73 | 0.73 |
| 2 - Fear | 0.61 | 0.47 | 0.53 |
| 3 - Happiness | 0.38 | 0.37 | 0.38 |
| 4 - Neutral | 0.65 | 0.62 | 0.63 |
| 5 - Sad | 0.74 | 0.72 | 0.73 |
| 6 - Successful | 0.79 | 0.81 | 0.80 |
| 7 - Trustful | 0.76 | 0.78 | 0.77 |
| weighted avg | 0.73 | 0.74 | 0.73 |

Accuracy reached 74%.

The emotion set is based on [Plutchik 1980](https://doi.org/10.1016/B978-0-12-558701-3.50007-7), with anticipation replaced by a neutral category.

Proper use of the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
```

The model was created by György Márk Kis, Orsolya Ring, and Miklós Sebők of the Center for Social Sciences.
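After running a sentence through the model, the classification head returns one logit per class. A minimal sketch of turning those logits into a label and a confidence score, assuming the head's class order matches the table above (the example logits are hypothetical, not real model output):

```python
import math

# Class order taken from the validation table above
ID2LABEL = ["Anger", "Disgust", "Fear", "Happiness",
            "Neutral", "Sad", "Successful", "Trustful"]

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Hypothetical logits for one sentence
label, prob = decode([0.1, -1.2, 0.3, 2.5, 0.0, -0.4, 1.1, 0.2])
print(label, prob)
```

With the real model, the logits come from `model(**tokenizer(sentence, return_tensors="pt")).logits` for a single sentence.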