---
license: apache-2.0
extra_gated_fields:
Name: text
Country: country
Institution: text
Institution Email: text
Please specify your academic use case: text
extra_gated_prompt: Our models are intended for academic use only. If you are not
affiliated with an academic institution, please provide a rationale for using our
models. Please allow us a few business days to manually review subscriptions.
---
[README UNDER CONSTRUCTION]
emBERT is a Hungarian text classification model that classifies sentences into one of seven emotions or a neutral state (8 classes in total). The model uses the [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) tokenizer and was fine-tuned from the [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) base model on a proprietary database of sentences from Hungarian online news sites. The sentences in the fine-tuning set were labeled manually by experts in a double-blind manner, and inconsistencies were resolved by hand.
The results on the fine-tuning validation set were:

| emotion | precision | recall | f1-score |
| ------- | --------- | ------ | -------- |
|0 - Anger| 0.70 | 0.74 | 0.72 |
|1 - Disgust| 0.72 | 0.73 | 0.73|
|2 - Fear|0.61 | 0.47 | 0.53|
|3 - Happiness|0.38 | 0.37 | 0.38|
|4 - Neutral|0.65 | 0.62 | 0.63|
|5 - Sad|0.74 | 0.72 | 0.73|
|6 - Successful|0.79 |0.81 | 0.80|
|7 - Trustful|0.76 |0.78 |0.77|
|weighted avg|0.73 | 0.74 | 0.73|

Accuracy reached 74%.
The emotion categories are based on [Plutchik 1980](https://doi.org/10.1016/B978-0-12-558701-3.50007-7), with *anticipation* replaced by a neutral class.
Loading the model:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
```
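Once access to the repository is granted, inference can be run as in the following minimal sketch. The label names and their ordering are an assumption based on the table above, not an official `id2label` mapping shipped with the model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed label order, taken from the metrics table above (classes 0-7).
LABELS = ["Anger", "Disgust", "Fear", "Happiness",
          "Neutral", "Sad", "Successful", "Trustful"]

tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
model.eval()

# Hypothetical example sentence: "I am very happy about this news!"
text = "Nagyon örülök ennek a hírnek!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to class probabilities and pick the most likely emotion.
probs = torch.softmax(logits, dim=-1).squeeze()
pred = int(probs.argmax())
print(LABELS[pred], round(float(probs[pred]), 3))
```

Note that the model is sentence-level: longer documents should be split into sentences before classification, mirroring how the training data was labeled.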
The model was created by György Márk Kis, Orsolya Ring, and Miklós Sebők of the Centre for Social Sciences.