---
language:
- 'no'
- nb
- nn
license: cc-by-4.0
datasets:
- ltg/norec_sentence
pipeline_tag: text-classification
---

# Sentence-level Sentiment Analysis model for Norwegian text

This model is a fine-tuned version of [ltg/norbert3-base](https://huggingface.co/ltg/norbert3-base) for sentence-level sentiment classification of Norwegian text.

## Training data

The dataset used for fine-tuning is [ltg/norec_sentence](https://huggingface.co/datasets/ltg/norec_sentence), specifically its `mixed` subset, which has four sentiment categories:

```
[0]: Negative,
[1]: Positive,
[2]: Neutral,
[0,1]: Mixed
```
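
The label encoding above can be decoded with a small lookup. The following is a minimal sketch, assuming each example's label arrives as a list of integers in the form listed above; the helper name `decode_label` is hypothetical and not part of the dataset or the model code.

```python
# Hypothetical helper: maps a label list from the `mixed` subset to its category name.
# The mapping mirrors the four categories listed above.
LABEL_NAMES = {
    (0,): "Negative",
    (1,): "Positive",
    (2,): "Neutral",
    (0, 1): "Mixed",
}


def decode_label(label):
    """Return the category name for a label such as [0] or [0, 1]."""
    key = tuple(sorted(label))  # sort so [1, 0] is treated like [0, 1]
    if key not in LABEL_NAMES:
        raise ValueError(f"Unknown label: {label!r}")
    return LABEL_NAMES[key]


print(decode_label([0, 1]))  # Mixed
print(decode_label([2]))     # Neutral
```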

## Quick start

You can use this model for inference as follows:

```
>>> from transformers import AutoTokenizer, pipeline
>>> origin = "ltg/norbert3-base_sentence-sentiment"
>>> # trust_remote_code is enabled for ltg/norbert3 models, which load custom model code
>>> pipe = pipeline(
...     "text-classification",
...     model=origin,
...     trust_remote_code=origin.startswith("ltg/norbert3"),
...     config=origin,
...     tokenizer=AutoTokenizer.from_pretrained(origin),
... )
>>> # Roughly: "His hoarse, slightly sore voice suits the blues, but this record is
>>> # unlikely to be among his greatest commercial successes." and
>>> # "The Borten government did not do its job."
>>> preds = pipe([
...     "Hans hese, litt såre stemme kler bluesen, men denne platen kommer neppe til å bli blant hans største kommersielle suksesser.",
...     "Borten-regjeringen gjorde ikke jobben sin.",
... ])
>>> for p in preds:
...     print(p)
```

Output:

```
The model 'NorbertForSequenceClassification' is not supported for text-classification. Supported models are ['AlbertForSequenceClassification', ...
{'label': 'Mixed', 'score': 0.9230353236198425}
{'label': 'Negative', 'score': 0.7348112463951111}
```
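
Each prediction is a dictionary with a `label` and a `score`, so the results can be post-processed with plain Python. As a sketch, the snippet below keeps only predictions at or above a confidence threshold; the threshold of 0.8 and the function name `confident_predictions` are illustrative assumptions, not recommendations from the model authors.

```python
# Predictions copied from the output shown above.
preds = [
    {"label": "Mixed", "score": 0.9230353236198425},
    {"label": "Negative", "score": 0.7348112463951111},
]


def confident_predictions(predictions, threshold=0.8):
    """Keep predictions whose score is at least `threshold` (hypothetical cutoff)."""
    return [p for p in predictions if p["score"] >= threshold]


for p in confident_predictions(preds):
    print(p["label"], round(p["score"], 3))  # prints: Mixed 0.923
```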

## Training hyperparameters

- per_device_train_batch_size: 16
- learning_rate: 1e-05
- gradient_accumulation_steps: 1
- num_train_epochs: 10 (best epoch 5)
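
For reference, these values can be collected into a dictionary whose keys match argument names of `transformers.TrainingArguments`. This is a sketch only: the card does not say which training script was used, and the device count below is a hypothetical value used to show how the effective batch size is derived.

```python
# Hyperparameters as listed above; keys match `transformers.TrainingArguments` argument names.
hyperparams = {
    "per_device_train_batch_size": 16,
    "learning_rate": 1e-05,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 10,
}


def effective_batch_size(params, num_devices=1):
    """Examples per optimizer step; `num_devices` is an assumption (not stated in the card)."""
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(hyperparams))  # 16 on a single device
```

With the `transformers` library installed, the dictionary could be passed as `TrainingArguments(output_dir="out", **hyperparams)`, where `"out"` is a placeholder path.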

## Evaluation

| Category         |       F1 |
|:-----------------|---------:|
| Negative         | 0.580247 |
| Positive         | 0.781699 |
| Neutral          | 0.825197 |
| Mixed            | 0.648649 |
| Weighted average | 0.763806 |
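
Assuming the usual definition, a weighted average F1 weights each category's F1 by its number of evaluation examples, so it generally differs from a plain (macro) average. The per-category example counts are not listed here, so the sketch below only computes the macro average of the table values for comparison; the helper `weighted_average` is illustrative.

```python
# Per-category F1 scores copied from the table above.
f1_scores = {
    "Negative": 0.580247,
    "Positive": 0.781699,
    "Neutral": 0.825197,
    "Mixed": 0.648649,
}


def weighted_average(scores, weights):
    """Average of `scores` weighted by `weights` (e.g. per-category example counts)."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


# Equal weights give the macro average; the true per-category counts would be
# needed to reproduce the weighted average reported in the table.
macro_f1 = weighted_average(f1_scores, {name: 1 for name in f1_scores})
print(round(macro_f1, 6))  # 0.708948
```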