---
base_model: readerbench/RoBERT-base
language:
- ro
tags:
- sentiment
- classification
- romanian
- nlp
- bert
datasets:
- decathlon_reviews
- cinemagia_reviews
metrics:
- accuracy
- precision
- recall
- f1
- f1 weighted
model-index:
- name: ro-sentiment
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: ro_sent
      name: Romanian Sentiment Dataset
      config: default
      split: all
    metrics:
    - type: accuracy
      value: 0.85
      name: Accuracy
    - type: precision
      value: 0.85
      name: Precision
    - type: recall
      value: 0.85
      name: Recall
    - type: f1_weighted
      value: 0.85
      name: Weighted F1
    - type: f1_macro
      value: 0.84
      name: Macro F1
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: laroseda
      name: A Large Romanian Sentiment Data Set
      config: default
      split: all
    metrics:
    - type: accuracy
      value: 0.85
      name: Accuracy
    - type: precision
      value: 0.86
      name: Precision
    - type: recall
      value: 0.85
      name: Recall
    - type: f1_weighted
      value: 0.84
      name: Weighted F1
    - type: f1_macro
      value: 0.84
      name: Macro F1
---
RO-Sentiment
This model is a fine-tuned version of readerbench/RoBERT-base on the Decathlon reviews and Cinemagia reviews datasets. It achieves the following results on the evaluation set:
- Loss: 0.3923
- Accuracy: 0.8307
- Precision: 0.8366
- Recall: 0.8959
- F1: 0.8652
- F1 Weighted: 0.8287
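The F1 value above equals the harmonic mean of the listed precision and recall. This suggests (an inference from the numbers, not something the card states) that precision, recall and F1 are binary scores for the positive class, while F1 Weighted averages both classes. A minimal check; `f1_from_precision_recall` is a hypothetical helper name:

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, i.e. the binary F1 score."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the evaluation-set results above.
precision, recall, reported_f1 = 0.8366, 0.8959, 0.8652
recomputed = f1_from_precision_recall(precision, recall)
print(f"recomputed F1 = {recomputed:.4f}, reported F1 = {reported_f1}")
```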
Output labels:
- LABEL_0 = Negative Sentiment
- LABEL_1 = Positive Sentiment
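A minimal sketch of turning the raw labels into sentiment names. It assumes the model is used through the Transformers `pipeline("text-classification", ...)` API, which returns a list of `{"label": ..., "score": ...}` dicts. `LABEL_NAMES`, `load_classifier` and `to_sentiment` are hypothetical helper names. The Hub repository id is left as a parameter because this card does not state it.

```python
# Mapping from the raw label strings emitted by the model to sentiment names,
# following the "Output labels" list of this card.
LABEL_NAMES = {"LABEL_0": "Negative", "LABEL_1": "Positive"}


def load_classifier(model_id: str):
    """Build a Transformers text-classification pipeline for `model_id`.

    `model_id` is a placeholder for the Hub repository id of this model.
    Requires the `transformers` package and downloads the model on first use.
    """
    from transformers import pipeline  # lazy import: the helper below works without it

    return pipeline("text-classification", model=model_id)


def to_sentiment(predictions):
    """Convert pipeline output ([{'label': ..., 'score': ...}, ...]) to (name, score) pairs."""
    return [(LABEL_NAMES[p["label"]], p["score"]) for p in predictions]


if __name__ == "__main__":
    # Hand-written sample in the pipeline's output format (not real model output).
    sample = [{"label": "LABEL_1", "score": 0.97}, {"label": "LABEL_0", "score": 0.88}]
    print(to_sentiment(sample))
```

With the model available, `to_sentiment(load_classifier(model_id)(texts))` maps a list of Romanian strings to `(sentiment, score)` pairs.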
Evaluation on other datasets
RO_SENT
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Negative (0) | 0.79 | 0.83 | 0.81 | 11,675 |
| Positive (1) | 0.88 | 0.85 | 0.87 | 17,271 |
| Accuracy | | | 0.85 | 28,946 |
| Macro Avg | 0.84 | 0.84 | 0.84 | 28,946 |
| Weighted Avg | 0.85 | 0.85 | 0.85 | 28,946 |
LaRoSeDa
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Negative (0) | 0.79 | 0.94 | 0.86 | 7,500 |
| Positive (1) | 0.93 | 0.75 | 0.83 | 7,500 |
| Accuracy | | | 0.85 | 15,000 |
| Macro Avg | 0.86 | 0.85 | 0.84 | 15,000 |
| Weighted Avg | 0.86 | 0.85 | 0.84 | 15,000 |
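The Macro Avg and Weighted Avg rows in both tables follow from the per-class rows. The macro average is the unweighted mean over the two classes; the weighted average weights each class by its support. A minimal sketch (`macro_and_weighted` is a hypothetical helper name). Because the per-class values are rounded to two decimals, recomputed averages can differ from the printed ones in the last digit.

```python
def macro_and_weighted(values, supports):
    """Return (macro average, support-weighted average) of per-class metric values."""
    macro = sum(values) / len(values)
    weighted = sum(v * n for v, n in zip(values, supports)) / sum(supports)
    return macro, weighted


# Per-class F1 scores (Negative, Positive) and supports copied from the tables above.
tables = {
    "RO_SENT": ([0.81, 0.87], [11_675, 17_271]),
    "LaRoSeDa": ([0.86, 0.83], [7_500, 7_500]),
}

for name, (f1_scores, supports) in tables.items():
    macro, weighted = macro_and_weighted(f1_scores, supports)
    print(f"{name}: macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

For LaRoSeDa both classes have 7,500 examples, so the two averages coincide. This is why its Macro Avg and Weighted Avg rows are identical. Weighted recall also always equals accuracy, because weighting each class's recall by its support simply counts the correctly classified examples.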
Model description
Fine-tuned Romanian BERT model for sentiment classification.
Trained on a mix of product reviews from the Decathlon retailer website and movie reviews from Cinemagia.
Intended uses & limitations
Binary (negative/positive) sentiment classification of Romanian text.
Biased towards product reviews.
There is no "neutral" sentiment label.
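Because the model only has two labels, every input, including neutral text, is forced into Negative or Positive. One hedged workaround is to apply a softmax to the two logits and abstain when the top probability is low. This is not part of the model, and the model was never trained on neutral text, so neutral inputs are not guaranteed to receive low confidence. `label_with_abstention`, `SENTIMENTS` and the 0.6 threshold are hypothetical choices for illustration, not values tuned for this model.

```python
import math

SENTIMENTS = ("Negative", "Positive")  # index 0 = LABEL_0, index 1 = LABEL_1


def softmax(logits):
    """Numerically stable softmax over a sequence of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_abstention(logits, min_confidence=0.6):
    """Return (sentiment, probability), or ("uncertain", probability) when the
    top probability is below `min_confidence` (an arbitrary, untuned threshold)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return "uncertain", probs[best]
    return SENTIMENTS[best], probs[best]


if __name__ == "__main__":
    print(label_with_abstention([-2.0, 2.0]))  # clearly positive logits
    print(label_with_abstention([0.1, -0.1]))  # nearly tied logits
```

With `transformers`, the two logits for one text can be obtained from `AutoModelForSequenceClassification` as `model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()`.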
Training and evaluation data
Trained on:
- Decathlon reviews dataset (available on request)
- Cinemagia movie reviews (publicly available on Kaggle)
Evaluated on:
- Holdout data from the training dataset
- RO_SENT dataset
- LaRoSeDa dataset
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 6e-05
- train_batch_size: 64
- eval_batch_size: 128
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 10 (early stopping triggered at epoch 3; best epoch: 2)
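The warmup ratio is easier to interpret in steps. This is a minimal sketch under two assumptions. First, the linear schedule was planned over all 10 epochs of 1,629 optimizer steps each (the Step column of the training results). Second, warmup steps are rounded up as `ceil(total_steps * warmup_ratio)`. `linear_warmup_lr` is a hypothetical helper that reproduces the shape of `transformers.get_linear_schedule_with_warmup`: linear warmup to the peak rate, then linear decay to zero.

```python
import math


def linear_warmup_lr(step, base_lr, total_steps, warmup_steps):
    """Learning rate at `step`: linear warmup from 0 to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Values from this card; planning the schedule over all 10 epochs is an assumption.
BASE_LR = 6e-5
STEPS_PER_EPOCH = 1629                      # from the "Step" column of the training results
NUM_EPOCHS = 10
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 16,290
WARMUP_STEPS = math.ceil(TOTAL_STEPS * 0.2)  # lr_scheduler_warmup_ratio = 0.2

for epoch in (1, 2, 3):
    step = epoch * STEPS_PER_EPOCH
    lr = linear_warmup_lr(step, BASE_LR, TOTAL_STEPS, WARMUP_STEPS)
    print(f"end of epoch {epoch} (step {step}): lr = {lr:.3e}")
```

Under these assumptions, warmup lasts 3,258 steps, which is exactly the first two epochs. The learning rate is 3e-5 at the end of epoch 1 and peaks at 6e-5 at the end of epoch 2. By the time training stops after epoch 3, it has only decayed to 5.25e-5.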
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 | F1 Weighted |
---|---|---|---|---|---|---|---|---|
0.4198 | 1.0 | 1629 | 0.3983 | 0.8377 | 0.8791 | 0.8721 | 0.8756 | 0.8380 |
0.3861 | 2.0 | 3258 | 0.4312 | 0.8429 | 0.8963 | 0.8665 | 0.8812 | 0.8442 |
0.3189 | 3.0 | 4887 | 0.3923 | 0.8307 | 0.8366 | 0.8959 | 0.8652 | 0.8287 |
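The note "early stop epoch 3, best epoch 2" can be checked against this table with a patience-based early-stopping rule. This is a minimal sketch: the card does not say which metric was monitored or what patience was used, so both are assumptions here, and `early_stopping` is a hypothetical helper rather than the callback used in training.

```python
def early_stopping(metric_per_epoch, patience=1, greater_is_better=True):
    """Return (best_epoch, stop_epoch) for a patience-based early-stopping rule.

    Training stops once the monitored metric has failed to improve on the best
    value for `patience` consecutive epochs. Epochs are numbered from 1.
    """
    best_epoch, best_value, bad_epochs = None, None, 0
    for epoch, value in enumerate(metric_per_epoch, start=1):
        improved = best_value is None or (
            value > best_value if greater_is_better else value < best_value
        )
        if improved:
            best_epoch, best_value, bad_epochs = epoch, value, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(metric_per_epoch)


# Validation F1 and validation loss per epoch, copied from the table above.
val_f1 = [0.8756, 0.8812, 0.8652]
val_loss = [0.3983, 0.4312, 0.3923]

print(early_stopping(val_f1, patience=1))                             # monitoring F1
print(early_stopping(val_loss, patience=1, greater_is_better=False))  # monitoring loss
```

With patience 1, monitoring F1 gives best epoch 2 and a stop at epoch 3, which matches the card. Monitoring validation loss would instead have stopped after epoch 2 with epoch 1 as best. The reported behaviour is therefore consistent with a score-based criterion, though the table alone cannot prove which one was used.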
Framework versions
- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.3
- Tokenizers 0.13.3