|
--- |
|
language: |
|
- en |
|
thumbnail: |
|
tags: |
|
- text-classification |
|
license: mit |
|
datasets: |
|
- imdb |
|
metrics: |
|
--- |
|
|
|
# IMDB Sentiment Task: roberta-base |
|
|
|
## Model description |
|
|
|
A base RoBERTa model fine-tuned on the IMDB dataset for binary sentiment classification.
|
|
|
## Intended uses & limitations |
|
|
|
#### How to use |
|
|
|
##### Transformers |
|
|
|
```python
# Load model and tokenizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "aychang/roberta-base-imdb"

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Use pipeline
from transformers import pipeline

nlp = pipeline("sentiment-analysis", model=model_name, tokenizer=model_name)

results = nlp(["I didn't really like it because it was so terrible.", "I love how easy it is to watch and get good results."])
```
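
If you want to run the loaded `model` and `tokenizer` directly instead of going through the pipeline, here is a minimal sketch (the label names are read from the model's `id2label` config, which is an assumption rather than something documented in this card):

```python
import torch

# Tokenize a batch of reviews and run a forward pass
inputs = tokenizer(
    ["I didn't really like it because it was so terrible."],
    padding=True, truncation=True, return_tensors="pt"
)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and pick the highest-scoring class
probs = torch.softmax(logits, dim=-1)
pred_ids = probs.argmax(dim=-1)
print([(model.config.id2label[i.item()], probs[j, i].item()) for j, i in enumerate(pred_ids)])
```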
|
|
|
##### AdaptNLP |
|
|
|
```python
from adaptnlp import EasySequenceClassifier

model_name = "aychang/roberta-base-imdb"
texts = ["I didn't really like it because it was so terrible.", "I love how easy it is to watch and get good results."]

classifier = EasySequenceClassifier()
results = classifier.tag_text(text=texts, model_name_or_path=model_name, mini_batch_size=2)
```
|
|
|
#### Limitations and bias |
|
|
|
This is a minimal model fine-tuned on a single benchmark dataset of movie reviews, so it may not generalize well to text from other domains.
|
|
|
## Training data |
|
|
|
[IMDB](https://huggingface.co/datasets/imdb)
|
|
|
## Training procedure |
|
|
|
#### Hardware |
|
One V100 |
|
|
|
#### Hyperparameters and Training Args |
|
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./models',
    overwrite_output_dir=False,
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,
    weight_decay=0.01,
    evaluation_strategy="steps",
    logging_dir='./logs',
    fp16=False,
    eval_steps=800,
    save_steps=300000
)
```
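
For context, a minimal sketch of how these arguments could feed the standard `datasets` + `Trainer` fine-tuning loop (the tokenization details and the use of the IMDB test split for evaluation are assumptions, not the exact script used to train this model):

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer

base_model = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# Tokenize the IMDB train/test splits
dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)
dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=training_args,            # the TrainingArguments defined above
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```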
|
|
|
## Eval results |
|
|
|
```
{'epoch': 2.0,
 'eval_accuracy': 0.94668,
 'eval_f1': array([0.94603457, 0.94731017]),
 'eval_loss': 0.2578844428062439,
 'eval_precision': array([0.95762642, 0.93624502]),
 'eval_recall': array([0.93472, 0.95864]),
 'eval_runtime': 244.7522,
 'eval_samples_per_second': 102.144}
```
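
The per-class arrays for F1, precision, and recall are consistent with a `compute_metrics` function along these lines (a sketch assuming scikit-learn with `average=None`; the actual metric code is not included in this card):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # average=None returns one value per class, matching the arrays above
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average=None)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```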
|
|