---
library_name: transformers
license: apache-2.0
base_model: distilroberta-base
tags:
  - trl
  - reward-trainer
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: reward_model
    results: []
---

# reward_model

This model is a reward model fine-tuned from [distilroberta-base](https://huggingface.co/distilroberta-base) on an unknown dataset using TRL's `RewardTrainer`. It achieves the following results on the evaluation set:

- Loss: 0.5988
- Accuracy: 0.65
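For a reward model, the reported accuracy is the fraction of preference pairs in which the chosen response receives a higher reward than the rejected one. A minimal sketch of that metric (the tensors below are illustrative, not taken from the actual evaluation set):

```python
import torch

def preference_accuracy(chosen_rewards: torch.Tensor,
                        rejected_rewards: torch.Tensor) -> float:
    # Fraction of pairs where the chosen response outscores the rejected one.
    correct = (chosen_rewards > rejected_rewards).float()
    return correct.mean().item()

# Illustrative reward scores for four preference pairs (not real data).
chosen = torch.tensor([1.3, 0.2, -0.5, 0.9])
rejected = torch.tensor([0.4, 0.6, -1.1, 0.1])
print(preference_accuracy(chosen, rejected))  # → 0.75
```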

## Model description

More information needed

## Intended uses & limitations

More information needed
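A minimal usage sketch for scoring a prompt/response pair. The hub id `paulovsantanas/reward_model` is an assumption based on this repository's name; point `model_id` at your own copy if it differs.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "paulovsantanas/reward_model"  # assumed repo id; adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

def reward(prompt: str, response: str) -> float:
    # Score a prompt/response pair; a higher value means "more preferred".
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()

good = reward("What is the capital of France?", "The capital of France is Paris.")
bad = reward("What is the capital of France?", "Bananas are yellow.")
print(good, bad)
```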

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 500
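TRL's `RewardTrainer` optimizes a Bradley-Terry style pairwise loss: the negative log-sigmoid of the margin between the chosen and rejected rewards. A self-contained sketch of that objective:

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected): minimized when the chosen
    # response's reward exceeds the rejected one's by a wide margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# With equal rewards the loss is log(2) ≈ 0.6931 (a coin flip);
# it approaches 0 as the margin grows.
equal = pairwise_reward_loss(torch.zeros(4), torch.zeros(4))
separated = pairwise_reward_loss(torch.full((4,), 5.0), torch.zeros(4))
print(equal.item(), separated.item())
```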

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
|:-------------:|:------:|:----:|:---------------:|:--------:|
| 0.7043        | 0.0150 | 20   | 0.6886          | 0.54     |
| 0.6671        | 0.0301 | 40   | 0.6924          | 0.53     |
| 0.6131        | 0.0451 | 60   | 0.7038          | 0.58     |
| 0.6149        | 0.0602 | 80   | 0.6759          | 0.6      |
| 0.6539        | 0.0752 | 100  | 0.6593          | 0.58     |
| 0.6671        | 0.0902 | 120  | 0.7227          | 0.59     |
| 0.6863        | 0.1053 | 140  | 0.6452          | 0.58     |
| 0.6332        | 0.1203 | 160  | 0.6394          | 0.64     |
| 0.6259        | 0.1353 | 180  | 0.6630          | 0.61     |
| 0.6257        | 0.1504 | 200  | 0.6369          | 0.61     |
| 0.5376        | 0.1654 | 220  | 0.6460          | 0.62     |
| 0.6734        | 0.1805 | 240  | 0.6404          | 0.62     |
| 0.724         | 0.1955 | 260  | 0.7469          | 0.6      |
| 0.541         | 0.2105 | 280  | 0.6295          | 0.64     |
| 0.5495        | 0.2256 | 300  | 0.6182          | 0.65     |
| 0.7581        | 0.2406 | 320  | 0.6262          | 0.6      |
| 0.5234        | 0.2556 | 340  | 0.6228          | 0.63     |
| 0.5787        | 0.2707 | 360  | 0.6208          | 0.64     |
| 0.6025        | 0.2857 | 380  | 0.6069          | 0.65     |
| 0.6061        | 0.3008 | 400  | 0.6166          | 0.65     |
| 0.8482        | 0.3158 | 420  | 0.6078          | 0.65     |
| 0.5613        | 0.3308 | 440  | 0.5940          | 0.65     |
| 0.7284        | 0.3459 | 460  | 0.6042          | 0.65     |
| 0.5778        | 0.3609 | 480  | 0.5990          | 0.65     |
| 0.6848        | 0.3759 | 500  | 0.5988          | 0.65     |

### Framework versions

- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3