metadata
license: mit
base_model: EleutherAI/gpt-neo-125M
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: model
    results: []

model

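This model is a fine-tuned version of EleutherAI/gpt-neo-125M. The trl and dpo tags indicate that it was trained with Direct Preference Optimization (DPO) using TRL. The auto-generated card does not record the training dataset (it is listed as None). At the final logged step (step 1000), it achieves the following results on the evaluation set: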

  • Loss: 0.6955
  • Rewards/chosen: -0.0079
  • Rewards/rejected: -0.0080
  • Rewards/accuracies: 0.4813
  • Rewards/margins: 0.0001
  • Logps/rejected: -478.8612
  • Logps/chosen: -494.2958
  • Logits/rejected: -18.3633
  • Logits/chosen: -18.4819
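
As a rough sanity check on these numbers, the sketch below recomputes the reward margin from the two reward values and compares the reported loss with the value a DPO loss takes when the margin is zero. It assumes the standard sigmoid DPO loss, -log(sigmoid(margin)), where the margin is the already-scaled reward difference; the card does not state the loss type or the beta value, so treat this as an illustration and not as the exact training objective. The function name dpo_loss_from_margin is made up for this example.

```python
import math

# Final evaluation values copied from the list above.
rewards_chosen = -0.0079
rewards_rejected = -0.0080
reported_margin = 0.0001
reported_loss = 0.6955
reported_accuracy = 0.4813


def dpo_loss_from_margin(margin: float) -> float:
    """Sigmoid DPO loss for a single pair: -log(sigmoid(margin)).

    Assumption: the margin is the scaled reward difference
    (chosen minus rejected), as in the standard DPO formulation.
    """
    # -log(sigmoid(m)) == log(1 + exp(-m))
    return math.log1p(math.exp(-margin))


# Rewards/margins should equal Rewards/chosen minus Rewards/rejected.
margin = rewards_chosen - rewards_rejected

# With a zero margin the loss equals ln 2, the "no preference learned" value.
chance_loss = dpo_loss_from_margin(0.0)

print(f"recomputed margin:           {margin:.4f}")
print(f"loss at zero margin (ln 2):  {chance_loss:.4f}")
print(f"reported loss minus ln 2:    {reported_loss - chance_loss:+.4f}")
print(f"reported accuracy minus 0.5: {reported_accuracy - 0.5:+.4f}")
```

Under these assumptions, a loss slightly above ln 2 (about 0.6931), a margin near zero, and an accuracy below 0.5 suggest that the fine-tuned model barely separates chosen from rejected responses on the evaluation set.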

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
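
The sketch below shows how two of these settings fit together: the total batch size as the product of the per-device batch size and the accumulation steps, and the shape of a cosine schedule with 10% linear warmup. It assumes a single device (8 × 8 already equals the reported 64) and uses a hypothetical total of 1000 optimizer steps, chosen because the results table logs up to step 1000; the card does not state the exact total. The schedule follows the usual linear-warmup-then-half-cosine form and is written in plain Python, not taken from the training code.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-07
train_batch_size = 8
gradient_accumulation_steps = 8
warmup_ratio = 0.1

# Assumption: one device, since 8 * 8 already matches the reported total of 64.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)


def cosine_lr(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical total step count (not stated on the card).
TOTAL_STEPS = 1000

print("total_train_batch_size:", total_train_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:>4}: lr = {cosine_lr(step, TOTAL_STEPS):.3e}")
```

With these assumptions the learning rate peaks at 5e-07 once warmup ends and decays to zero by the last step.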

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6955 | 0.2992 | 100 | 0.6958 | -0.0017 | -0.0008 | 0.4701 | -0.0008 | -478.7900 | -494.2336 | -18.3637 | -18.4824 |
| 0.6906 | 0.5984 | 200 | 0.6962 | -0.0028 | -0.0016 | 0.4744 | -0.0013 | -478.7974 | -494.2453 | -18.3625 | -18.4806 |
| 0.6985 | 0.8975 | 300 | 0.6959 | -0.0222 | -0.0214 | 0.4738 | -0.0008 | -478.9952 | -494.4388 | -18.3624 | -18.4809 |
| 0.6946 | 1.1967 | 400 | 0.6955 | 0.0015 | 0.0015 | 0.4753 | 0.0000 | -478.7664 | -494.2018 | -18.3628 | -18.4811 |
| 0.6946 | 1.4959 | 500 | 0.6960 | -0.0046 | -0.0040 | 0.4791 | -0.0006 | -478.8223 | -494.2634 | -18.3631 | -18.4816 |
| 0.6952 | 1.7951 | 600 | 0.6951 | -0.0047 | -0.0057 | 0.4882 | 0.0011 | -478.8391 | -494.2639 | -18.3636 | -18.4821 |
| 0.6947 | 2.0942 | 700 | 0.6955 | -0.0053 | -0.0056 | 0.4822 | 0.0003 | -478.8379 | -494.2701 | -18.3634 | -18.4820 |
| 0.6995 | 2.3934 | 800 | 0.6948 | -0.0060 | -0.0076 | 0.4918 | 0.0015 | -478.8574 | -494.2774 | -18.3632 | -18.4818 |
| 0.6932 | 2.6926 | 900 | 0.6952 | -0.0080 | -0.0087 | 0.4837 | 0.0008 | -478.8692 | -494.2970 | -18.3633 | -18.4817 |
| 0.6964 | 2.9918 | 1000 | 0.6955 | -0.0079 | -0.0080 | 0.4813 | 0.0001 | -478.8612 | -494.2958 | -18.3633 | -18.4819 |
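
One way to read the table above is to pick out the evaluation step with the lowest validation loss and to estimate the dataset size from the Epoch and Step columns. The sketch below does both with values copied from the table. The size estimate assumes a total batch size of 64 examples per optimizer step (from the hyperparameters) and that the Epoch column grows linearly with Step, so it is only approximate; the card does not state the dataset size.

```python
# (step, epoch, validation_loss, rewards_accuracy) copied from the table above.
eval_log = [
    (100, 0.2992, 0.6958, 0.4701),
    (200, 0.5984, 0.6962, 0.4744),
    (300, 0.8975, 0.6959, 0.4738),
    (400, 1.1967, 0.6955, 0.4753),
    (500, 1.4959, 0.6960, 0.4791),
    (600, 1.7951, 0.6951, 0.4882),
    (700, 2.0942, 0.6955, 0.4822),
    (800, 2.3934, 0.6948, 0.4918),
    (900, 2.6926, 0.6952, 0.4837),
    (1000, 2.9918, 0.6955, 0.4813),
]

# Evaluation step with the lowest validation loss and with the highest accuracy.
best_by_loss = min(eval_log, key=lambda row: row[2])
best_by_accuracy = max(eval_log, key=lambda row: row[3])

# Rough estimate: optimizer steps per epoch, from the last logged row.
last_step, last_epoch, _, _ = eval_log[-1]
steps_per_epoch = last_step / last_epoch

# Assumption: 64 examples per optimizer step (total_train_batch_size).
approx_train_examples = steps_per_epoch * 64

print("lowest validation loss at step:", best_by_loss[0])
print("highest reward accuracy at step:", best_by_accuracy[0])
print("approx. optimizer steps per epoch:", round(steps_per_epoch))
print("approx. training examples:", round(approx_train_examples))
```

Step 800 has both the lowest validation loss and the highest reward accuracy, but the gaps between rows are in the third or fourth decimal place and every accuracy stays below 0.5, so the differences between checkpoints may be within noise.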

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1