tinymagnum-r2-KTO-r1

This model is a fine-tuned version of NewEden/trashdwag on the combined_kto.json dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5003
  • Rewards/chosen: 0.0061
  • Logps/chosen: -12.0862
  • Rewards/rejected: 0.0023
  • Logps/rejected: -16.1405
  • Rewards/margins: 0.0039
  • Kl: 0.0447
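
As a quick sanity check on the figures above, the reward margin can be recomputed from the two reward values. This is a minimal sketch, assuming that Rewards/margins is the chosen reward minus the rejected reward (the usual convention in KTO-style trainers). The variable names are illustrative and not taken from the training code.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = 0.0061
rewards_rejected = 0.0023
margin_reported = 0.0039

# Assumption: margin = chosen reward - rejected reward.
margin_recomputed = rewards_chosen - rewards_rejected

# The inputs are rounded to four decimals, so the recomputed margin
# can differ from the reported one in the last digit.
difference = abs(margin_recomputed - margin_reported)
print(f"recomputed margin: {margin_recomputed:.4f}")
print(f"difference from reported: {difference:.4f}")
```

The recomputed value agrees with the reported margin to within rounding of the four-decimal inputs.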

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.25
  • num_epochs: 1.0
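
The relationship between the batch-size entries, and the shape of a cosine schedule with a 0.25 warmup ratio, can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the trainer's own code: `lr_at` is a hypothetical helper, the warmup length is assumed to be the ceiling of `warmup_ratio * total_steps`, and `total_steps = 148` is an estimate inferred from the logged step and epoch values in the training results, not a figure stated in this card.

```python
import math

# Values from the hyperparameter list above.
train_batch_size = 2               # per device
num_devices = 2
gradient_accumulation_steps = 16
learning_rate = 1e-5
warmup_ratio = 0.25

# Effective batch size per optimizer step.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

def lr_at(step, total_steps, base_lr=learning_rate, ratio=warmup_ratio):
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 148  # assumption: estimated, not stated in the card
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(total_train_batch_size)          # 2 * 2 * 16
print(warmup_steps)                    # steps spent in linear warmup
print(lr_at(warmup_steps, total_steps))  # peak learning rate
```

Under these assumptions the learning rate reaches its peak of 1e-05 about a quarter of the way through the single epoch and decays toward zero afterwards.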

Training results

Training Loss  Epoch   Step  Validation Loss  Rewards/chosen  Logps/chosen  Rewards/rejected  Logps/rejected  Rewards/margins  Kl
0.5025         0.1078  16    0.5038           0.0004          -12.1438      0.0007            -16.1563        -0.0003          0.0099
0.502          0.2157  32    0.5019           0.0033          -12.1150      0.0018            -16.1450        0.0014           0.0200
0.5026         0.3235  48    0.5013           0.0051          -12.0964      0.0027            -16.1358        0.0024           0.0335
0.5021         0.4313  64    0.5015           0.0058          -12.0893      0.0036            -16.1270        0.0022           0.0406
0.5017         0.5392  80    0.5012           0.0064          -12.0833      0.0037            -16.1265        0.0027           0.0434
0.5003         0.6470  96    0.5007           0.0066          -12.0812      0.0032            -16.1311        0.0034           0.0431
0.4996         0.7548  112   0.5012           0.0063          -12.0846      0.0028            -16.1353        0.0035           0.0437
0.5077         0.8627  128   0.5005           0.0063          -12.0844      0.0026            -16.1374        0.0037           0.0433
0.5012         0.9705  144   0.5004           0.0064          -12.0837      0.0023            -16.1401        0.0041           0.0431
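
The Epoch and Step columns together give a rough idea of the run length. The sketch below estimates the number of optimizer steps per epoch from the first and last logged rows, and from that an approximate number of training examples. It assumes every optimizer step consumes the full total_train_batch_size of 64 examples, which may not hold exactly (for instance for a partial final batch), so the result is an estimate, not a reported dataset size.

```python
# (epoch, step) pairs taken from the first and last rows of the results table.
logged = [(0.1078, 16), (0.9705, 144)]

# Each row gives an independent estimate of steps per full epoch.
estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = sum(estimates) / len(estimates)

# Assumption: each optimizer step processes 64 examples.
total_train_batch_size = 64
approx_examples = round(steps_per_epoch) * total_train_batch_size

print(round(steps_per_epoch))  # 148
print(approx_examples)         # 9472
```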

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.0.dev0
  • PyTorch 2.3.0a0+ebedce2
  • Datasets 2.20.0
  • Tokenizers 0.19.1