model_hh_usp1_400

This model is a PEFT adapter fine-tuned from meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.4291
  • Rewards/chosen: -2.1852
  • Rewards/rejected: -10.4536
  • Rewards/accuracies: 0.6900
  • Rewards/margins: 8.2684
  • Logps/rejected: -125.6639
  • Logps/chosen: -112.8688
  • Logits/rejected: -0.9672
  • Logits/chosen: -0.9637
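
Rewards/margins is the gap between the chosen and rejected rewards: -2.1852 - (-10.4536) = 8.2684. Reward- and margin-style metrics like these are characteristic of preference-optimization training (e.g., DPO), although the exact training objective is not documented in this card.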

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000

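For reference, the hyperparameters above map onto a transformers.TrainingArguments configuration roughly as follows. This is a hypothetical sketch: the actual training script is not published, and the output path is an assumption. Note that the card reports "Adam" with betas=(0.9, 0.999) and epsilon=1e-08, which matches the Trainer's default AdamW settings.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters; the actual
# training script for this adapter is not published.
training_args = TrainingArguments(
    output_dir="model_hh_usp1_400",   # assumed output path
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,    # 4 x 4 = total train batch size 16
    optim="adamw_torch",              # closest standard option to the reported Adam
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
)
```
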
Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|--------------:|------:|-----:|----------------:|---------------:|-----------------:|-------------------:|----------------:|---------------:|-------------:|----------------:|--------------:|
| 0.0025 | 4.0 | 100 | 1.8113 | -1.7152 | -5.1212 | 0.6100 | 3.4059 | -119.7389 | -112.3466 | -0.1554 | -0.1598 |
| 0.1942 | 8.0 | 200 | 3.6090 | -1.4379 | -8.2063 | 0.6100 | 6.7684 | -123.1668 | -112.0384 | -1.0994 | -1.1187 |
| 0.0502 | 12.0 | 300 | 3.3229 | -9.0906 | -16.5854 | 0.6200 | 7.4948 | -132.4769 | -120.5415 | -0.9988 | -1.0079 |
| 0.0 | 16.0 | 400 | 3.4296 | -2.1656 | -10.3972 | 0.6900 | 8.2316 | -125.6012 | -112.8470 | -0.9657 | -0.9623 |
| 0.0 | 20.0 | 500 | 3.4471 | -2.1796 | -10.4172 | 0.7100 | 8.2376 | -125.6234 | -112.8626 | -0.9676 | -0.9637 |
| 0.0 | 24.0 | 600 | 3.4031 | -2.1735 | -10.4669 | 0.7000 | 8.2933 | -125.6786 | -112.8558 | -0.9675 | -0.9640 |
| 0.0 | 28.0 | 700 | 3.4346 | -2.1542 | -10.4272 | 0.7000 | 8.2730 | -125.6345 | -112.8343 | -0.9673 | -0.9639 |
| 0.0 | 32.0 | 800 | 3.4246 | -2.1606 | -10.4103 | 0.6900 | 8.2497 | -125.6157 | -112.8415 | -0.9675 | -0.9642 |
| 0.0 | 36.0 | 900 | 3.4315 | -2.1805 | -10.4501 | 0.7000 | 8.2696 | -125.6599 | -112.8635 | -0.9674 | -0.9639 |
| 0.0 | 40.0 | 1000 | 3.4291 | -2.1852 | -10.4536 | 0.6900 | 8.2684 | -125.6639 | -112.8688 | -0.9672 | -0.9637 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
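
Because the checkpoint is a PEFT adapter rather than a full model, it must be loaded on top of the meta-llama/Llama-2-7b-chat-hf base weights (which require gated access on the Hugging Face Hub). A minimal loading sketch, assuming the PEFT and Transformers versions listed above:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# AutoPeftModelForCausalLM reads the adapter config and pulls in the
# meta-llama/Llama-2-7b-chat-hf base weights automatically.
model = AutoPeftModelForCausalLM.from_pretrained(
    "guoyu-zhang/model_hh_usp1_400",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = "Hello, how can I help you today?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```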