model_hh_shp3_200

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how the rewards/* metrics are computed follows the list):

  • Loss: 2.2700
  • Rewards/chosen: -2.0808
  • Rewards/rejected: -2.6185
  • Rewards/accuracies: 0.5300
  • Rewards/margins: 0.5377
  • Logps/rejected: -216.1040
  • Logps/chosen: -234.7784
  • Logits/rejected: -0.6432
  • Logits/chosen: -0.6992
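
The rewards/* and logps/* names above match the metrics logged by TRL's DPOTrainer, which suggests DPO-style preference training; the card itself does not state the training objective. As a minimal sketch under that assumption, the metrics can be recomputed from summed response log-probabilities as follows (the function name and the beta value are illustrative, not taken from this card):

```python
import torch

def dpo_reward_metrics(policy_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor,
                       beta: float = 0.1):  # beta is assumed; not stated in this card
    # Implicit DPO reward: beta * (policy logp - reference logp) per response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        # Fraction of pairs where the chosen response outscores the rejected one.
        "rewards/accuracies": (margins > 0).float().mean().item(),
    }
```

Under this reading, rewards/accuracies = 0.5300 means the tuned model's implicit reward prefers the chosen response in 53% of evaluation pairs.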

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a minimal configuration sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
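
As referenced above, here is a minimal sketch mapping these values onto transformers.TrainingArguments; output_dir is a placeholder, and the Adam betas/epsilon listed above are the library defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="model_hh_shp3_200",   # placeholder output path
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,    # 4 x 4 = total train batch size of 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                   # training_steps above
)
```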

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0 | 8.0 | 100 | 2.2133 | -1.8646 | -2.4313 | 0.5300 | 0.5667 | -215.8960 | -234.5382 | -0.6413 | -0.6979 |
| 0.0 | 16.0 | 200 | 2.2571 | -1.9454 | -2.5096 | 0.5300 | 0.5642 | -215.9830 | -234.6279 | -0.6423 | -0.6991 |
| 0.0 | 24.0 | 300 | 2.2275 | -1.9722 | -2.5264 | 0.5200 | 0.5542 | -216.0016 | -234.6577 | -0.6429 | -0.6988 |
| 0.0 | 32.0 | 400 | 2.2729 | -2.0276 | -2.5437 | 0.5200 | 0.5161 | -216.0209 | -234.7193 | -0.6425 | -0.6991 |
| 0.0 | 40.0 | 500 | 2.2476 | -2.0622 | -2.6344 | 0.5300 | 0.5723 | -216.1217 | -234.7577 | -0.6440 | -0.7005 |
| 0.0 | 48.0 | 600 | 2.2449 | -2.0779 | -2.6423 | 0.5300 | 0.5645 | -216.1305 | -234.7751 | -0.6434 | -0.6996 |
| 0.0 | 56.0 | 700 | 2.2415 | -2.0486 | -2.6063 | 0.5300 | 0.5577 | -216.0904 | -234.7426 | -0.6439 | -0.7000 |
| 0.0 | 64.0 | 800 | 2.2311 | -2.0778 | -2.6332 | 0.5300 | 0.5554 | -216.1204 | -234.7751 | -0.6440 | -0.7000 |
| 0.0 | 72.0 | 900 | 2.2534 | -2.0857 | -2.6363 | 0.5300 | 0.5507 | -216.1238 | -234.7838 | -0.6437 | -0.6996 |
| 0.0 | 80.0 | 1000 | 2.2700 | -2.0808 | -2.6185 | 0.5300 | 0.5377 | -216.1040 | -234.7784 | -0.6432 | -0.6992 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.1
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
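
The PEFT version above indicates this repository ships a parameter-efficient adapter for meta-llama/Llama-2-7b-chat-hf rather than full model weights. A minimal loading sketch, assuming a LoRA-style adapter and access to the gated base model (generation settings are illustrative):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-chat-hf"
adapter_id = "guoyu-zhang/model_hh_shp3_200"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the adapter weights from this repository to the base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```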