Mistral-7B-Instruct-v0.2-multilingual-dpo-v1.0-v2

This model is a PEFT adapter for mistralai/Mistral-7B-Instruct-v0.2, fine-tuned on the following datasets:

  • nthakur/multilingual-ultrafeedback-binarized-dpo-v0.1
  • nthakur/multilingual-distilabel-intel-orca-dpo-pairs-v0.1
  • nthakur/multilingual-truthy-dpo-pairs-v0.1
  • nthakur/GSM8KInstruct-Parallel-instruct-dpo-v0.1

It achieves the following results on the evaluation set:

  • Loss: 0.1324
  • Rewards/chosen: -2.6738
  • Rewards/rejected: -12.2394
  • Rewards/accuracies: 0.9377
  • Rewards/margins: 9.5656
  • Logps/rejected: -1515.8665
  • Logps/chosen: -607.0774
  • Logits/rejected: 0.4952
  • Logits/chosen: 0.3030
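The metric names above match those logged by TRL's `DPOTrainer`. The card does not say which trainer, loss variant, or β was used, so the following is only a minimal sketch that assumes the standard sigmoid DPO loss. Under that assumption, each reward is β times the gap between the policy and reference log-probabilities. The margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs with a positive margin. The helper names and the default `beta=0.1` are illustrative assumptions, not values from this run.

```python
import math
from statistics import mean


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss and reward statistics over a batch of preference pairs.

    Each argument is a list of sequence log-probabilities (assumed summed over
    tokens), one per pair. beta=0.1 is a hypothetical default, not a value
    reported in this card.
    """
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    # margin = beta * ((pi_c - pi_r) - (ref_c - ref_r)), i.e. the DPO logit
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    losses = [neg_log_sigmoid(m) for m in margins]
    return {
        "loss": mean(losses),
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        "rewards/margins": mean(margins),
    }


# Toy log-probabilities for two pairs (made-up numbers, not from this run).
example = dpo_metrics(
    policy_chosen=[-10.0, -12.0], policy_rejected=[-30.0, -11.0],
    ref_chosen=[-15.0, -12.0], ref_rejected=[-20.0, -10.0],
)
print(example)
```

The reported rewards are means over the evaluation set, so the margin identity also holds for the aggregates: -2.6738 - (-12.2394) = 9.5656.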

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 24
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
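The two totals follow from the per-device settings. Training uses 4 examples × 3 GPUs × 2 accumulation steps, which gives 24 examples per optimizer step. Evaluation does not accumulate gradients, so it uses 4 × 3 = 12. The sketch below reproduces that arithmetic. It also reproduces the shape of a cosine schedule with a 10% linear warmup, modeled on the cosine-with-warmup schedule in Hugging Face Transformers. The function names are hypothetical. The card does not state the total number of training steps, so that value is left as a parameter.

```python
import math


def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def cosine_lr_with_warmup(step: int, total_steps: int, peak_lr: float = 2e-4,
                          warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0.

    Warmup length is ceil(warmup_ratio * total_steps), assumed to match how
    the Trainer converts a warmup ratio into steps.
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_batch_size(4, 3, 2))  # 24, the reported total_train_batch_size
print(total_batch_size(4, 3))     # 12, the reported total_eval_batch_size

# Learning rate at a few points of a hypothetical 1000-step run.
for s in (0, 50, 100, 550, 1000):
    print(s, cosine_lr_with_warmup(s, total_steps=1000))
```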

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2695 | 0.1361 | 500 | 0.2653 | -0.4399 | -4.5379 | 0.8680 | 4.0981 | -745.7153 | -383.6803 | -1.3998 | -1.5327 |
| 0.4349 | 0.2723 | 1000 | 0.3152 | -2.6018 | -7.1212 | 0.8515 | 4.5195 | -1004.0471 | -599.8698 | 4.1724 | 4.7868 |
| 0.531 | 0.4084 | 1500 | 0.4873 | -2.4253 | -8.0681 | 0.7855 | 5.6428 | -1098.7278 | -582.2241 | -1.5195 | -1.6538 |
| 0.1681 | 0.5446 | 2000 | 0.2003 | -3.9555 | -13.1169 | 0.9089 | 9.1613 | -1603.6106 | -735.2488 | -0.1888 | -0.3742 |
| 0.1778 | 0.6807 | 2500 | 0.2004 | -3.4745 | -11.9768 | 0.9242 | 8.5023 | -1489.6012 | -687.1464 | -0.7118 | -0.9608 |
| 0.1342 | 0.8169 | 3000 | 0.1452 | -3.0928 | -12.8477 | 0.9340 | 9.7549 | -1576.6960 | -648.9738 | 0.6727 | 0.5428 |
| 0.1252 | 0.9530 | 3500 | 0.1328 | -2.7014 | -12.3976 | 0.9383 | 9.6962 | -1531.6849 | -609.8344 | 0.5002 | 0.3026 |
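A few lines of Python can sanity-check the training results. The hypothetical helpers below do two things. They confirm that each Rewards/margins value equals Rewards/chosen minus Rewards/rejected, within rounding. They also estimate the number of optimizer steps per epoch as Step divided by Epoch. The card does not report that figure, and Epoch is rounded to four decimals, so the estimate is only approximate.

```python
from statistics import median

# (step, epoch, rewards/chosen, rewards/rejected, rewards/margins), copied from the training results.
ROWS = [
    (500, 0.1361, -0.4399, -4.5379, 4.0981),
    (1000, 0.2723, -2.6018, -7.1212, 4.5195),
    (1500, 0.4084, -2.4253, -8.0681, 5.6428),
    (2000, 0.5446, -3.9555, -13.1169, 9.1613),
    (2500, 0.6807, -3.4745, -11.9768, 8.5023),
    (3000, 0.8169, -3.0928, -12.8477, 9.7549),
    (3500, 0.9530, -2.7014, -12.3976, 9.6962),
]


def check_margins(rows, tol=2e-4):
    """Return rows whose margin differs from chosen - rejected by more than tol.

    tol allows for each of the three columns being rounded to four decimals.
    """
    return [r for r in rows if abs((r[2] - r[3]) - r[4]) > tol]


def estimate_steps_per_epoch(rows):
    """Median of step / epoch; approximate because Epoch is rounded."""
    return median(step / epoch for step, epoch, *_ in rows)


print(check_margins(ROWS))                     # [] when every row is consistent
print(round(estimate_steps_per_epoch(ROWS)))   # roughly 3,673 optimizer steps
```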

Framework versions

  • PEFT 0.7.1
  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
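To reproduce the training environment, a short check can compare the installed packages with the versions listed above. This sketch uses the standard-library `importlib.metadata`. It assumes the PyPI distribution names `peft`, `transformers`, `torch`, `datasets`, and `tokenizers`. When comparing versions, it ignores the local `+cu121` build tag on PyTorch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from "Framework versions"; distribution names are assumed.
CARD_VERSIONS = {
    "peft": "0.7.1",
    "transformers": "4.41.2",
    "torch": "2.3.0",  # card lists 2.3.0+cu121; the local "+cu121" tag is ignored
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected=None):
    """Map each package whose installed version differs to (expected, installed).

    installed is None when the package is not installed at all.
    """
    expected = CARD_VERSIONS if expected is None else expected
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name).split("+", 1)[0]  # drop local build tags
        except PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches().items():
        print(f"{name}: card uses {want}, found {have or 'not installed'}")
```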