base_model: MathGenie/Mistral-7B-Ours-SFT
tags:
  - math
model-index:
  - name: Mistral-7B-Ours-SFT-SCDPO
    results: []
license: apache-2.0
language:
  - en
metrics:
  - accuracy
pipeline_tag: text-generation

Mistral-7B-Ours-SFT-SCDPO

This model is a fine-tuned version of MathGenie/Mistral-7B-Ours-SFT. It achieves the following results on the evaluation set (the values match the final logged checkpoint, step 900):

  • Loss: 0.1793
  • Rewards/chosen: 0.2587
  • Rewards/rejected: -7.0301
  • Rewards/accuracies: 0.8947
  • Rewards/margins: 7.2889
  • Logps/rejected: -253.7773
  • Logps/chosen: -80.3105
  • Logits/rejected: -2.3417
  • Logits/chosen: -2.3846
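The metric names above follow the convention commonly used for DPO-style preference training. Assuming that convention (this card does not state the training objective or the value of β), the implicit reward of a response is β times the gap between the policy and reference log-probabilities, the margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs with a positive margin. A minimal sketch with made-up log-probabilities and a hypothetical `beta`:

```python
import math

def dpo_metrics(pairs, beta=0.1):
    """Compute DPO-style metrics from per-pair sequence log-probabilities.

    Each pair is (policy_chosen, policy_rejected, ref_chosen, ref_rejected).
    beta=0.1 is a hypothetical value, not taken from this model card.
    """
    chosen = [beta * (pc - rc) for pc, _, rc, _ in pairs]
    rejected = [beta * (pr - rr) for _, pr, _, rr in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # Per-pair loss is -log(sigmoid(margin)) = log(1 + exp(-margin)).
    losses = [math.log1p(math.exp(-m)) for m in margins]
    n = len(pairs)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
    }

# Made-up log-probabilities, for illustration only.
pairs = [
    (-80.0, -250.0, -82.0, -180.0),
    (-60.0, -120.0, -59.0, -118.0),
    (-70.0, -90.0, -65.0, -100.0),
]
metrics = dpo_metrics(pairs)
```

Under this reading, the reported margin is consistent with the other two reward values: 0.2587 − (−7.0301) = 7.2888, which agrees with the reported 7.2889 up to rounding.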

Model description

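This model is MathGenie/Mistral-7B-Ours-SFT further fine-tuned for mathematical problem solving. The model name and the reward metrics reported above suggest a DPO-style preference-optimization stage on top of the SFT base.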

Intended uses & limitations

The model is intended for solving math problems posed in English, used as a text-generation model.
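A minimal inference sketch using the Transformers text-generation API. The repository id `MathGenie/Mistral-7B-Ours-SFT-SCDPO` is an assumption inferred from the base model's namespace and this model's name, and passing the bare question as the prompt is a placeholder, since this card does not document the prompt format used in training. The helper name `solve` is hypothetical.

```python
# Assumed repository id; not confirmed by this model card.
MODEL_ID = "MathGenie/Mistral-7B-Ours-SFT-SCDPO"

def solve(question, model_id=MODEL_ID, max_new_tokens=512):
    """Generate a solution for `question` with greedy decoding."""
    # Imported lazily so that defining this helper does not require
    # torch or transformers to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    # The prompt is the raw question; the real training template is unknown.
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Drop the prompt tokens and decode only the newly generated text.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `solve("What is the sum of the first 10 positive integers?")` would download the weights on first use and return the generated text. Greedy decoding is chosen here only to make outputs reproducible.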

Training and evaluation data

eval

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
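The batch-size totals follow from the per-device values: 2 × 4 devices × 8 accumulation steps = 64 for training, and 4 × 4 devices = 16 for evaluation. The sketch below reproduces that arithmetic and illustrates the general shape of a cosine schedule with 10% linear warmup. The function names and `TOTAL_STEPS` are hypothetical; the total step count is not stated in this card, and 930 is only a rough guess based on step 900 being logged at epoch 1.93.

```python
import math

def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Number of examples contributing to one optimizer step."""
    return per_device * num_devices * grad_accum_steps

def cosine_lr(step, total_steps, peak_lr=1e-7, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 930  # hypothetical; not stated in the model card

train_total = effective_batch_size(2, 4, 8)  # per-device 2, 4 GPUs, 8 accumulation steps
eval_total = effective_batch_size(4, 4)      # per-device 4, 4 GPUs, no accumulation
schedule = [cosine_lr(s, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

With these settings the learning rate rises from 0 to 1e-07 over roughly the first tenth of training and then decays smoothly toward 0.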

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3963 | 0.21 | 100 | 0.3636 | 1.8634 | -0.1518 | 0.8816 | 2.0152 | -184.9944 | -64.2644 | -2.7112 | -2.7505 |
| 0.2849 | 0.43 | 200 | 0.2598 | 0.7706 | -3.7221 | 0.8816 | 4.4927 | -220.6974 | -75.1921 | -2.5067 | -2.5475 |
| 0.2496 | 0.64 | 300 | 0.2295 | 0.9323 | -4.2717 | 0.8684 | 5.2040 | -226.1934 | -73.5753 | -2.5080 | -2.5494 |
| 0.2331 | 0.86 | 400 | 0.2089 | 0.7871 | -4.8912 | 0.8684 | 5.6783 | -232.3884 | -75.0269 | -2.4967 | -2.5382 |
| 0.0874 | 1.07 | 500 | 0.1872 | 0.6345 | -5.7444 | 0.8816 | 6.3789 | -240.9202 | -76.5527 | -2.4323 | -2.4761 |
| 0.1217 | 1.28 | 600 | 0.1832 | 0.2282 | -6.6907 | 0.8684 | 6.9188 | -250.3827 | -80.6161 | -2.3741 | -2.4172 |
| 0.0966 | 1.5 | 700 | 0.1807 | 0.1849 | -7.0125 | 0.8816 | 7.1975 | -253.6012 | -81.0485 | -2.3503 | -2.3940 |
| 0.0755 | 1.71 | 800 | 0.1802 | 0.3224 | -6.9539 | 0.8947 | 7.2763 | -253.0150 | -79.6739 | -2.3437 | -2.3867 |
| 0.1177 | 1.93 | 900 | 0.1793 | 0.2587 | -7.0301 | 0.8947 | 7.2889 | -253.7773 | -80.3105 | -2.3417 | -2.3846 |
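As a quick sanity check on the logged results, the sketch below copies a subset of the columns into Python, picks the checkpoint with the lowest validation loss, and confirms that each reported margin equals chosen minus rejected reward up to rounding. The helper names are hypothetical and the numbers are taken directly from the table.

```python
# (step, validation loss, rewards/chosen, rewards/rejected,
#  rewards/accuracies, rewards/margins), copied from the results table.
ROWS = [
    (100, 0.3636, 1.8634, -0.1518, 0.8816, 2.0152),
    (200, 0.2598, 0.7706, -3.7221, 0.8816, 4.4927),
    (300, 0.2295, 0.9323, -4.2717, 0.8684, 5.2040),
    (400, 0.2089, 0.7871, -4.8912, 0.8684, 5.6783),
    (500, 0.1872, 0.6345, -5.7444, 0.8816, 6.3789),
    (600, 0.1832, 0.2282, -6.6907, 0.8684, 6.9188),
    (700, 0.1807, 0.1849, -7.0125, 0.8816, 7.1975),
    (800, 0.1802, 0.3224, -6.9539, 0.8947, 7.2763),
    (900, 0.1793, 0.2587, -7.0301, 0.8947, 7.2889),
]

def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])

def max_margin_error(rows):
    """Largest gap between the reported margin and chosen - rejected."""
    return max(abs(row[5] - (row[2] - row[3])) for row in rows)

best = best_checkpoint(ROWS)
error = max_margin_error(ROWS)
```

The lowest validation loss is at step 900, the last logged checkpoint. Most of the growth in the margin comes from the rejected reward falling (from -0.1518 to -7.0301), while the chosen reward also declines from 1.8634 to 0.2587.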

Framework versions

  • Transformers 4.38.2
  • PyTorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2