---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_orpo_entropy
    results: []
---

# qwen_orpo_entropy

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

- Loss: 0.5257
- Rewards/chosen: -5.2305
- Rewards/rejected: -6.3460
- Rewards/accuracies: 0.7285
- Rewards/margins: 1.1155
- Logps/rejected: -6.3460
- Logps/chosen: -5.2305
- Logits/rejected: 0.3311
- Logits/chosen: 0.2347
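
A minimal usage sketch with the standard `transformers` API is shown below; the hub id `yakazimir/qwen_orpo_entropy` is assumed from the model name and may differ from the actual repository path.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a reply via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_orpo_entropy"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Summarize preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```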

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
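
For illustration, these values map onto the generic `transformers` `TrainingArguments` fields roughly as sketched below; this is a hedged sketch, not the exact TRL trainer config used for the run (the preference-loss settings are not listed in this card).

```python
# Hedged sketch: the hyperparameters above expressed as generic TrainingArguments fields.
# The actual run used a TRL trainer config with additional preference-loss options.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen_orpo_entropy",     # assumed output directory
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=16,     # 2 per device x 16 accumulation steps -> total batch size 32
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the transformers defaults.
)
```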

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7057        | 0.2141 | 400  | 0.7062          | -1.5154        | -1.6776          | 0.5571             | 0.1622          | -1.6776        | -1.5154      | 0.3357          | 0.2498        |
| 0.597         | 0.4282 | 800  | 0.5956          | -2.3371        | -2.7822          | 0.6669             | 0.4451          | -2.7822        | -2.3371      | 0.4528          | 0.3584        |
| 0.5883        | 0.6422 | 1200 | 0.5486          | -3.5511        | -4.2123          | 0.7211             | 0.6612          | -4.2123        | -3.5511      | 0.3923          | 0.2876        |
| 0.4794        | 0.8563 | 1600 | 0.5320          | -3.5255        | -4.2178          | 0.7277             | 0.6924          | -4.2178        | -3.5255      | 0.3881          | 0.2849        |
| 0.5765        | 1.0704 | 2000 | 0.5305          | -3.6701        | -4.4352          | 0.7240             | 0.7651          | -4.4352        | -3.6701      | 0.3104          | 0.1978        |
| 0.5449        | 1.2845 | 2400 | 0.5198          | -4.3149        | -5.2348          | 0.7352             | 0.9199          | -5.2348        | -4.3149      | 0.2247          | 0.1184        |
| 0.518         | 1.4986 | 2800 | 0.5189          | -4.2439        | -5.1423          | 0.7352             | 0.8983          | -5.1423        | -4.2439      | 0.3318          | 0.2186        |
| 0.5602        | 1.7127 | 3200 | 0.5174          | -4.3315        | -5.2509          | 0.7381             | 0.9194          | -5.2509        | -4.3315      | 0.3472          | 0.2362        |
| 0.5482        | 1.9267 | 3600 | 0.5152          | -4.3680        | -5.3320          | 0.7329             | 0.9640          | -5.3320        | -4.3680      | 0.3330          | 0.2233        |
| 0.4259        | 2.1408 | 4000 | 0.5296          | -5.1372        | -6.2156          | 0.7270             | 1.0783          | -6.2156        | -5.1372      | 0.3103          | 0.2143        |
| 0.4141        | 2.3549 | 4400 | 0.5245          | -5.3001        | -6.3996          | 0.7277             | 1.0995          | -6.3996        | -5.3001      | 0.3776          | 0.2775        |
| 0.4481        | 2.5690 | 4800 | 0.5253          | -5.2343        | -6.3529          | 0.7307             | 1.1185          | -6.3529        | -5.2343      | 0.4139          | 0.3107        |
| 0.3925        | 2.7831 | 5200 | 0.5251          | -5.2099        | -6.3202          | 0.7285             | 1.1103          | -6.3202        | -5.2099      | 0.3386          | 0.2411        |
| 0.4044        | 2.9972 | 5600 | 0.5257          | -5.2305        | -6.3460          | 0.7285             | 1.1155          | -6.3460        | -5.2305      | 0.3311          | 0.2347        |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
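
A matching environment can likely be set up along these lines (a sketch; the cu121 build of PyTorch typically comes from the PyTorch wheel index):

```bash
# Sketch: pin the package versions listed above; the extra index URL is for the cu121 torch build.
pip install "transformers==4.44.2" "datasets==2.18.0" "tokenizers==0.19.1"
pip install "torch==2.2.2" --index-url https://download.pytorch.org/whl/cu121
```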