Trained for one epoch on ultrafeedback_binarized using cDPO. Evaluation pending.
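
For readers unfamiliar with cDPO (conservative DPO), the per-example objective can be sketched as below. This is only an illustration of the general technique: standard DPO with a label-smoothing term that assumes some fraction of preference labels are flipped. The function name and the `beta` and `label_smoothing` values are hypothetical and are not taken from this model's training run, which does not state its hyperparameters here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, for illustration only
    label_smoothing: float = 0.1,  # assumed label-noise rate, illustration only
) -> float:
    """Conservative DPO loss for a single preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy and under the frozen reference model.
    """
    # Implicit reward margin between the chosen and rejected responses.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # With label_smoothing = 0 this reduces to the standard DPO loss.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * margin)
        - label_smoothing * log_sigmoid(-beta * margin)
    )
```

Unlike plain DPO, whose loss keeps decreasing as the margin grows, this smoothed loss is minimized at a finite margin (where `sigmoid(beta * margin)` equals `1 - label_smoothing`), which discourages the policy from drifting arbitrarily far from the reference on noisy pairs.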

Some initial benchmark results:

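| Task | Version | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|
| hellaswag | 0 | acc | 0.6621 | ± | 0.0047 |
| | | acc_norm | 0.8525 | ± | 0.0035 |
| arc_challenge | 0 | acc | 0.6348 | ± | 0.0141 |
| | | acc_norm | 0.6698 | ± | 0.0137 |
| winogrande | 0 | acc | 0.7861 | ± | 0.0115 |
| gsm8k | 0 | acc | 0.5694 | ± | 0.0136 |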
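
To help read the Stderr column, the following sketch converts each reported value and standard error into an approximate 95% interval. It assumes a normal approximation (value ± 1.96 × stderr), which is a common convention but is not something the benchmark output itself states; the `results` dictionary and `ci95` helper are illustrative names.

```python
# Reported (value, stderr) pairs copied from the results above.
results = {
    ("hellaswag", "acc"): (0.6621, 0.0047),
    ("hellaswag", "acc_norm"): (0.8525, 0.0035),
    ("arc_challenge", "acc"): (0.6348, 0.0141),
    ("arc_challenge", "acc_norm"): (0.6698, 0.0137),
    ("winogrande", "acc"): (0.7861, 0.0115),
    ("gsm8k", "acc"): (0.5694, 0.0136),
}


def ci95(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval under a normal approximation (an assumption)."""
    return (value - z * stderr, value + z * stderr)


for (task, metric), (value, stderr) in results.items():
    low, high = ci95(value, stderr)
    print(f"{task:14s} {metric:9s} {value:.4f}  [{low:.4f}, {high:.4f}]")
```

The smaller benchmarks (arc_challenge, winogrande, gsm8k) have standard errors above 0.01, so differences of a point or two against other models on those tasks may fall within noise.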

Model size: 7.24B params (BF16, Safetensors)