---
license: cc-by-nc-4.0
datasets:
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
---
Trained for one epoch on ultrafeedback_binarized using cDPO. Full evaluation is pending; some initial benchmark results:

| Task          | Version | Metric   |  Value |   | Stderr |
|---------------|--------:|----------|-------:|---|-------:|
| hellaswag     |       0 | acc      | 0.6621 | ± | 0.0047 |
|               |         | acc_norm | 0.8525 | ± | 0.0035 |
| arc_challenge |       0 | acc      | 0.6348 | ± | 0.0141 |
|               |         | acc_norm | 0.6698 | ± | 0.0137 |
| winogrande    |       0 | acc      | 0.7861 | ± | 0.0115 |
| gsm8k         |       0 | acc      | 0.5694 | ± | 0.0136 |
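
For readers unfamiliar with the training objective, here is a minimal sketch of a per-example cDPO loss, assuming "cDPO" refers to conservative DPO (the DPO loss with label smoothing on the preference labels). The function names and the `beta` and `label_smoothing` values are illustrative assumptions; the hyperparameters actually used for this model are not stated in this card.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not taken from this card
    label_smoothing: float = 0.1,  # assumed value, not taken from this card
) -> float:
    """Conservative DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the trained policy and the frozen reference model.
    """
    # Difference of policy and reference log-ratios (the DPO "logits").
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # With probability `label_smoothing` the preference label is treated
    # as flipped; label_smoothing = 0 recovers the standard DPO loss.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * logits)
        - label_smoothing * log_sigmoid(-beta * logits)
    )
```

Setting `label_smoothing=0` gives plain DPO, so the smoothing term is the only difference in this sketch: it keeps the loss from being driven toward zero by ever-larger margins on pairs whose labels may be noisy.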