---
license: cc-by-nc-4.0
datasets:
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
---

Trained for one epoch on `HuggingFaceH4/ultrafeedback_binarized` using cDPO. A full evaluation is still pending.
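
For readers unfamiliar with the objective, here is a minimal sketch of a per-example cDPO loss. It assumes "cDPO" refers to conservative DPO, i.e. the DPO loss with label smoothing on the preference labels. The function names, the `beta` value, and the `label_smoothing` value are illustrative assumptions; this card does not state the hyperparameters or the training code actually used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not taken from this card
    label_smoothing: float = 0.1,  # assumed value, not taken from this card
) -> float:
    """Conservative DPO loss for one preference pair.

    Inputs are summed sequence log-probabilities of the chosen and rejected
    responses under the policy and under the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # With probability `label_smoothing` the preference label is treated as
    # flipped, which keeps the loss from pushing the margin to infinity.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * margin)
        - label_smoothing * log_sigmoid(-beta * margin)
    )
```

Setting `label_smoothing=0.0` recovers the standard DPO loss, so the smoothing term is the only difference between the two objectives in this sketch.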

Some initial benchmark results:

| Task          | Version | Metric   | Value  |   | Stderr |
|---------------|--------:|----------|-------:|---|-------:|
| hellaswag     |       0 | acc      | 0.6621 | ± | 0.0047 |
|               |         | acc_norm | 0.8525 | ± | 0.0035 |
| arc_challenge |       0 | acc      | 0.6348 | ± | 0.0141 |
|               |         | acc_norm | 0.6698 | ± | 0.0137 |
| winogrande    |       0 | acc      | 0.7861 | ± | 0.0115 |
| gsm8k         |       0 | acc      | 0.5694 | ± | 0.0136 |