Trained for one epoch on ultrafeedback_binarized using cDPO. Full evaluation is pending.
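
For readers unfamiliar with cDPO (conservative DPO), the following is a minimal sketch of the per-example loss as it is commonly formulated: the standard DPO loss with label smoothing, which assumes a fraction `epsilon` of preference labels may be flipped. The function names and the `beta` and `epsilon` values are illustrative assumptions; this card does not state the hyperparameters or the training code actually used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,      # assumed value, not taken from this card
    epsilon: float = 0.1,   # assumed label-noise rate, not taken from this card
) -> float:
    """Conservative DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    # How much more the policy prefers "chosen" over "rejected"
    # than the reference model does.
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # With epsilon = 0 this reduces to the standard DPO loss.
    return (
        -(1.0 - epsilon) * log_sigmoid(beta * logits)
        - epsilon * log_sigmoid(-beta * logits)
    )
```

Setting `epsilon=0` recovers plain DPO, so the same function can be used to compare the two objectives on identical log-probabilities.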

Some initial benchmark results:

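| Task          | Version | Metric   | Value  |   | Stderr |
|---------------|--------:|----------|-------:|---|-------:|
| hellaswag     |       0 | acc      | 0.6621 | ± | 0.0047 |
|               |         | acc_norm | 0.8525 | ± | 0.0035 |
| arc_challenge |       0 | acc      | 0.6348 | ± | 0.0141 |
|               |         | acc_norm | 0.6698 | ± | 0.0137 |
| winogrande    |       0 | acc      | 0.7861 | ± | 0.0115 |
| gsm8k         |       0 | acc      | 0.5694 | ± | 0.0136 |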