Easy2Hard-Bench Collection Easy2Hard-Bench offers six datasets with continuous difficulty ratings, enabling profiling of LLM performance and generalization across difficulties. • 7 items • Updated Jul 3
Correct-DPO Evaluations Collection Evaluations of Correct-DPO Experiments • 143 items • Updated May 21