ScaleML-RLHF/Qwen2.5-Math-1.5B-grpo-em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-chunk4-iter6 2B • Updated Apr 17 • 3
ScaleML-RLHF/Qwen2.5-Math-1.5B-grpo-em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-chunk4-iter5 2B • Updated Apr 17 • 2
ScaleML-RLHF/Qwen2.5-Math-1.5B-grpo-em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-chunk4-iter4 2B • Updated Apr 16 • 2
ScaleML-RLHF/Qwen2.5-Math-1.5B-grpo-em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-chunk4-iter3 2B • Updated Apr 16 • 2
ScaleML-RLHF/Qwen2.5-Math-1.5B-grpo-em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-chunk4-iter2 2B • Updated Apr 16 • 2
ScaleML-RLHF/Qwen2.5-Math-1.5B-grpo-em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-chunk4-iter1 2B • Updated Apr 16 • 3
ScaleML-RLHF/Qwen2.5-Math-1.5B-raft-plusplus-numina_math_em-sample1n16-sample16-iter4 2B • Updated Apr 7 • 3
ScaleML-RLHF/Qwen2.5-Math-1.5B-raft-plusplus-numina_math_em-sample1n16-sample16-iter3 2B • Updated Apr 7 • 3