Best practices for evaluating R1 models: reasoning efficiency and performance by MATH level
#198, opened by wangxingjun778
EvalScope - LLM Evaluation Framework: https://github.com/modelscope/evalscope
Best Practices for Evaluating R1 Class Model Inference Capabilities: https://evalscope.readthedocs.io/en/latest/best_practice/deepseek_r1_distill.html
Best Practices for Evaluating Reasoning Efficiency: https://evalscope.readthedocs.io/en/latest/best_practice/think_eval.html
Best Practices for Evaluating R1, QwQ Inference Efficiency and Math-level: https://evalscope.readthedocs.io/en/latest/best_practice/eval_qwq.html