Best practices for evaluating R1 models: reasoning efficiency and performance by MATH level

#198
by wangxingjun778 - opened

EvalScope - LLM Evaluation Framework: https://github.com/modelscope/evalscope

  1. Best Practices for Evaluating the Reasoning Capabilities of R1-Class Models
    https://evalscope.readthedocs.io/en/latest/best_practice/deepseek_r1_distill.html

  2. Best Practices for Evaluating Reasoning Efficiency
    https://evalscope.readthedocs.io/en/latest/best_practice/think_eval.html

  3. Best Practices for Evaluating R1 and QwQ Reasoning Efficiency and Performance by MATH Level
    https://evalscope.readthedocs.io/en/latest/best_practice/eval_qwq.html
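
For anyone who wants a feel for what "performance by MATH level" and "reasoning efficiency" look like in practice, here is a minimal, self-contained sketch. It is not EvalScope's API or its metric definitions: the record fields (`level`, `correct`, `tokens`), the function name `summarize_by_level`, and the toy records are all hypothetical. It only shows the basic idea of grouping per-problem results by difficulty level and comparing accuracy against the average number of generated tokens.

```python
from collections import defaultdict


def summarize_by_level(records):
    """Group per-problem results by difficulty level.

    Each record is a dict with hypothetical keys:
      level   - difficulty level of the problem (e.g. 1 to 5)
      correct - whether the model's final answer was right
      tokens  - number of tokens the model generated for that problem
    Returns {level: {"n", "accuracy", "mean_tokens"}} ordered by level.
    """
    buckets = defaultdict(list)
    for record in records:
        buckets[record["level"]].append(record)

    summary = {}
    for level in sorted(buckets):
        rows = buckets[level]
        n = len(rows)
        summary[level] = {
            "n": n,
            "accuracy": sum(1 for r in rows if r["correct"]) / n,
            "mean_tokens": sum(r["tokens"] for r in rows) / n,
        }
    return summary


# Toy records for illustration only; these are not real evaluation results.
records = [
    {"level": 1, "correct": True, "tokens": 400},
    {"level": 1, "correct": True, "tokens": 600},
    {"level": 5, "correct": True, "tokens": 3000},
    {"level": 5, "correct": False, "tokens": 5000},
]

summary = summarize_by_level(records)
for level, stats in summary.items():
    print(f"level {level}: n={stats['n']} "
          f"accuracy={stats['accuracy']:.2f} "
          f"mean_tokens={stats['mean_tokens']:.0f}")
```

A model that spends many more tokens on easy levels without gaining accuracy would show up here as a high `mean_tokens` at a low level. The linked guides describe how EvalScope itself measures this.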
