chrisliu298 committed
Commit 37fd8ca · verified · 1 Parent(s): ee0ec75

Update README.md

Files changed (1):
1. README.md (+2 −2)
README.md CHANGED
@@ -40,7 +40,7 @@ The Skywork-o1-Open-PRM series are trained on [**Qwen2.5-Math-1.5B-Instruct**](h
  We utilized the evaluation scripts from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math) and followed their configuration to ensure consistency. The selected datasets include **GSM8K**, **MATH**, **GaoKao**, **CN-Middle School 24**, **OlympiadBench**, **AMC-23**, and **AIME-24**. Among these, **GaoKao** and **CN-Middle School 24** are Chinese datasets, while the remaining datasets are in English. Notably, **OlympiadBench**, **AIME-24**, and **AMC-23** are competition-level datasets.
 
  ### Code Evaluation
- For code evaluation, we adopted the evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder), maintaining the same configuration. The selected datasets include **HumanEval**, **MBPP**, and **LiveCodeBench**, with **LiveCodeBench** specifically using the version **2024.01-2024-11**.
+ For code evaluation, we adopted the evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder) while largely maintaining the same configuration. The selected datasets include **HumanEval**, **MBPP**, and **LiveCodeBench**, with **LiveCodeBench** specifically using the version **2024.01-2024-11**. We use the latest version (0.3.1) of [evalplus](https://github.com/evalplus/evalplus) due to issues with tests and code sanitization in previous versions.
 
 
  ## Evaluation Base Models
@@ -262,7 +262,7 @@ If you find our work helpful, please feel free to cite us using the following Bi
   title={Skywork-o1 Open Series},
   author={Skywork-o1 Team},
   year={2024},
- month={September},
+ month={November},
   howpublished={\url{https://huggingface.co/Skywork}},
   url={https://huggingface.co/Skywork},
   }
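For readers reproducing the code-evaluation change above, here is a minimal sketch of how an evalplus 0.3.1 run is typically wired up, assuming evalplus's documented data API and CLI; the `generate_solution` stub is hypothetical and stands in for the model call, which this commit does not include:

```python
# Minimal sketch (not from this commit) of driving evalplus 0.3.1 on
# HumanEval. `generate_solution` is a hypothetical stub standing in for
# the actual model under evaluation.
from evalplus.data import get_human_eval_plus, write_jsonl

def generate_solution(prompt: str) -> str:
    # Placeholder completion; a real run would query the model here.
    return "    pass\n"

# Build one sample per HumanEval+ task and dump them to JSONL, the
# input format expected by the evalplus tooling. "solution" holds the
# full program, i.e. prompt plus completion.
samples = [
    {"task_id": task_id,
     "solution": problem["prompt"] + generate_solution(problem["prompt"])}
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)

# Sanitization and scoring then go through the evalplus CLI, e.g.:
#   python -m evalplus.sanitize --samples samples.jsonl
#   python -m evalplus.evaluate --dataset humaneval --samples samples-sanitized.jsonl
```

The sanitization step matters here because raw model output often carries chat markup or trailing text that breaks compilation, which is the class of issue the upgrade to evalplus 0.3.1 in this commit is meant to address.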