Update README.md
README.md CHANGED
@@ -40,7 +40,7 @@ The Skywork-o1-Open-PRM series are trained on [**Qwen2.5-Math-1.5B-Instruct**](h
 We utilized the evaluation scripts from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math) and followed their configuration to ensure consistency. The selected datasets include **GSM8K**, **MATH**, **GaoKao**, **CN-Middle School 24**, **OlympiadBench**, **AMC-23**, and **AIME-24**. Among these, **GaoKao** and **CN-Middle School 24** are Chinese datasets, while the remaining datasets are in English. Notably, **OlympiadBench**, **AIME-24**, and **AMC-23** are competition-level datasets.

 ### Code Evaluation
-For code evaluation, we adopted the evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder)
+For code evaluation, we adopted the evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder) while largely maintaining the same configuration. The selected datasets include **HumanEval**, **MBPP**, and **LiveCodeBench**, with **LiveCodeBench** specifically using the **2024.01-2024.11** version. We use the latest version (0.3.1) of [evalplus](https://github.com/evalplus/evalplus) due to issues with tests and code sanitization in previous versions.


 ## Evaluation Base Models

@@ -262,7 +262,7 @@ If you find our work helpful, please feel free to cite us using the following Bi
     title={Skywork-o1 Open Series},
     author={Skywork-o1 Team},
     year={2024},
-    month={
+    month={November},
     howpublished={\url{https://huggingface.co/Skywork}},
     url={https://huggingface.co/Skywork},
 }
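To illustrate the evalplus-based scoring mentioned in the added paragraph, here is a minimal sketch of how HumanEval+ completions might be packaged for evalplus (0.3.1) to score. It assumes evalplus's documented `get_human_eval_plus`/`write_jsonl` helpers; `generate_solution` is a hypothetical placeholder for the model under evaluation, not part of the released pipeline.

```python
# Minimal sketch: build one candidate solution per HumanEval+ task and dump them
# as JSONL so evalplus can score them (e.g. via its `evalplus.evaluate` command).
# Assumes evalplus >= 0.3.1; `generate_solution` is a hypothetical placeholder.
from evalplus.data import get_human_eval_plus, write_jsonl


def generate_solution(prompt: str) -> str:
    """Hypothetical stand-in: return a full candidate solution for `prompt`."""
    # A real run would query the policy model here and, for Best-of-N style
    # evaluation, rank the candidates with the PRM before keeping one.
    return prompt + "    raise NotImplementedError\n"


samples = [
    {"task_id": task_id, "solution": generate_solution(problem["prompt"])}
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)
```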