chrisliu298 committed
Commit 37fd8ca · verified · 1 Parent(s): ee0ec75

Update README.md

Files changed (1):
1. README.md (+2 −2)
README.md CHANGED
@@ -40,7 +40,7 @@ The Skywork-o1-Open-PRM series are trained on [**Qwen2.5-Math-1.5B-Instruct**](h
  We utilized the evaluation scripts from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math) and followed their configuration to ensure consistency. The selected datasets include **GSM8K**, **MATH**, **GaoKao**, **CN-Middle School 24**, **OlympiadBench**, **AMC-23**, and **AIME-24**. Among these, **GaoKao** and **CN-Middle School 24** are Chinese datasets, while the remaining datasets are in English. Notably, **OlympiadBench**, **AIME-24**, and **AMC-23** are competition-level datasets.
 
  ### Code Evaluation
- For code evaluation, we adopted the evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder), maintaining the same configuration. The selected datasets include **HumanEval**, **MBPP**, and **LiveCodeBench**, with **LiveCodeBench** specifically using the version **2024.01-2024-11**.
+ For code evaluation, we adopted the evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder) while largely maintaining the same configuration. The selected datasets include **HumanEval**, **MBPP**, and **LiveCodeBench**, with **LiveCodeBench** specifically using the version **2024.01-2024-11**. We use the latest version (0.3.1) of [evalplus](https://github.com/evalplus/evalplus) due to issues with tests and code sanitization in previous versions.
 
 
  ## Evaluation Base Models
@@ -262,7 +262,7 @@ If you find our work helpful, please feel free to cite us using the following Bi
   title={Skywork-o1 Open Series},
   author={Skywork-o1 Team},
   year={2024},
- month={September},
+ month={November},
   howpublished={\url{https://huggingface.co/Skywork}},
   url={https://huggingface.co/Skywork},
   }
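For readers reproducing the code-evaluation change above, here is a minimal sketch of how an evalplus 0.3.1 run is typically wired up, assuming evalplus's documented data API and CLI; the `generate_solution` stub is hypothetical and stands in for the model call, which this commit does not include:

```python
# Minimal sketch (not from this commit) of driving evalplus 0.3.1 on
# HumanEval. `generate_solution` is a hypothetical stub standing in for
# the actual model under evaluation.
from evalplus.data import get_human_eval_plus, write_jsonl

def generate_solution(prompt: str) -> str:
    # Placeholder completion; a real run would query the model here.
    return "    pass\n"

# Build one sample per HumanEval+ task and dump them to JSONL, the
# input format expected by the evalplus tooling. "solution" holds the
# full program, i.e. prompt plus completion.
samples = [
    {"task_id": task_id,
     "solution": problem["prompt"] + generate_solution(problem["prompt"])}
    for task_id, problem in get_human_eval_plus().items()
]
write_jsonl("samples.jsonl", samples)

# Sanitization and scoring then go through the evalplus CLI, e.g.:
#   python -m evalplus.sanitize --samples samples.jsonl
#   python -m evalplus.evaluate --dataset humaneval --samples samples-sanitized.jsonl
```

The sanitization step matters here because raw model output often carries chat markup or trailing text that breaks compilation, which is the class of issue the upgrade to evalplus 0.3.1 in this commit is meant to address.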