Commit 00c65a8 (verified) by Decaderan · 1 parent: 1f5041a

Update README.md

Files changed (1)
  1. README.md +33 -3
README.md CHANGED
@@ -1,3 +1,33 @@
- ---
- license: mit
- ---
+ # Step-level Value Preference Optimization for Mathematical Reasoning
+
+ This is the official repository for the paper [Step-level Value Preference Optimization for Mathematical Reasoning](https://arxiv.org/abs/2406.10858). The code was extracted from our internal corporate codebase, so numbers reproduced with it may differ slightly from those reported in the paper, but they should be very close.
+
+
+ The SVPO implementation builds on components from [AlphaMath](https://arxiv.org/abs/2405.03553), such as Monte Carlo tree search (MCTS) and step-level beam search (SBS).
+ Therefore, to facilitate reproduction, we provide the [code](https://github.com/MARIO-Math-Reasoning/Super_MARIO) for constructing step-level preference pairs in this repository.
+
+
+ ## Citation
+ SVPO
+ ```
+ @misc{chen2024steplevel,
+   title={Step-level Value Preference Optimization for Mathematical Reasoning},
+   author={Guoxin Chen and Minpeng Liao and Chengxi Li and Kai Fan},
+   year={2024},
+   eprint={2406.10858},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
+
+ AlphaMath
+ ```
+ @misc{chen2024alphamath,
+   title={AlphaMath Almost Zero: process Supervision without process},
+   author={Guoxin Chen and Minpeng Liao and Chengxi Li and Kai Fan},
+   year={2024},
+   eprint={2405.03553},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
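
The README says code is provided for constructing step-level preference pairs but does not describe the procedure itself. The following is a minimal sketch, assuming a search tree (for example, one built by MCTS) whose nodes each carry one reasoning step and a Q-value estimate. `StepNode`, `build_step_pairs`, the `margin` threshold, and the best-versus-worst sibling rule are all hypothetical and may differ from the selection criteria in the linked Super_MARIO code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StepNode:
    """Hypothetical search-tree node: one reasoning step and its estimated Q-value."""
    step: str
    q_value: float
    children: List["StepNode"] = field(default_factory=list)


def build_step_pairs(root: StepNode, margin: float = 0.1) -> List[Tuple[List[str], str, str]]:
    """Collect (prefix, chosen_step, rejected_step) triples from a search tree.

    Assumed rule (illustrative only): among the children of each node, the
    highest-valued step is "chosen" and the lowest-valued step is "rejected",
    provided their Q-values differ by at least `margin`.
    """
    pairs: List[Tuple[List[str], str, str]] = []

    def visit(node: StepNode, prefix: List[str]) -> None:
        if len(node.children) >= 2:
            ranked = sorted(node.children, key=lambda c: c.q_value, reverse=True)
            best, worst = ranked[0], ranked[-1]
            if best.q_value - worst.q_value >= margin:
                pairs.append((list(prefix), best.step, worst.step))
        # Recurse so that every branching point in the tree can yield a pair.
        for child in node.children:
            visit(child, prefix + [child.step])

    # The root holds the question, so the step prefix starts empty.
    visit(root, [])
    return pairs


# Toy tree with hand-picked values (not real MCTS output).
question = StepNode("Q: 2 + 3 * 4 = ?", 0.0, [
    StepNode("Compute 3 * 4 = 12.", 0.9, [
        StepNode("Add 2: 14.", 1.0),
        StepNode("Add 2: 15.", -1.0),
    ]),
    StepNode("Compute 2 + 3 = 5.", -0.6),
])
pairs = build_step_pairs(question)
for prefix, chosen, rejected in pairs:
    print(prefix, "|", chosen, ">", rejected)
```

Pairing only siblings keeps each comparison conditioned on an identical prefix, which is what makes the preference signal step-level rather than solution-level.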
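
The README also notes that SVPO reuses step-level beam search (SBS) from AlphaMath. As a rough illustration of the general technique, not AlphaMath's implementation, the sketch below extends every partial solution by one step, scores each extension, and keeps only the `beam_size` best. The callables `propose_steps`, `score`, and `is_terminal` are hypothetical stand-ins; in a real system they would typically wrap a policy model and a value model.

```python
from typing import Callable, List, Tuple


def step_level_beam_search(
    question: str,
    propose_steps: Callable[[List[str]], List[str]],  # hypothetical: candidate next steps
    score: Callable[[List[str]], float],              # hypothetical: value of a partial solution
    is_terminal: Callable[[List[str]], bool],         # hypothetical: does this path end here?
    beam_size: int = 2,
    max_depth: int = 5,
) -> List[str]:
    """Return the highest-scoring step sequence found by step-level beam search."""
    beams: List[List[str]] = [[question]]
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_depth):
        # Expand every surviving partial solution by exactly one step.
        candidates: List[Tuple[float, List[str]]] = []
        for prefix in beams:
            for step in propose_steps(prefix):
                new_prefix = prefix + [step]
                candidates.append((score(new_prefix), new_prefix))
        if not candidates:
            break
        # Keep only the top `beam_size` extensions at this depth.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = []
        for value, prefix in candidates[:beam_size]:
            if is_terminal(prefix):
                finished.append((value, prefix))
            else:
                beams.append(prefix)
        if not beams:
            break
    # Fall back to unfinished beams if nothing terminated within max_depth.
    pool = finished or [(score(p), p) for p in beams]
    return max(pool, key=lambda item: item[0])[1]


# Toy problem: reach a total of exactly 10 by adding 3s and 5s.
def total(prefix: List[str]) -> int:
    return sum(int(s) for s in prefix[1:])


result = step_level_beam_search(
    "reach 10",
    propose_steps=lambda prefix: ["+3", "+5"],
    score=lambda p: -abs(10 - total(p)),
    is_terminal=lambda p: total(p) >= 10,
    beam_size=2,
)
print(result)  # ['reach 10', '+5', '+5']
```

Pruning after every step, rather than after whole solutions, is what distinguishes this from ordinary sampling followed by reranking: weak partial solutions are discarded before any more tokens are spent on them.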