Windy0822 committed on
Commit ebe7812 · verified · 1 Parent(s): 0049462

Update README.md

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -1,16 +1,16 @@
- ---
- license: mit
- datasets:
- - peiyi9979/Math-Shepherd
- language:
- - en
- base_model:
- - deepseek-ai/deepseek-math-7b-base
- pipeline_tag: reinforcement-learning
- ---
+ ---
+ license: mit
+ datasets:
+ - peiyi9979/Math-Shepherd
+ language:
+ - en
+ base_model:
+ - deepseek-ai/deepseek-math-7b-base
+ pipeline_tag: reinforcement-learning
+ ---
  ## Introduction
  <div align="center">
- <img src="figures/PQM.png" width="822px">
+ <img src="PQM.png" width="822px">
  </div>
 
  We present a new framework for PRM by framing it as a $Q$-value ranking problem, providing a theoretical basis for reward modeling that captures inter-dependencies among reasoning states.