prithivMLmods committed on
Commit
7a3bceb
·
verified ·
1 Parent(s): 4534e68

Update README.md

  1. README.md +4 -0
README.md CHANGED
@@ -12,5 +12,9 @@ tags:
 base_model:
 - Qwen/Qwen2.5-Math-7B-PRM800K
 ---
+# **PRM-Math-7B-Reasoner - Process Reward Model**
+
+`PRM's : To identify and mitigate intermediate errors in the reasoning processes`
+
 PRM-Math-7B-Reasoner is a fully reproducible model, fine-tuned on the Qwen2.5-Math-7B-PRM800K dataset, designed to evaluate its ability to identify erroneous steps in mathematical reasoning. The model is used for reward computation, where after each step, a special token "<extra_0>" is inserted. For reward calculation, the probability score of this token being classified as positive is extracted, resulting in a reward value between 0 and 1. It is primarily utilized for solution reformatting in mathematically driven tasks and as a Long Context Full Reasoner.
 
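
The README paragraph in this diff describes the reward computation in words: a "<extra_0>" token is inserted after each reasoning step, and the probability that this token is classified as positive becomes the step's reward. A minimal Python sketch of that arithmetic follows. It is not the model's actual inference code. It assumes a two-class head whose logits are ordered [negative, positive], and the helper names and logit values are hypothetical, chosen only for illustration.

```python
import math

STEP_TOKEN = "<extra_0>"  # separator token named in the README text


def mark_steps(steps):
    """Join reasoning steps, appending the separator token after each one."""
    return "".join(step + STEP_TOKEN for step in steps)


def positive_probability(logits):
    """Softmax over [negative, positive] logits; return the positive-class share.

    Assumption: index 1 is the positive class. The README does not state the order.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def step_rewards(per_step_logits):
    """One reward in (0, 1) per separator position."""
    return [positive_probability(pair) for pair in per_step_logits]


# Two steps, the second containing an arithmetic error.
steps = ["2 + 3 = 5.", "5 * 4 = 21."]
marked = mark_steps(steps)

# Made-up logits standing in for the model's output at each separator position.
logits_at_separators = [[-2.0, 2.0], [1.5, -1.5]]
rewards = step_rewards(logits_at_separators)

print(marked)
print(rewards)
```

With these placeholder logits the first step scores above 0.5 and the second below it, which is the behavior a process reward model is meant to show on an erroneous step. In real use the logits would come from the model's forward pass at each "<extra_0>" position.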