Update README.md
README.md
CHANGED
@@ -12,5 +12,9 @@ tags:
base_model:
- Qwen/Qwen2.5-Math-7B-PRM800K
---
+ # **PRM-Math-7B-Reasoner - Process Reward Model**
+
+ `PRMs: identify and mitigate intermediate errors in the reasoning process`
+
PRM-Math-7B-Reasoner is a fully reproducible process reward model, fine-tuned from the Qwen2.5-Math-7B-PRM800K base model and designed to identify erroneous steps in mathematical reasoning. The model is used for reward computation: a special token "<extra_0>" is inserted after each reasoning step, and the probability of that token being classified as positive is extracted, yielding a reward value between 0 and 1. It is primarily used for solution reformatting in mathematically driven tasks and as a long-context full reasoner.
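The reward computation described above can be sketched with the `transformers` library. This is a minimal, illustrative example only: it assumes the model keeps the token-classification interface of its Qwen2.5-Math PRM base (two logits per token, with the positive-class probability at each "<extra_0>" position serving as the step reward), and the repo id, question, and steps below are placeholders rather than values taken from this model card.

```python
# Minimal sketch of step-level reward extraction with a process reward model.
# Assumption: the model returns two logits (negative/positive) per token and
# uses "<extra_0>" as the step separator, like its Qwen2.5-Math PRM base.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "your-org/PRM-Math-7B-Reasoner"  # placeholder, not a published repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).eval()

question = "What is 12 * 15 - 30?"
steps = ["12 * 15 = 180.", "180 - 30 = 150."]

# Insert the special token "<extra_0>" after each reasoning step so the PRM
# can score every intermediate step independently.
solution = "<extra_0>".join(steps) + "<extra_0>"
text = question + "\n" + solution
input_ids = tokenizer.encode(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(input_ids=input_ids)[0]  # (batch, seq_len, 2)

# The step reward is the probability of the positive class at each
# "<extra_0>" position, a value between 0 and 1 per step.
step_sep_id = tokenizer.encode("<extra_0>")[0]
mask = input_ids == step_sep_id            # (batch, seq_len)
probs = F.softmax(logits, dim=-1)          # (batch, seq_len, 2)
step_rewards = probs[mask][:, 1].tolist()  # one score per reasoning step
print(step_rewards)
```

Steps scoring close to 1 are judged likely correct, while low-scoring steps flag where the reasoning goes wrong, which is what makes the model usable for solution reformatting and reranking.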