mrzjy/NovelWriting-Outline-PRM-Qwen2.5-0.5B-Reward

Token Classification

text-generation-inference

Inference Endpoints


mrzjy committed 16 days ago

Commit 834bb79 · verified · 1 Parent(s): 9df8bae

Update README.md

Files changed (1)

README.md +4 -1

README.md CHANGED

@@ -434,7 +434,10 @@ There are many PRM related papers one can refer to, and [A Roadmap to Reproduce
 The main difference between a PRM for O1-like models and a PRM for this project, is that there is no reasoning process in this project at all. The process or step is defined directly as each line of the final results (with no CoT process).
-This difference of PRM design choice arises because obtaining step-wise reward signals for **reasoning in creative writing** is inherently challenging, with frequent ambiguity and subjectivity. Annotators may struggle to determine whether a particular reasoning step in the creative process is good or bad. Unlike math problems, where correctness is well-defined, creative writing allows for **open** valid paths—each leading to a unique outcome, as "all roads lead to Rome."
+This difference of PRM design choice arises because:
+- Obtaining step-wise reward signals for **reasoning in creative writing** is inherently challenging, with frequent ambiguity and subjectivity. Annotators may struggle to determine whether a particular reasoning step in the creative process is good or bad. Unlike math problems, where correctness is well-defined, creative writing allows for **open** valid paths—each leading to a unique outcome, as "all roads lead to Rome."
+- While the final answer to a math problem is often a single number, the final output of creative writing involves much longer and more complex content.
 On the other hand, however, it's relatively simple to automatically construct negative outlines for an outline PRM training, hence a fast hands-on experience. Why not give it a shot?
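The README's step definition (each line of the final outline is one step, with no CoT) and its note that negative outlines are easy to build automatically can be illustrated with a small plain-Python sketch. This is not the project's training code. The corruption strategy (swapping one line for a line from an unrelated outline), the whitespace tokenization, and placing each step's label on a trailing newline token are all hypothetical choices made for illustration; `-100` is used because it is PyTorch's default `ignore_index` for cross-entropy loss.

```python
import random
from typing import List, Tuple

IGNORE = -100  # PyTorch CrossEntropyLoss default ignore_index


def split_steps(outline: str) -> List[str]:
    """Treat each non-empty line of the final outline as one PRM step (no CoT)."""
    return [line.strip() for line in outline.splitlines() if line.strip()]


def make_negative(outline: str, donor: str, rng: random.Random) -> Tuple[List[str], List[int]]:
    """Hypothetical corruption: replace one step of `outline` with a step drawn
    from an unrelated `donor` outline. Label 1 = original step, 0 = injected step."""
    steps = split_steps(outline)
    donor_steps = split_steps(donor)
    idx = rng.randrange(len(steps))
    steps[idx] = rng.choice(donor_steps)
    labels = [1] * len(steps)
    labels[idx] = 0
    return steps, labels


def to_token_labels(steps: List[str], labels: List[int]) -> Tuple[List[str], List[int]]:
    """Hypothetical token-classification layout: whitespace tokens plus a newline
    token per step; only the newline token carries the step label."""
    tokens, token_labels = [], []
    for step, label in zip(steps, labels):
        words = step.split() + ["\n"]
        tokens.extend(words)
        token_labels.extend([IGNORE] * (len(words) - 1) + [label])
    return tokens, token_labels


# Usage: build one negative example from a toy outline and an unrelated donor.
rng = random.Random(0)
good = "Chapter 1: The hero leaves home.\nChapter 2: She meets a mentor.\nChapter 3: The first trial."
other = "Chapter 1: A detective finds a body.\nChapter 2: The suspect flees."
steps, labels = make_negative(good, other, rng)
tokens, token_labels = to_token_labels(steps, labels)
```

A positive example is simply `split_steps(good)` with every label set to 1. Marking only the injected line as 0 (rather than every line after it) is one possible design; other labeling schemes are equally plausible.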