Commit 834bb79 by mrzjy (verified)
1 Parent(s): 9df8bae

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -434,7 +434,10 @@ There are many PRM related papers one can refer to, and [A Roadmap to Reproduce
 
  The main difference between a PRM for O1-like models and a PRM for this project, is that there is no reasoning process in this project at all. The process or step is defined directly as each line of the final results (with no CoT process).
 
- This difference of PRM design choice arises because obtaining step-wise reward signals for **reasoning in creative writing** is inherently challenging, with frequent ambiguity and subjectivity. Annotators may struggle to determine whether a particular reasoning step in the creative process is good or bad. Unlike math problems, where correctness is well-defined, creative writing allows for **open** valid paths—each leading to a unique outcome, as "all roads lead to Rome."
+ This difference of PRM design choice arises because:
+
+ - Obtaining step-wise reward signals for **reasoning in creative writing** is inherently challenging, with frequent ambiguity and subjectivity. Annotators may struggle to determine whether a particular reasoning step in the creative process is good or bad. Unlike math problems, where correctness is well-defined, creative writing allows for **open** valid paths—each leading to a unique outcome, as "all roads lead to Rome."
+ - While the final answer to a math problem is often a single number, the final output of creative writing involves much longer and more complex content.
 
  On the other hand, however, it's relatively simple to automatically construct negative outlines for an outline PRM training, hence a fast hands-on experience. Why not give it a shot?
 
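
The changed README text says that each line of the final outline counts as one PRM step, and that negative outlines are easy to construct automatically. The sketch below illustrates one way this could look. It is not the project's actual method: the helper names `split_steps` and `make_negative`, the distractor-replacement strategy, and the first-error labeling scheme (1 for untouched steps, 0 for the corrupted step, with everything after it dropped) are all assumptions made for illustration.

```python
import random


def split_steps(outline: str) -> list[str]:
    """Treat each non-empty line of an outline as one PRM step."""
    return [line.strip() for line in outline.splitlines() if line.strip()]


def make_negative(
    steps: list[str], distractors: list[str], rng: random.Random
) -> tuple[list[str], list[int]]:
    """Build one negative sample by corrupting a single step.

    Assumed strategy (hypothetical, not taken from the README):
    pick a random position, replace that step with a distractor line,
    and truncate the outline right after the corrupted step.
    Labels are 1 for the untouched prefix and 0 for the corrupted step.
    """
    if not steps or not distractors:
        raise ValueError("need at least one step and one distractor")
    idx = rng.randrange(len(steps))
    corrupted = steps[:idx] + [rng.choice(distractors)]
    labels = [1] * idx + [0]
    return corrupted, labels


if __name__ == "__main__":
    outline = "Hero leaves home\nHero meets a mentor\nHero faces the villain"
    distractors = ["The weather report is read aloud"]
    negative, labels = make_negative(
        split_steps(outline), distractors, random.Random(0)
    )
    for step, label in zip(negative, labels):
        print(label, step)
```

Truncating at the first corrupted line keeps the labels unambiguous; labeling the lines after the corruption would require a separate decision about whether they still count as good steps.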