Update README.md
README.md
CHANGED
@@ -434,7 +434,10 @@ There are many PRM related papers one can refer to, and [A Roadmap to Reproduce
 
 The main difference between a PRM for O1-like models and a PRM for this project, is that there is no reasoning process in this project at all. The process or step is defined directly as each line of the final results (with no CoT process).
 
-This difference of PRM design choice arises because
+This difference of PRM design choice arises because:
+
+- Obtaining step-wise reward signals for **reasoning in creative writing** is inherently challenging, with frequent ambiguity and subjectivity. Annotators may struggle to determine whether a particular reasoning step in the creative process is good or bad. Unlike math problems, where correctness is well-defined, creative writing allows for **open** valid paths—each leading to a unique outcome, as "all roads lead to Rome."
+- While the final answer to a math problem is often a single number, the final output of creative writing involves much longer and more complex content.
 
 On the other hand, however, it's relatively simple to automatically construct negative outlines for an outline PRM training, hence a fast hands-on experience. Why not give it a shot?
 