euclaise/Memphis-CoT-3B


euclaise committed on Jan 30

Commit 38dc0f6 • 1 Parent(s): 12e5655

Update README.md

Files changed (1)

README.md +24 -0

@@ -29,6 +29,30 @@ I then performed the following steps 4 times:
 This should be more efficient than either STaR or SPIN, as it uses a ranking loss rather than rejection sampling (unlike STaR), and verifies correctness instead of assuming all model responses are incorrect (unlike SPIN).
+## Prompt formats
+The format for reddit-instruct and oasst2 was:
+```
+### User:
+[insert instruction here]
+### Assistant:
+[insert response here]
+### User:
+...
+```
+The format for TinyCoT was:
+```
+### User:
+[insert instruction here]
+### Rationale:
+[insert reasoning here]
+### Answer:
+[insert direct answer here]
+```
 ## Hyperparameters
 For the initial supervised finetuning step:
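
The two prompt formats added in this commit can be assembled programmatically. The following is a minimal sketch, not the author's training code: the function names `format_chat` and `format_cot` are hypothetical, and the exact whitespace (a single newline after each header and after each body) is an assumption, since the README only shows the overall layout.

```python
def format_chat(turns):
    """Render (user, assistant) pairs in the reddit-instruct / oasst2 layout.

    Pass None as the assistant message of the last pair to leave the
    prompt open for generation. Newline placement is an assumption.
    """
    parts = []
    for user_msg, assistant_msg in turns:
        parts.append(f"### User:\n{user_msg}\n### Assistant:\n")
        if assistant_msg is not None:
            parts.append(f"{assistant_msg}\n")
    return "".join(parts)


def format_cot(instruction, rationale=None, answer=None):
    """Render one example in the User / Rationale / Answer layout.

    With rationale=None the string ends at the Rationale header, so a
    model can continue with its reasoning and then its answer.
    """
    prompt = f"### User:\n{instruction}\n### Rationale:\n"
    if rationale is not None:
        prompt += f"{rationale}\n### Answer:\n"
        if answer is not None:
            prompt += f"{answer}\n"
    return prompt


if __name__ == "__main__":
    # Open-ended multi-turn chat prompt.
    print(format_chat([("Hi", "Hello"), ("How are you?", None)]))
    # Chain-of-thought prompt awaiting a rationale.
    print(format_cot("What is 2 + 2?"))
```

Leaving the final header open (`### Assistant:` or `### Rationale:`) is a common way to prompt a model for a continuation; whether the model expects a trailing newline there is not stated in the README.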