Update README.md
README.md
CHANGED
@@ -29,6 +29,30 @@ I then performed the following steps 4 times:
 
 This should be more efficient than either STaR or SPIN, as it uses a ranking loss rather than rejection sampling (unlike STaR), and verifies correctness instead of assuming all model responses are incorrect (unlike SPIN).
 
+
+## Prompt formats
+
+The format for reddit-instruct and oasst2 was:
+
+```
+### User:
+[insert instruction here]
+### Assistant:
+[insert response here]
+### User:
+...
+```
+
+The format for TinyCot was:
+```
+### User:
+[insert instruction here]
+### Rationale:
+[insert reasoning here]
+### Answer:
+[insert direct answer here]
+```
+
 ## Hyperparameters
 
 For the initial supervised finetuning step:
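
Note on the hunk above: the two prompt formats it adds can be sketched as small Python helpers. This is only an illustration, not code from the repository. The function names `format_chat` and `format_cot` are hypothetical, and the sketch assumes each header and its content are separated by a single newline with no blank lines between turns, as the README templates appear to show.

```python
from typing import List, Tuple


def format_chat(turns: List[Tuple[str, str]]) -> str:
    """Render (instruction, response) pairs in the reddit-instruct / oasst2 layout.

    Hypothetical helper: assumes single-newline separators throughout.
    """
    parts = []
    for instruction, response in turns:
        parts.append(f"### User:\n{instruction}\n### Assistant:\n{response}")
    return "\n".join(parts)


def format_cot(instruction: str, rationale: str, answer: str) -> str:
    """Render one example in the TinyCot layout (rationale, then direct answer).

    Hypothetical helper: assumes single-newline separators throughout.
    """
    return (
        f"### User:\n{instruction}\n"
        f"### Rationale:\n{rationale}\n"
        f"### Answer:\n{answer}"
    )


if __name__ == "__main__":
    # A two-turn conversation in the chat layout.
    print(format_chat([("Hi", "Hello!"), ("How are you?", "Fine.")]))
    print()
    # A single chain-of-thought example.
    print(format_cot("What is 2 + 3?", "2 plus 3 equals 5.", "5"))
```

At inference time one would presumably end the prompt after the `### Assistant:` or `### Rationale:` header and let the model complete it. The exact whitespace used in training is not stated in the README, so it is worth checking against the training data.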