Update README.md
---
language:
- en
library_name: peft
pipeline_tag: text-generation
datasets:
- allenai/ai2_arc
- tasksource/Boardgame-QA
- skrishna/coin_flip
- openai/gsm8k
- hotpotqa/hotpot_qa
- ChilleD/LastLetterConcat
- allenai/quartz
- tasksource/strategy-qa
- ConditionalQA
---

This is the official model from the publication "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models" (arXiv, 2024).

> TLDR: Divergent Chain of Thought (DCoT) requires models to generate multiple CoTs before choosing an answer; adding DCoT data to instruction tuning lets models improve performance through self-correction.

Stay tuned for the release of the paper!

# Load the Model
```
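# Minimal loading sketch for a PEFT/LoRA adapter; `base_model_path` and
# `adapter_path` below are placeholders, not identifiers taken from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "<base-model-id>"   # the base checkpoint this adapter was trained on
adapter_path = "<this-repo-id>"       # this adapter repository

base_model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base_model, adapter_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)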
```

# Run the model

## Prompt Template

```
[Question] {question} [Context] {document} [Options] {answer_options} [Number of answers] {k}
```

Note that not all commands (text in brackets) are mandatory: `[Context]` and `[Options]` are optional.
- `[Context]` refers to a paragraph that contains the answer to the question (for span-extraction QA).
- `[Options]` refers to a list of candidate answers (for multiple-choice QA). The format is `A) {answer option 1} B) {answer option 2} ...`

The minimal template is

```
[Question] {question} [Number of answers] {k}
```

The inclusion of context and options depends on your task.
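
As a rough illustration (not part of the original card), the snippet below assembles prompts in this format. The question, options, and `k` used here are made-up values, and `[Answer 1] ` is appended at the end because the run example further down primes the model that way.

```
def build_prompt(question, k, options=None, context=None):
    # Build a DCoT prompt from the documented template. [Context] and [Options]
    # are included only when given; k sets the number of requested CoTs.
    parts = [f"[Question] {question}"]
    if context is not None:
        parts.append(f"[Context] {context}")
    if options is not None:
        letters = "ABCDEFGH"
        parts.append("[Options] " + " ".join(
            f"{letters[i]}) {opt}" for i, opt in enumerate(options)))
    parts.append(f"[Number of answers] {k}")
    return "\n".join(parts) + "\n[Answer 1] "

# Minimal prompt (no context, no options), requesting two chains of thought
prompt = build_prompt("What is 17 + 25?", k=2)
```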

## Response format
You should expect the model to return the following type of text:

```
[Answer 1]CoT_1
[Answer 2]CoT_2
...
[Final answer] answer
```

You should get as many answers as requested with the command `[Number of answers] {k}`.
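
A small helper along these lines (again, not part of the card) can split such an output into the individual CoTs and the final answer:

```
import re

def parse_dcot_output(text):
    # Split generated text on the documented markers [Answer k] and [Final answer].
    cots = re.findall(r"\[Answer \d+\](.*?)(?=\[Answer \d+\]|\[Final answer\]|$)",
                      text, flags=re.DOTALL)
    final = re.search(r"\[Final answer\](.*)", text, flags=re.DOTALL)
    final_answer = final.group(1).strip() if final else None
    return [c.strip() for c in cots], final_answer
```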

## Run Example

```
prompt = "[Question] Juan and LaKeisha roll a few objects down a ramp. They want to see which object rolls the farthest. What should they do so they can repeat their investigation?\n[Options] A) Put the objects in groups. B) Change the height of the ramp. C) Choose different objects to roll. D) Record the details of the investigation.\n[Number of answers] 2\n[Answer 1] "
inputs = tokenizer(prompt, return_tensors="pt")
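# The rest of this example is a sketch: the generation settings below are
# assumptions, not values taken from this card.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))

# The decoded text works through the two requested chains of thought
# (e.g. "... Step 3: Choose the best option.") and ends with: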
The best option to repeat the investigation is to record the details of the investigation. This will allow them to replicate the conditions of the original experiment and compare the results.
[Final answer] D) Record the details of the investigation.</s>
```

# Training details
We train all models using LoRA with the PEFT library. The main parameters are:

| Param. name | Value |
|---------------------|:-------------------:|
| lora\_r | 64 |
| lora\_alpha | 16 |
| lora\_dropout | 0.1 |
| batch size | 4 |
| learning\_rate | 2e-4 |
| weight\_decay | 0.001 |
| optim | paged\_adamw\_32bit |
| lr\_scheduler\_type | constant |

Please check Appendix B of the paper for more details.
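
For orientation, these settings map onto a PEFT/Transformers setup roughly as follows. This is an illustrative sketch rather than the released training script; in particular, `target_modules` and `output_dir` are assumptions not stated in the table above.

```
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration matching the table above
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # assumption: not specified in the table
)

# Trainer hyperparameters matching the table above
training_args = TrainingArguments(
    output_dir="dcot-lora",  # placeholder
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
)
```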