haritzpuerto committed · Commit e4df7b3 · verified · 1 Parent(s): 4b74509

Update README.md

Files changed (1): README.md (+68, -1)
README.md CHANGED
@@ -4,8 +4,25 @@ language:
  - en
  library_name: peft
  pipeline_tag: text-generation
+ datasets:
+ - allenai/ai2_arc
+ - tasksource/Boardgame-QA
+ - skrishna/coin_flip
+ - openai/gsm8k
+ - hotpotqa/hotpot_qa
+ - ChilleD/LastLetterConcat
+ - allenai/quartz
+ - tasksource/strategy-qa
+ - ConditionalQA
  ---

+ This is the official model from the publication "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models" (arXiv, 2024).
+
+ > TLDR: Divergent Chain of Thought (DCoT) requires models to generate multiple CoTs before choosing an answer, and adding DCoT data to instruction tuning allows models to improve performance through self-correction.
+
+ Stay tuned for the release of the paper!
+

  # Load the Model
  ```
@@ -27,6 +44,39 @@ tokenizer = AutoTokenizer.from_pretrained(base_model_path)
  ```
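
The body of the "Load the Model" block is collapsed in this diff (only its opening and closing fences appear as context), so here is a minimal loading sketch, purely as an illustration; the repository ids, dtype, and device settings are placeholders and assumptions, not values taken from this model card.

```
# Editor's sketch (assumptions marked): attach this LoRA adapter to its base model with PEFT.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_path = "your/base-model"          # placeholder: the base LLM this adapter was trained on
adapter_path = "haritzpuerto/your-adapter"   # placeholder: this repository's id

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)  # wrap the base model with the LoRA weights
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
```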
 
  # Run the model
+
+ ## Prompt Template
+
+ ```
+ [Question] {question} [Context] {document} [Options] {answer_options} [Number of answers] {k}
+ ```
+
+ Note that not all commands (text in brackets) are mandatory: `[Context]` and `[Options]` are optional.
+ - `[Context]` refers to a paragraph that contains the answer to the question (for span-extraction QA).
+ - `[Options]` refers to a list of candidate answers (for multiple-choice QA). The format is `A) {answer option 1} B) {answer option 2} ...`
+
+ The minimal template is
+
+ ```
+ [Question] {question} [Number of answers] {k}
+ ```
+
+ The inclusion of context and options depends on your task.
+
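As a small illustration of the template above (not part of the README itself), the snippet below assembles a multiple-choice DCoT prompt; the question, options, and `k` are made-up example values, and the newline separators follow the run example shown further down.

```
# Editor's sketch: build a DCoT prompt following the template above.
question = "Which tool is best for measuring the mass of an apple?"   # example value
options = "A) a ruler B) a balance C) a thermometer D) a stopwatch"   # example value
k = 2  # number of divergent chains of thought to request

prompt = (
    f"[Question] {question}\n"
    f"[Options] {options}\n"
    f"[Number of answers] {k}\n"
    f"[Answer 1] "   # priming tag, as in the run example below
)
```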
+ ## Response Format
+ You should expect the model to return text of the following form:
+
+ ```
+ [Answer 1] CoT_1
+ [Answer 2] CoT_2
+ ...
+ [Final answer] answer
+ ```
+
+ You should get as many answers as requested with the command `[Number of answers] {k}`.
+
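A small parsing sketch (an editor's illustration, not code from the README) for pulling the individual CoTs and the final answer out of a decoded generation that follows this format:

```
import re

def parse_dcot_output(text: str):
    """Split a DCoT completion into its chains of thought and final answer."""
    # The final answer follows the [Final answer] tag.
    final = re.search(r"\[Final answer\]\s*(.*)", text, flags=re.S)
    final_answer = final.group(1).strip() if final else None
    # Each CoT sits between an [Answer i] tag and the next tag (or the end of the text).
    cots = re.findall(
        r"\[Answer \d+\]\s*(.*?)(?=\[Answer \d+\]|\[Final answer\]|\Z)", text, flags=re.S
    )
    return [c.strip() for c in cots], final_answer
```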
+ ## Run Example
+
  ```
  prompt = "[Question] Juan and LaKeisha roll a few objects down a ramp. They want to see which object rolls the farthest. What should they do so they can repeat their investigation?\n[Options] A) Put the objects in groups. B) Change the height of the ramp. C) Choose different objects to roll. D) Record the details of the investigation.\n[Number of answers] 2\n[Answer 1] "
  inputs = tokenizer(prompt, return_tensors="pt")
@@ -67,4 +117,21 @@ Step 3: Choose the best option.
  The best option to repeat the investigation is to record the details of the investigation. This will allow them to replicate the conditions of the original experiment and compare the results.

  [Final answer] D) Record the details of the investigation.</s>
- ```
+ ```
+
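The generation and decoding calls that sit between the prompt above and the printed completion are not shown in this diff; a minimal sketch (editor's assumptions, with placeholder decoding settings rather than the paper's) would be:

```
# Editor's sketch: generate and decode a DCoT completion for the tokenized prompt.
output_ids = model.generate(
    **inputs.to(model.device),
    max_new_tokens=512,   # placeholder budget; long enough for several CoTs
    do_sample=False,      # greedy decoding as a simple default
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```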
+ # Training details
+ We train all models using LoRA with the PEFT library. The main parameters are:
+
+ | Param. name         | Value               |
+ |---------------------|:-------------------:|
+ | lora\_r             | 64                  |
+ | lora\_alpha         | 16                  |
+ | lora\_dropout       | 0.1                 |
+ | batch size          | 4                   |
+ | learning\_rate      | 2e-4                |
+ | weight\_decay       | 0.001               |
+ | optim               | paged\_adamw\_32bit |
+ | lr\_scheduler\_type | constant            |
+
+ Please check Appendix B of the paper for more details.
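
For reference, a minimal sketch of a PEFT/transformers configuration matching the table above; `target_modules`, epochs, and other arguments not listed here are omitted or left at defaults (they are given in Appendix B of the paper, not in this table).

```
from peft import LoraConfig
from transformers import TrainingArguments

# Editor's sketch: hyperparameters copied from the table above; everything else is left at defaults.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="dcot-lora",              # placeholder output directory
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
)
```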