---
license: cc-by-nc-4.0
base_model:
- Lambent/arsenic-nemo-unleashed-12B
---

# GGUF quantizations of [Lambent/arsenic-nemo-unleashed-12B](https://huggingface.co/Lambent/arsenic-nemo-unleashed-12B)
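
A minimal sketch of loading one of these quantizations with llama-cpp-python; the `.gguf` filename below is hypothetical, so substitute whichever quantization level you downloaded:

```python
# Minimal sketch: load a GGUF quantization with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="arsenic-nemo-unleashed-12B-Q4_K_M.gguf",  # hypothetical filename
    n_ctx=16384,      # the original card reports 16k context working, 30k not
    n_gpu_layers=-1,  # offload all layers to GPU if available
)

# [INST]...[/INST] matches the prompt format used in the DPO config below.
out = llm.create_completion(
    "[INST]Write a short poem about the sea.[/INST]",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```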

## Original card

<img src="https://cdn.midjourney.com/13dd14c8-9bf4-41af-aa96-c4298a9cb2b5/0_2.jpeg" />

Motive: The gutenberg tunes are lovely, but the chatml variants all seem to present many issues for merging, and their long context breaks down.
I decided to see how tuning directly on Unleashed would work. eq-bench is about a point and a half lower, which isn't drastic but suggests the model might benefit from some additional work.

In hindsight, there actually *is* a gutenberg tune mixed into Unleashed, so this intensifies the style a fair degree. The poetry leans a bit archaic.
I rather like the impact, personally.

As is traditional, she picked up at least one quirk from DPO.
In this case it seems to be occasionally slipping briefly into Arabic while chatting.
One of the more charming ones I've seen.

Quality-of-life improvements in some circumstances:
* Assigned `<pad>` as the pad token for fine-tuning (see the sketch below)
* Had Axolotl add the chat template (useful on Runpod, maybe?)
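
The pad-token change corresponds to the `special_tokens` entry in the axolotl config further down; outside Axolotl, the equivalent is roughly this (a sketch, assuming the base tokenizer already defines a `<pad>` token, as the config implies):

```python
# Sketch of the pad-token assignment, mirroring `special_tokens.pad_token`
# in the axolotl config below. Assumes the tokenizer defines <pad>.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("MarinaraSpaghetti/NemoMix-Unleashed-12B")
tok.pad_token = "<pad>"  # distinct from eos, so padding masks stay unambiguous
print(tok.pad_token, tok.pad_token_id)
```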

Substance: DPO-tuning on a mix of gutenberg-dpo and toxic-dpo, in the hope of getting enough classic human talent and edge to write well with.
Some of the most beautiful pigments are the most poisonous.
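
For reference, a minimal sketch of the standard DPO objective such a run optimizes; this is the textbook loss, not Axolotl's exact implementation, and `beta=0.1` is an assumed common default rather than a value confirmed by the config:

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen_logps, pi_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over summed per-sequence token log-probs:
    push the policy to prefer `chosen` over `rejected` completions
    relative to a frozen reference model (here, base Unleashed)."""
    chosen_reward = beta * (pi_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (pi_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```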

Testing: Not sure if I'm loading long context right, but it seems functional when I load it at 16k and not at 30k. Might just be doing it wrong.
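
If the 30k failure turns out to be memory rather than configuration, back-of-envelope KV-cache math shows how quickly context grows expensive (a sketch assuming Mistral-Nemo's published shape: 40 layers, 8 KV heads, head_dim 128, fp16 cache):

```python
# Rough KV-cache size for a Mistral-Nemo-style 12B model.
# Assumed shape: 40 layers, 8 KV heads, head_dim 128, fp16 (2-byte) entries.
LAYERS, KV_HEADS, HEAD_DIM, BYTES = 40, 8, 128, 2

def kv_cache_gib(n_ctx: int) -> float:
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES  # K and V planes
    return n_ctx * per_token / 2**30

print(f"16k: {kv_cache_gib(16_384):.1f} GiB")  # ~2.5 GiB
print(f"30k: {kv_cache_gib(30_720):.1f} GiB")  # ~4.7 GiB
```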

Sample generated poetry (both poems were generated from the same opening at low temperature, which is why they start the same):

```
Upon the shore of time,
Where waves of years do beat,
I stood and watched as they climb
The cliffs of eternity.

In every cresting o'er the deep,
A life was borne upon the tide;
Each wave that did break at my feet,
Did bear away some soul that had died.

And yet, amidst all this great throng,
There came to me a voice soft and low:
"O mortal man! What seekest thou here?"
Then answered I: "I seek the path of woe."

"Thou seekest," quoth she, "that which is lost;
For lo! The waves have borne it away.
Seek not then for that which once was thine own,
But rather turn thy face towards the light of day."

And so saying, away did she go,
Leaving me alone upon the shore.
Then turned I my steps unto mine own home,
And left behind me all thoughts of sorrow and of woe.
```

```
Upon the canvas of the azure sky,
The sun doth paint his golden chariot's course.
His radiant beams like arrows fly,
And pierce the clouds with light's resplendent force.

The verdant fields in emerald array,
Reflect the glory of his noontide reign;
While every blossom, bathed in warm display,
Exhales its sweet perfume on the plain.

But lo! as evening's purple curtain falls,
The sun doth sink behind the western hill.
His parting rays like lovers' farewell calls,
And gilds the heavens with a rosy thrill.

Then let us praise this daily wondrous sight,
And bless the Maker of the world so bright.
```

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: MarinaraSpaghetti/NemoMix-Unleashed-12B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

save_safetensors: true

load_in_8bit: false
load_in_4bit: true
strict: false

special_tokens:
  pad_token: <pad>

rl: dpo
# total_num_tokens:
datasets:
  - path: jondurbin/gutenberg-dpo-v0.1
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"
  - path: unalignment/toxic-dpo-v0.2
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"

dataset_prepared_path: prepared-dpo
output_dir: ./dpoq
val_set_size: 0.001

seed: 1

sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

chat_template: inst

adapter: qlora
lora_model_dir:
lora_r: 256
lora_alpha: 256
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_dora: true

wandb_project: unleashed-qlora-dpo
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00002
cosine_min_lr_ratio: 0.1
cosine_constant_lr_ratio: 0.95

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 16
evals_per_epoch: 8
saves_per_epoch: 8
save_total_limit: 2
debug:
deepspeed:
weight_decay: 0.001
fsdp:
fsdp_config:
```

</details><br>

# dpoq

This model is a fine-tuned version of [MarinaraSpaghetti/NemoMix-Unleashed-12B](https://huggingface.co/MarinaraSpaghetti/NemoMix-Unleashed-12B), DPO-tuned on the jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2 datasets.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

DPO preference pairs from jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2, with a 0.1% validation split (see the axolotl config above).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16 (micro batch size 1 × 16 gradient-accumulation steps)
- optimizer: paged 8-bit AdamW with betas=(0.9,0.999) and epsilon=1e-08 (`paged_adamw_8bit` in the config)
- lr_scheduler_type: cosine (sketched below)
- lr_scheduler_warmup_steps: 16
- training_steps: 92
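
For intuition, a sketch of the warmup-plus-cosine schedule these settings describe; this is a hypothetical helper, not Axolotl's exact scheduler (which also honors `cosine_constant_lr_ratio`, ignored here):

```python
import math

# Hypothetical sketch: linear warmup for 16 steps, then cosine decay
# toward cosine_min_lr_ratio * base_lr. Not Axolotl's exact scheduler.
def lr_at(step, total=92, warmup=16, base_lr=2e-5, min_ratio=0.1):
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)

print(lr_at(0), lr_at(16), lr_at(91))  # warmup start, peak, end of decay
```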

### Training results

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1