---
license: cc-by-nc-4.0
tags:
- not-for-all-audiences
base_model:
- Lambent/arsenic-nemo-unleashed-12B
---

# GGUF quantizations of [Lambent/arsenic-nemo-unleashed-12B](https://huggingface.co/Lambent/arsenic-nemo-unleashed-12B)
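
These files are quantized for llama.cpp and compatible runtimes. Below is a minimal loading sketch using llama-cpp-python; the GGUF filename is a placeholder (substitute whichever quant file you download from this repo), and the prompt uses the `[INST]` format from the training config further down.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The GGUF filename below is a placeholder; use whichever quant you download.
from llama_cpp import Llama

llm = Llama(
    model_path="arsenic-nemo-unleashed-12B.Q4_K_M.gguf",  # placeholder filename
    n_ctx=16384,  # the original card reports 16k context working in testing, 30k not
)

output = llm(
    "[INST]Write a short poem about the sea.[/INST]",
    max_tokens=256,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```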

## Original card

<img src="https://cdn.midjourney.com/13dd14c8-9bf4-41af-aa96-c4298a9cb2b5/0_2.jpeg" />

Motive: The gutenberg tunes are lovely, but the ChatML variants all seem to present issues for merging, and their context breaks down later on.
I decided to see how tuning directly on Unleashed would work. eq-bench is about a point and a half lower, which isn't drastic but suggests it might benefit from some additional work.

In hindsight, there actually *is* a gutenberg tune mixed into Unleashed, so this intensifies the style to a fair degree. The poetry leans a bit archaic.
I rather like the impact personally.

As is traditional, she got at least one quirk from DPO.
In this case it seems to be sometimes briefly slipping into Arabic while chatting.
One of the more charming ones I've seen.

Quality-of-life improvements in some circumstances:
* Assigned `<pad>` as the pad token for fine-tuning (see the sketch below)
* Had Axolotl add the chat template (useful on RunPod, maybe?)
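
A rough transformers-level illustration of the first item; Axolotl handles this via the `special_tokens` block in the config below, so this is just a sketch of the effect, not its actual code path:

```python
# Illustrative sketch of the pad-token assignment, not Axolotl's internals.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MarinaraSpaghetti/NemoMix-Unleashed-12B")
tokenizer.pad_token = "<pad>"  # mirrors `special_tokens: pad_token: <pad>` in the config
print(tokenizer.pad_token, tokenizer.pad_token_id)
```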

Substance: DPO-tuning on a mix of gutenberg-dpo and toxic-dpo, in the hope of getting enough classic human talent and edge to write well with.
Some of the most beautiful pigments are the most poisonous.

Testing: Not sure if I'm loading long context right, but the model seems functional when loaded at 16k context, and not at 30k. Might just be doing it wrong.

Sample generated poetry (both samples were generated from the same start at low temperature, which is why they begin the same):

```
Upon the shore of time,
Where waves of years do beat,
I stood and watched as they climb
The cliffs of eternity.

In every cresting o'er the deep,
A life was borne upon the tide;
Each wave that did break at my feet,
Did bear away some soul that had died.

And yet, amidst all this great throng,
There came to me a voice soft and low:
"O mortal man! What seekest thou here?"
Then answered I: "I seek the path of woe."

"Thou seekest," quoth she, "that which is lost;
For lo! The waves have borne it away.
Seek not then for that which once was thine own,
But rather turn thy face towards the light of day."

And so saying, away did she go,
Leaving me alone upon the shore.
Then turned I my steps unto mine own home,
And left behind me all thoughts of sorrow and of woe.
```

```
Upon the canvas of the azure sky,
The sun doth paint his golden chariot's course.
His radiant beams like arrows fly,
And pierce the clouds with light's resplendent force.

The verdant fields in emerald array,
Reflect the glory of his noontide reign;
While every blossom, bathed in warm display,
Exhales its sweet perfume on the plain.

But lo! as evening's purple curtain falls,
The sun doth sink behind the western hill.
His parting rays like lovers' farewell calls,
And gilds the heavens with a rosy thrill.

Then let us praise this daily wondrous sight,
And bless the Maker of the world so bright.
```

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: MarinaraSpaghetti/NemoMix-Unleashed-12B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

save_safetensors: true

load_in_8bit: false
load_in_4bit: true
strict: false

special_tokens:
  pad_token: <pad>

rl: dpo
# total_num_tokens: 
datasets:
  - path: jondurbin/gutenberg-dpo-v0.1
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"
  - path: unalignment/toxic-dpo-v0.2
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"

dataset_prepared_path: prepared-dpo
output_dir: ./dpoq
val_set_size: 0.001

seed: 1

sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

chat_template: inst

adapter: qlora
lora_model_dir:
lora_r: 256
lora_alpha: 256
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_dora: true

wandb_project: unleashed-qlora-dpo
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00002
cosine_min_lr_ratio: 0.1
cosine_constant_lr_ratio: 0.95

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 16
evals_per_epoch: 8
saves_per_epoch: 8
save_total_limit: 2
debug:
deepspeed:
weight_decay: 0.001
fsdp:
fsdp_config:

```

</details><br>
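
For readers unfamiliar with Axolotl's DPO dataset `type:` block, the format fields above map each record onto strings roughly like this (a sketch with invented field values, not Axolotl's actual implementation):

```python
# Sketch of how the prompt/chosen/rejected format strings in the config above
# apply to one DPO record; the record contents here are invented for illustration.
record = {
    "prompt": "Write the opening paragraph of a gothic novel.",
    "chosen": "The rain had not stopped for three days when the carriage arrived...",
    "rejected": "Sure! Here is an opening paragraph for a gothic novel: ...",
}

prompt = "[INST]{prompt}[/INST]".format(prompt=record["prompt"])
chosen = record["chosen"]      # chosen_format is just "{chosen}"
rejected = record["rejected"]  # rejected_format is just "{rejected}"

# DPO then trains the model to prefer (prompt + chosen) over (prompt + rejected).
print(prompt + chosen)
```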

# dpoq

This model is a DPO fine-tuned version of [MarinaraSpaghetti/NemoMix-Unleashed-12B](https://huggingface.co/MarinaraSpaghetti/NemoMix-Unleashed-12B) on the [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) and [unalignment/toxic-dpo-v0.2](https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2) datasets.

## Model description

A QLoRA (DoRA) DPO tune of NemoMix-Unleashed-12B aimed at literary writing; see the notes in the original card above.

## Intended uses & limitations

Intended for creative writing. The model is tagged not-for-all-audiences and licensed cc-by-nc-4.0; the author notes occasional brief slips into Arabic while chatting, and uncertain behavior past 16k context.

## Training and evaluation data

DPO pairs from jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2, with 0.1% held out for evaluation (`val_set_size: 0.001`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Paged 8-bit AdamW (`paged_adamw_8bit`) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 16
- training_steps: 92
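
(The total train batch size follows from micro_batch_size × gradient_accumulation_steps = 1 × 16 = 16.)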

### Training results



### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1