Triangle104 commited on
Commit
77431be
1 Parent(s): 727c1ff

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +392 -0
README.md CHANGED
@@ -26,6 +26,398 @@ model-index:
26
  This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
27
  Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) for more details on the model.
28
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
29
  ## Use with llama.cpp
30
  Install llama.cpp through brew (works on Mac and Linux)
31
 
 
26
  This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
27
  Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) for more details on the model.
28
 
29
+ ---
30
+ Model details:
31
+ -
32
+ A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-14B on mixture of synthetic and natural data.
33
+ It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.
34
+
35
+ Version notes for 0.2: Now using the refined dataset from 32B 0.2. Major improvements in coherence, instruction following and long-context comprehension over 14B v0.1.
36
+
37
+ Prompt format is ChatML.
38
+
39
+ Recommended sampler values:
40
+
41
+ Temperature: 0.8
42
+ Min-P: 0.05
43
+ Top-A: 0.3
44
+ Repetition Penalty: 1.03
45
+
46
+ Recommended SillyTavern presets (via CalamitousFelicitousness):
47
+
48
+ Context
49
+ Instruct and System Prompt
50
+
51
+
52
+ Training data:
53
+
54
+ Celeste 70B 0.1 data mixture minus Opus Instruct subset. See that model's card for details.
55
+ Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
56
+ A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
57
+ A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
58
+ Synthstruct and SynthRP datasets by Epiculous
59
+ A subset from Dolphin-2.9.3, including filtered version of not_samantha and a small subset of systemchat.
60
+
61
+ Training time and hardware:
62
+
63
+ 3 hours on 8xH100 SXM, provided by FeatherlessAI
64
+
65
+
66
+ Model was created by Kearm, Auri and Cahvay.
67
+ Special thanks:
68
+
69
+ to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning.
70
+ to FeatherlessAI for generously providing 8xH100 SXM node for training of this model
71
+ to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CogninitiveComputations for the data
72
+ and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.
73
+
74
+ Built with Axolotl
75
+ See axolotl config
76
+
77
+ axolotl version: 0.4.1
78
+
79
+ base_model: Qwen/Qwen2.5-14B
80
+
81
+ load_in_8bit: false
82
+ load_in_4bit: false
83
+ strict: false
84
+
85
+ plugins:
86
+ - axolotl.integrations.liger.LigerPlugin
87
+ liger_rope: true
88
+ liger_rms_norm: true
89
+ liger_swiglu: true
90
+ liger_fused_linear_cross_entropy: true
91
+
92
+ # plugins:
93
+ # - axolotl.integrations.spectrum.SpectrumPlugin
94
+
95
+ # spectrum_top_fraction: 0.5
96
+ # # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
97
+ # spectrum_model_name: Qwen/Qwen2.5-32B
98
+
99
+ datasets:
100
+ - path: datasets/Celeste_Filtered_utf8fix.jsonl
101
+ type: sharegpt
102
+ - path: datasets/deduped_not_samantha_norefusals.jsonl
103
+ type: sharegpt
104
+ - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
105
+ type: sharegpt
106
+ - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
107
+ type: sharegpt
108
+ - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
109
+ type: sharegpt
110
+ - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
111
+ type: sharegpt
112
+ - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
113
+ type: sharegpt
114
+ - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
115
+ type: sharegpt
116
+
117
+ chat_template: chatml
118
+ shuffle_merged_datasets: true
119
+ val_set_size: 0.001
120
+ output_dir: ./EVA-Qwen2.5-14B-SFFT-v0.2
121
+
122
+ sequence_len: 10240
123
+ sample_packing: true
124
+ eval_sample_packing: false
125
+ pad_to_sequence_len: true
126
+
127
+ # adapter: qlora
128
+ # lora_model_dir:
129
+ # lora_r: 64
130
+ # lora_alpha: 128
131
+ # lora_dropout: 0.05
132
+ # lora_target_linear: true
133
+ # peft_use_dora: true
134
+
135
+ base_model: Qwen/Qwen2.5-14B
136
+
137
+ load_in_8bit: false
138
+ load_in_4bit: false
139
+ strict: false
140
+
141
+ plugins:
142
+ - axolotl.integrations.liger.LigerPlugin
143
+ liger_rope: true
144
+ liger_rms_norm: true
145
+ liger_swiglu: true
146
+ liger_fused_linear_cross_entropy: true
147
+
148
+ datasets:
149
+ - path: datasets/Celeste_Filtered_utf8fix.jsonl
150
+ type: sharegpt
151
+ - path: datasets/deduped_not_samantha_norefusals.jsonl
152
+ type: sharegpt
153
+ - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
154
+ type: sharegpt
155
+ - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
156
+ type: sharegpt
157
+ - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
158
+ type: sharegpt
159
+ - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
160
+ type: sharegpt
161
+ - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
162
+ type: sharegpt
163
+ - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
164
+ type: sharegpt
165
+
166
+ chat_template: chatml
167
+ shuffle_merged_datasets: true
168
+ val_set_size: 0.005
169
+ output_dir: ./EVA-Qwen2.5-14B-SFFT-v0.2
170
+
171
+ sequence_len: 10240
172
+ sample_packing: true
173
+ eval_sample_packing: false
174
+ pad_to_sequence_len: true
175
+
176
+ # adapter: qlora
177
+ # lora_model_dir:
178
+ # lora_r: 32
179
+ # lora_alpha: 16
180
+ # lora_dropout: 0.05
181
+ # lora_target_linear: true
182
+ # peft_use_dora: true
183
+
184
+ unfrozen_parameters:
185
+ - ^lm_head.weight$
186
+ - ^model.embed_tokens.weight$
187
+ # mlp.down_proj layers
188
+ - model.layers.1.mlp.down_proj
189
+ - model.layers.35.mlp.down_proj
190
+ - model.layers.38.mlp.down_proj
191
+ - model.layers.37.mlp.down_proj
192
+ - model.layers.36.mlp.down_proj
193
+ - model.layers.15.mlp.down_proj
194
+ - model.layers.11.mlp.down_proj
195
+ - model.layers.12.mlp.down_proj
196
+ - model.layers.34.mlp.down_proj
197
+ - model.layers.44.mlp.down_proj
198
+ - model.layers.45.mlp.down_proj
199
+ - model.layers.9.mlp.down_proj
200
+ - model.layers.41.mlp.down_proj
201
+ - model.layers.33.mlp.down_proj
202
+ - model.layers.43.mlp.down_proj
203
+ - model.layers.40.mlp.down_proj
204
+ - model.layers.13.mlp.down_proj
205
+ - model.layers.8.mlp.down_proj
206
+ - model.layers.39.mlp.down_proj
207
+ - model.layers.10.mlp.down_proj
208
+ - model.layers.14.mlp.down_proj
209
+ - model.layers.16.mlp.down_proj
210
+ - model.layers.31.mlp.down_proj
211
+ - model.layers.32.mlp.down_proj
212
+ # mlp.gate_proj layers
213
+ - model.layers.1.mlp.gate_proj
214
+ - model.layers.44.mlp.gate_proj
215
+ - model.layers.46.mlp.gate_proj
216
+ - model.layers.45.mlp.gate_proj
217
+ - model.layers.43.mlp.gate_proj
218
+ - model.layers.47.mlp.gate_proj
219
+ - model.layers.42.mlp.gate_proj
220
+ - model.layers.32.mlp.gate_proj
221
+ - model.layers.27.mlp.gate_proj
222
+ - model.layers.33.mlp.gate_proj
223
+ - model.layers.28.mlp.gate_proj
224
+ - model.layers.39.mlp.gate_proj
225
+ - model.layers.41.mlp.gate_proj
226
+ - model.layers.40.mlp.gate_proj
227
+ - model.layers.30.mlp.gate_proj
228
+ - model.layers.29.mlp.gate_proj
229
+ - model.layers.31.mlp.gate_proj
230
+ - model.layers.37.mlp.gate_proj
231
+ - model.layers.26.mlp.gate_proj
232
+ - model.layers.10.mlp.gate_proj
233
+ - model.layers.38.mlp.gate_proj
234
+ - model.layers.36.mlp.gate_proj
235
+ - model.layers.12.mlp.gate_proj
236
+ - model.layers.13.mlp.gate_proj
237
+ # mlp.up_proj layers
238
+ - model.layers.1.mlp.up_proj
239
+ - model.layers.13.mlp.up_proj
240
+ - model.layers.11.mlp.up_proj
241
+ - model.layers.14.mlp.up_proj
242
+ - model.layers.15.mlp.up_proj
243
+ - model.layers.12.mlp.up_proj
244
+ - model.layers.8.mlp.up_proj
245
+ - model.layers.16.mlp.up_proj
246
+ - model.layers.9.mlp.up_proj
247
+ - model.layers.19.mlp.up_proj
248
+ - model.layers.10.mlp.up_proj
249
+ - model.layers.7.mlp.up_proj
250
+ - model.layers.17.mlp.up_proj
251
+ - model.layers.20.mlp.up_proj
252
+ - model.layers.21.mlp.up_proj
253
+ - model.layers.18.mlp.up_proj
254
+ - model.layers.37.mlp.up_proj
255
+ - model.layers.38.mlp.up_proj
256
+ - model.layers.39.mlp.up_proj
257
+ - model.layers.42.mlp.up_proj
258
+ - model.layers.41.mlp.up_proj
259
+ - model.layers.27.mlp.up_proj
260
+ - model.layers.28.mlp.up_proj
261
+ - model.layers.36.mlp.up_proj
262
+ # self_attn.k_proj layers
263
+ - model.layers.47.self_attn.k_proj
264
+ - model.layers.39.self_attn.k_proj
265
+ - model.layers.41.self_attn.k_proj
266
+ - model.layers.37.self_attn.k_proj
267
+ - model.layers.35.self_attn.k_proj
268
+ - model.layers.44.self_attn.k_proj
269
+ - model.layers.38.self_attn.k_proj
270
+ - model.layers.14.self_attn.k_proj
271
+ - model.layers.7.self_attn.k_proj
272
+ - model.layers.12.self_attn.k_proj
273
+ - model.layers.11.self_attn.k_proj
274
+ - model.layers.32.self_attn.k_proj
275
+ - model.layers.10.self_attn.k_proj
276
+ - model.layers.8.self_attn.k_proj
277
+ - model.layers.6.self_attn.k_proj
278
+ - model.layers.9.self_attn.k_proj
279
+ - model.layers.45.self_attn.k_proj
280
+ - model.layers.42.self_attn.k_proj
281
+ - model.layers.40.self_attn.k_proj
282
+ - model.layers.5.self_attn.k_proj
283
+ - model.layers.0.self_attn.k_proj
284
+ - model.layers.33.self_attn.k_proj
285
+ - model.layers.34.self_attn.k_proj
286
+ - model.layers.13.self_attn.k_proj
287
+ # self_attn.o_proj layers
288
+ - model.layers.12.self_attn.o_proj
289
+ - model.layers.5.self_attn.o_proj
290
+ - model.layers.14.self_attn.o_proj
291
+ - model.layers.16.self_attn.o_proj
292
+ - model.layers.20.self_attn.o_proj
293
+ - model.layers.13.self_attn.o_proj
294
+ - model.layers.11.self_attn.o_proj
295
+ - model.layers.4.self_attn.o_proj
296
+ - model.layers.6.self_attn.o_proj
297
+ - model.layers.19.self_attn.o_proj
298
+ - model.layers.7.self_attn.o_proj
299
+ - model.layers.18.self_attn.o_proj
300
+ - model.layers.8.self_attn.o_proj
301
+ - model.layers.38.self_attn.o_proj
302
+ - model.layers.15.self_attn.o_proj
303
+ - model.layers.17.self_attn.o_proj
304
+ - model.layers.9.self_attn.o_proj
305
+ - model.layers.10.self_attn.o_proj
306
+ - model.layers.21.self_attn.o_proj
307
+ - model.layers.28.self_attn.o_proj
308
+ - model.layers.32.self_attn.o_proj
309
+ - model.layers.35.self_attn.o_proj
310
+ - model.layers.39.self_attn.o_proj
311
+ - model.layers.3.self_attn.o_proj
312
+ # self_attn.q_proj layers
313
+ - model.layers.1.self_attn.q_proj
314
+ - model.layers.2.self_attn.q_proj
315
+ - model.layers.3.self_attn.q_proj
316
+ - model.layers.44.self_attn.q_proj
317
+ - model.layers.29.self_attn.q_proj
318
+ - model.layers.45.self_attn.q_proj
319
+ - model.layers.43.self_attn.q_proj
320
+ - model.layers.32.self_attn.q_proj
321
+ - model.layers.38.self_attn.q_proj
322
+ - model.layers.19.self_attn.q_proj
323
+ - model.layers.42.self_attn.q_proj
324
+ - model.layers.34.self_attn.q_proj
325
+ - model.layers.36.self_attn.q_proj
326
+ - model.layers.40.self_attn.q_proj
327
+ - model.layers.26.self_attn.q_proj
328
+ - model.layers.20.self_attn.q_proj
329
+ - model.layers.28.self_attn.q_proj
330
+ - model.layers.39.self_attn.q_proj
331
+ - model.layers.41.self_attn.q_proj
332
+ - model.layers.33.self_attn.q_proj
333
+ - model.layers.35.self_attn.q_proj
334
+ - model.layers.25.self_attn.q_proj
335
+ - model.layers.30.self_attn.q_proj
336
+ - model.layers.27.self_attn.q_proj
337
+ # self_attn.v_proj layers
338
+ - model.layers.0.self_attn.v_proj
339
+ - model.layers.7.self_attn.v_proj
340
+ - model.layers.39.self_attn.v_proj
341
+ - model.layers.31.self_attn.v_proj
342
+ - model.layers.15.self_attn.v_proj
343
+ - model.layers.10.self_attn.v_proj
344
+ - model.layers.41.self_attn.v_proj
345
+ - model.layers.32.self_attn.v_proj
346
+ - model.layers.6.self_attn.v_proj
347
+ - model.layers.33.self_attn.v_proj
348
+ - model.layers.42.self_attn.v_proj
349
+ - model.layers.29.self_attn.v_proj
350
+ - model.layers.9.self_attn.v_proj
351
+ - model.layers.14.self_attn.v_proj
352
+ - model.layers.35.self_attn.v_proj
353
+ - model.layers.38.self_attn.v_proj
354
+ - model.layers.13.self_attn.v_proj
355
+ - model.layers.30.self_attn.v_proj
356
+ - model.layers.34.self_attn.v_proj
357
+ - model.layers.5.self_attn.v_proj
358
+ - model.layers.28.self_attn.v_proj
359
+ - model.layers.37.self_attn.v_proj
360
+ - model.layers.27.self_attn.v_proj
361
+ - model.layers.11.self_attn.v_proj
362
+
363
+ wandb_project: EVA-Qwen2.5-14B-SFFT-v0.2
364
+ wandb_entity:
365
+ wandb_watch:
366
+ wandb_name: Unit-02
367
+ wandb_log_model:
368
+
369
+ gradient_accumulation_steps: 8
370
+ micro_batch_size: 2
371
+ num_epochs: 3
372
+ optimizer: paged_ademamix_8bit
373
+ lr_scheduler: cosine
374
+ learning_rate: 0.00005
375
+ max_grad_norm: 3
376
+
377
+ train_on_inputs: false
378
+ group_by_length: false
379
+ bf16: auto
380
+ fp16:
381
+ tf32: false
382
+
383
+ gradient_checkpointing: "unsloth"
384
+ # gradient_checkpointing_kwargs:
385
+ # use_reentrant: true
386
+ early_stopping_patience:
387
+ resume_from_checkpoint:
388
+ local_rank:
389
+ logging_steps: 1
390
+ xformers_attention:
391
+ flash_attention: true
392
+
393
+ warmup_steps: 20
394
+ evals_per_epoch: 4
395
+ saves_per_epoch: 4
396
+ save_safetensors: true
397
+ hub_model_id:
398
+ hub_strategy:
399
+ debug:
400
+ deepspeed: deepspeed_configs/zero3_bf16.json
401
+ weight_decay: 0.1
402
+ # fsdp:
403
+ # - full_shard
404
+ # - auto_wrap
405
+ # fsdp_config:
406
+ # fsdp_limit_all_gathers: true
407
+ # fsdp_sync_module_states: false
408
+ # fsdp_offload_params: true
409
+ # fsdp_cpu_ram_efficient_loading: true
410
+ # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
411
+ # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
412
+ # fsdp_activation_checkpointing: true
413
+ # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
414
+ # fsdp_sharding_strategy: FULL_SHARD
415
+ # fsdp_forward_prefetch: false # Added
416
+ # fsdp_backward_prefetch: "BACKWARD_PRE" # Added
417
+ # fsdp_backward_prefetch_limit: 1 # Added
418
+ # fsdp_mixed_precision: BF16 # Added
419
+
420
+ ---
421
  ## Use with llama.cpp
422
  Install llama.cpp through brew (works on Mac and Linux)
423