Triangle104 committed on
Commit
0c71d3f
1 Parent(s): 4457934

Update README.md

Files changed (1)
  1. README.md +396 -2
README.md CHANGED
@@ -1,7 +1,9 @@
  ---
  library_name: transformers
  license: apache-2.0
- base_model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
  datasets:
  - anthracite-org/kalo-opus-instruct-22k-no-refusal
  - Nopm/Opus_WritingStruct
@@ -26,6 +28,398 @@ model-index:
  This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) for more details on the model.
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)

@@ -64,4 +458,4 @@ Step 3: Run inference through the main binary.
  or
  ```
  ./llama-server --hf-repo Triangle104/EVA-Qwen2.5-14B-v0.2-Q8_0-GGUF --hf-file eva-qwen2.5-14b-v0.2-q8_0.gguf -c 2048
- ```
 
  ---
  library_name: transformers
  license: apache-2.0
+ base_model:
+ - EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
+ - Qwen/Qwen2.5-14B
  datasets:
  - anthracite-org/kalo-opus-instruct-22k-no-refusal
  - Nopm/Opus_WritingStruct
 
  This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) for more details on the model.
 
+ ---
+ Model details:
+ -
+ An RP/storywriting specialist model: a full-parameter finetune of Qwen2.5-14B on a mixture of synthetic and natural data.
+ It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity and "flavor" of the resulting model.
+
+ Version notes for 0.2: Now using the refined dataset from 32B 0.2. Major improvements in coherence, instruction following and long-context comprehension over 14B v0.1.
+
+ Prompt format is ChatML.
+
+ Recommended sampler values:
+
+ Temperature: 0.8
+ Min-P: 0.05
+ Top-A: 0.3
+ Repetition Penalty: 1.03
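For illustration only, a minimal sketch of running the Q8_0 quant from this repo with these samplers through llama.cpp's CLI (llama.cpp exposes --temp, --min-p and --repeat-penalty but has no Top-A sampler, so that value only applies in frontends such as SillyTavern; the prompt text is purely illustrative):

```
# ChatML-formatted prompt; -e makes llama-cli interpret the \n escape sequences.
./llama-cli --hf-repo Triangle104/EVA-Qwen2.5-14B-v0.2-Q8_0-GGUF \
  --hf-file eva-qwen2.5-14b-v0.2-q8_0.gguf \
  --temp 0.8 --min-p 0.05 --repeat-penalty 1.03 -e \
  -p "<|im_start|>system\nYou are a creative storytelling assistant.<|im_end|>\n<|im_start|>user\nWrite the opening line of a mystery.<|im_end|>\n<|im_start|>assistant\n"
```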
+
+ Recommended SillyTavern presets (via CalamitousFelicitousness):
+
+ Context
+ Instruct and System Prompt
+
+ Training data:
+
+ - Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's card for details.
+ - Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
+ - A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
+ - A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
+ - Synthstruct and SynthRP datasets by Epiculous
+ - A subset of Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.
+
+ Training time and hardware:
+
+ 3 hours on 8xH100 SXM, provided by FeatherlessAI
+
+ The model was created by Kearm, Auri and Cahvay.
+
+ Special thanks:
+
+ - to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning,
+ - to FeatherlessAI for generously providing an 8xH100 SXM node for training this model,
+ - to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data,
+ - and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.
+
+ Built with Axolotl
+ See axolotl config
+
+ axolotl version: 0.4.1
+
+ base_model: Qwen/Qwen2.5-14B
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ plugins:
+ - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ # plugins:
+ # - axolotl.integrations.spectrum.SpectrumPlugin
+
+ # spectrum_top_fraction: 0.5
+ # # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
+ # spectrum_model_name: Qwen/Qwen2.5-32B
+
+ datasets:
+ - path: datasets/Celeste_Filtered_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/deduped_not_samantha_norefusals.jsonl
+   type: sharegpt
+ - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
+   type: sharegpt
+ - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
+   type: sharegpt
+ - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
+   type: sharegpt
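+ # (Illustrative note, not part of the original config: `type: sharegpt` expects
+ # ShareGPT-style JSONL, one conversation per line; a hypothetical row looks like
+ # {"conversations": [{"from": "system", "value": "..."}, {"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello!"}]}.)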
+
+ chat_template: chatml
+ shuffle_merged_datasets: true
+ val_set_size: 0.001
+ output_dir: ./EVA-Qwen2.5-14B-SFFT-v0.2
+
+ sequence_len: 10240
+ sample_packing: true
+ eval_sample_packing: false
+ pad_to_sequence_len: true
+
+ # adapter: qlora
+ # lora_model_dir:
+ # lora_r: 64
+ # lora_alpha: 128
+ # lora_dropout: 0.05
+ # lora_target_linear: true
+ # peft_use_dora: true
+
+ base_model: Qwen/Qwen2.5-14B
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ plugins:
+ - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ datasets:
+ - path: datasets/Celeste_Filtered_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/deduped_not_samantha_norefusals.jsonl
+   type: sharegpt
+ - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
+   type: sharegpt
+ - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
+   type: sharegpt
+ - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
+   type: sharegpt
+
+ chat_template: chatml
+ shuffle_merged_datasets: true
+ val_set_size: 0.005
+ output_dir: ./EVA-Qwen2.5-14B-SFFT-v0.2
+
+ sequence_len: 10240
+ sample_packing: true
+ eval_sample_packing: false
+ pad_to_sequence_len: true
+
+ # adapter: qlora
+ # lora_model_dir:
+ # lora_r: 32
+ # lora_alpha: 16
+ # lora_dropout: 0.05
+ # lora_target_linear: true
+ # peft_use_dora: true
+
+ unfrozen_parameters:
+ - ^lm_head.weight$
+ - ^model.embed_tokens.weight$
+ # mlp.down_proj layers
+ - model.layers.1.mlp.down_proj
+ - model.layers.35.mlp.down_proj
+ - model.layers.38.mlp.down_proj
+ - model.layers.37.mlp.down_proj
+ - model.layers.36.mlp.down_proj
+ - model.layers.15.mlp.down_proj
+ - model.layers.11.mlp.down_proj
+ - model.layers.12.mlp.down_proj
+ - model.layers.34.mlp.down_proj
+ - model.layers.44.mlp.down_proj
+ - model.layers.45.mlp.down_proj
+ - model.layers.9.mlp.down_proj
+ - model.layers.41.mlp.down_proj
+ - model.layers.33.mlp.down_proj
+ - model.layers.43.mlp.down_proj
+ - model.layers.40.mlp.down_proj
+ - model.layers.13.mlp.down_proj
+ - model.layers.8.mlp.down_proj
+ - model.layers.39.mlp.down_proj
+ - model.layers.10.mlp.down_proj
+ - model.layers.14.mlp.down_proj
+ - model.layers.16.mlp.down_proj
+ - model.layers.31.mlp.down_proj
+ - model.layers.32.mlp.down_proj
+ # mlp.gate_proj layers
+ - model.layers.1.mlp.gate_proj
+ - model.layers.44.mlp.gate_proj
+ - model.layers.46.mlp.gate_proj
+ - model.layers.45.mlp.gate_proj
+ - model.layers.43.mlp.gate_proj
+ - model.layers.47.mlp.gate_proj
+ - model.layers.42.mlp.gate_proj
+ - model.layers.32.mlp.gate_proj
+ - model.layers.27.mlp.gate_proj
+ - model.layers.33.mlp.gate_proj
+ - model.layers.28.mlp.gate_proj
+ - model.layers.39.mlp.gate_proj
+ - model.layers.41.mlp.gate_proj
+ - model.layers.40.mlp.gate_proj
+ - model.layers.30.mlp.gate_proj
+ - model.layers.29.mlp.gate_proj
+ - model.layers.31.mlp.gate_proj
+ - model.layers.37.mlp.gate_proj
+ - model.layers.26.mlp.gate_proj
+ - model.layers.10.mlp.gate_proj
+ - model.layers.38.mlp.gate_proj
+ - model.layers.36.mlp.gate_proj
+ - model.layers.12.mlp.gate_proj
+ - model.layers.13.mlp.gate_proj
+ # mlp.up_proj layers
+ - model.layers.1.mlp.up_proj
+ - model.layers.13.mlp.up_proj
+ - model.layers.11.mlp.up_proj
+ - model.layers.14.mlp.up_proj
+ - model.layers.15.mlp.up_proj
+ - model.layers.12.mlp.up_proj
+ - model.layers.8.mlp.up_proj
+ - model.layers.16.mlp.up_proj
+ - model.layers.9.mlp.up_proj
+ - model.layers.19.mlp.up_proj
+ - model.layers.10.mlp.up_proj
+ - model.layers.7.mlp.up_proj
+ - model.layers.17.mlp.up_proj
+ - model.layers.20.mlp.up_proj
+ - model.layers.21.mlp.up_proj
+ - model.layers.18.mlp.up_proj
+ - model.layers.37.mlp.up_proj
+ - model.layers.38.mlp.up_proj
+ - model.layers.39.mlp.up_proj
+ - model.layers.42.mlp.up_proj
+ - model.layers.41.mlp.up_proj
+ - model.layers.27.mlp.up_proj
+ - model.layers.28.mlp.up_proj
+ - model.layers.36.mlp.up_proj
+ # self_attn.k_proj layers
+ - model.layers.47.self_attn.k_proj
+ - model.layers.39.self_attn.k_proj
+ - model.layers.41.self_attn.k_proj
+ - model.layers.37.self_attn.k_proj
+ - model.layers.35.self_attn.k_proj
+ - model.layers.44.self_attn.k_proj
+ - model.layers.38.self_attn.k_proj
+ - model.layers.14.self_attn.k_proj
+ - model.layers.7.self_attn.k_proj
+ - model.layers.12.self_attn.k_proj
+ - model.layers.11.self_attn.k_proj
+ - model.layers.32.self_attn.k_proj
+ - model.layers.10.self_attn.k_proj
+ - model.layers.8.self_attn.k_proj
+ - model.layers.6.self_attn.k_proj
+ - model.layers.9.self_attn.k_proj
+ - model.layers.45.self_attn.k_proj
+ - model.layers.42.self_attn.k_proj
+ - model.layers.40.self_attn.k_proj
+ - model.layers.5.self_attn.k_proj
+ - model.layers.0.self_attn.k_proj
+ - model.layers.33.self_attn.k_proj
+ - model.layers.34.self_attn.k_proj
+ - model.layers.13.self_attn.k_proj
+ # self_attn.o_proj layers
+ - model.layers.12.self_attn.o_proj
+ - model.layers.5.self_attn.o_proj
+ - model.layers.14.self_attn.o_proj
+ - model.layers.16.self_attn.o_proj
+ - model.layers.20.self_attn.o_proj
+ - model.layers.13.self_attn.o_proj
+ - model.layers.11.self_attn.o_proj
+ - model.layers.4.self_attn.o_proj
+ - model.layers.6.self_attn.o_proj
+ - model.layers.19.self_attn.o_proj
+ - model.layers.7.self_attn.o_proj
+ - model.layers.18.self_attn.o_proj
+ - model.layers.8.self_attn.o_proj
+ - model.layers.38.self_attn.o_proj
+ - model.layers.15.self_attn.o_proj
+ - model.layers.17.self_attn.o_proj
+ - model.layers.9.self_attn.o_proj
+ - model.layers.10.self_attn.o_proj
+ - model.layers.21.self_attn.o_proj
+ - model.layers.28.self_attn.o_proj
+ - model.layers.32.self_attn.o_proj
+ - model.layers.35.self_attn.o_proj
+ - model.layers.39.self_attn.o_proj
+ - model.layers.3.self_attn.o_proj
+ # self_attn.q_proj layers
+ - model.layers.1.self_attn.q_proj
+ - model.layers.2.self_attn.q_proj
+ - model.layers.3.self_attn.q_proj
+ - model.layers.44.self_attn.q_proj
+ - model.layers.29.self_attn.q_proj
+ - model.layers.45.self_attn.q_proj
+ - model.layers.43.self_attn.q_proj
+ - model.layers.32.self_attn.q_proj
+ - model.layers.38.self_attn.q_proj
+ - model.layers.19.self_attn.q_proj
+ - model.layers.42.self_attn.q_proj
+ - model.layers.34.self_attn.q_proj
+ - model.layers.36.self_attn.q_proj
+ - model.layers.40.self_attn.q_proj
+ - model.layers.26.self_attn.q_proj
+ - model.layers.20.self_attn.q_proj
+ - model.layers.28.self_attn.q_proj
+ - model.layers.39.self_attn.q_proj
+ - model.layers.41.self_attn.q_proj
+ - model.layers.33.self_attn.q_proj
+ - model.layers.35.self_attn.q_proj
+ - model.layers.25.self_attn.q_proj
+ - model.layers.30.self_attn.q_proj
+ - model.layers.27.self_attn.q_proj
+ # self_attn.v_proj layers
+ - model.layers.0.self_attn.v_proj
+ - model.layers.7.self_attn.v_proj
+ - model.layers.39.self_attn.v_proj
+ - model.layers.31.self_attn.v_proj
+ - model.layers.15.self_attn.v_proj
+ - model.layers.10.self_attn.v_proj
+ - model.layers.41.self_attn.v_proj
+ - model.layers.32.self_attn.v_proj
+ - model.layers.6.self_attn.v_proj
+ - model.layers.33.self_attn.v_proj
+ - model.layers.42.self_attn.v_proj
+ - model.layers.29.self_attn.v_proj
+ - model.layers.9.self_attn.v_proj
+ - model.layers.14.self_attn.v_proj
+ - model.layers.35.self_attn.v_proj
+ - model.layers.38.self_attn.v_proj
+ - model.layers.13.self_attn.v_proj
+ - model.layers.30.self_attn.v_proj
+ - model.layers.34.self_attn.v_proj
+ - model.layers.5.self_attn.v_proj
+ - model.layers.28.self_attn.v_proj
+ - model.layers.37.self_attn.v_proj
+ - model.layers.27.self_attn.v_proj
+ - model.layers.11.self_attn.v_proj
+
+ wandb_project: EVA-Qwen2.5-14B-SFFT-v0.2
+ wandb_entity:
+ wandb_watch:
+ wandb_name: Unit-02
+ wandb_log_model:
+
+ gradient_accumulation_steps: 8
+ micro_batch_size: 2
+ num_epochs: 3
+ optimizer: paged_ademamix_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.00005
+ max_grad_norm: 3
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: "unsloth"
+ # gradient_checkpointing_kwargs:
+ # use_reentrant: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 20
+ evals_per_epoch: 4
+ saves_per_epoch: 4
+ save_safetensors: true
+ hub_model_id:
+ hub_strategy:
+ debug:
+ deepspeed: deepspeed_configs/zero3_bf16.json
+ weight_decay: 0.1
+ # fsdp:
+ # - full_shard
+ # - auto_wrap
+ # fsdp_config:
+ # fsdp_limit_all_gathers: true
+ # fsdp_sync_module_states: false
+ # fsdp_offload_params: true
+ # fsdp_cpu_ram_efficient_loading: true
+ # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
+ # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
+ # fsdp_activation_checkpointing: true
+ # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
+ # fsdp_sharding_strategy: FULL_SHARD
+ # fsdp_forward_prefetch: false # Added
+ # fsdp_backward_prefetch: "BACKWARD_PRE" # Added
+ # fsdp_backward_prefetch_limit: 1 # Added
+ # fsdp_mixed_precision: BF16 # Added
+
+ ---
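The card stops at the config itself; assuming axolotl 0.4.1 is installed as described in its own repository, a config like the one above is normally launched through accelerate, with the deepspeed entry in the config handling ZeRO-3 sharding (the YAML filename below is hypothetical):

```
# Save the config above as eva-qwen2.5-14b-sfft-v0.2.yaml, then launch training
# across the node's GPUs; accelerate picks up the DeepSpeed config referenced in the YAML.
accelerate launch -m axolotl.cli.train eva-qwen2.5-14b-sfft-v0.2.yaml
```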
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
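The install command itself sits in the part of the diff collapsed here; for reference, the standard Homebrew formula is:

```
brew install llama.cpp
```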
 
 
  or
  ```
  ./llama-server --hf-repo Triangle104/EVA-Qwen2.5-14B-v0.2-Q8_0-GGUF --hf-file eva-qwen2.5-14b-v0.2-q8_0.gguf -c 2048
+ ```
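Once llama-server is running (it listens on http://localhost:8080 by default), it can be queried through its OpenAI-compatible API; a minimal sketch with an illustrative prompt (min_p is a llama.cpp-specific extension to the request schema):

```
# Smoke test against the server started above; the server applies the model's ChatML chat template.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a creative storytelling assistant."},
          {"role": "user", "content": "Write the opening line of a mystery."}
        ],
        "temperature": 0.8,
        "min_p": 0.05
      }'
```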