AuriAetherwiing committed on
Commit a7c231b
1 Parent(s): 936bc4f

Update README.md

Files changed (1)
  1. README.md +61 -66
README.md CHANGED
@@ -2,15 +2,72 @@
  library_name: transformers
  license: apache-2.0
  base_model: Qwen/Qwen2.5-32B
  tags:
  - generated_from_trainer
  model-index:
  - name: EVA-Qwen2.5-32B-SFFT-v0.1
  results: []
  ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>
@@ -368,66 +425,4 @@ weight_decay: 0.1
  # fsdp_mixed_precision: BF16 # Added
  ```

- </details><br>
-
- # EVA-Qwen2.5-32B-SFFT-v0.1
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.0347
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 64
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 20
- - num_epochs: 3
-
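For reference, the batch-size totals listed above are derived from the per-device settings rather than set independently; a minimal sketch of the arithmetic (variable names are illustrative, not taken from the training config):

```python
# Effective batch sizes implied by the hyperparameters listed above.
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
num_devices = 8
gradient_accumulation_steps = 8

# 1 sample/device * 8 devices * 8 accumulation steps = 64 samples per optimizer step.
total_train_batch_size = per_device_train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so 1 * 8 = 8.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 64 8
```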
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.4186 | 0.0068 | 1 | 1.4219 |
- | 1.1535 | 0.2532 | 37 | 1.0748 |
- | 1.0027 | 0.5064 | 74 | 1.0227 |
- | 0.9742 | 0.7596 | 111 | 1.0011 |
- | 0.7284 | 1.0120 | 148 | 1.0106 |
- | 0.6779 | 1.2666 | 185 | 1.0126 |
- | 0.6424 | 1.5211 | 222 | 1.0009 |
- | 0.6818 | 1.7756 | 259 | 0.9927 |
- | 0.4779 | 2.0233 | 296 | 1.0260 |
- | 0.4423 | 2.2782 | 333 | 1.0372 |
- | 0.4613 | 2.5332 | 370 | 1.0365 |
- | 0.4332 | 2.7881 | 407 | 1.0347 |
-
-
- ### Framework versions
-
- - Transformers 4.45.2
- - Pytorch 2.4.0+cu121
- - Datasets 3.0.1
- - Tokenizers 0.20.1
 
  library_name: transformers
  license: apache-2.0
  base_model: Qwen/Qwen2.5-32B
+ datasets:
+ - anthracite-org/kalo-opus-instruct-22k-no-refusal
+ - Nopm/Opus_WritingStruct
+ - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
+ - Gryphe/Sonnet3.5-Charcard-Roleplay
+ - Gryphe/ChatGPT-4o-Writing-Prompts
+ - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
+ - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
+ - nothingiisreal/Reddit-Dirty-And-WritingPrompts
+ - allura-org/Celeste-1.x-data-mixture
+ - cognitivecomputations/dolphin-2.9.3
  tags:
  - generated_from_trainer
  model-index:
  - name: EVA-Qwen2.5-32B-SFFT-v0.1
  results: []
  ---
+ # EVA Qwen2.5-32B v0.1
+
+ <p>
+ An RP/storywriting specialist model, full-parameter finetune of Qwen2.5-32B on a mixture of synthetic and natural data.<br>
+ It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity and "flavor" of the resulting model.<br>
+ </p>
+
+ <p>Version notes for 0.1: an additional round of cleaning for the datasets; new subsets of 4o-WritingPrompts and Charcards, picking the most diverse samples from them; a small added subset of SystemChat2.0 to improve instruction following; and a slightly increased sequence length. Additionally, the training config mistake from 32B 0.0 is fixed: layernorm layers stay frozen this time. Unfreezing them caused a positivity bias to appear in 32B 0.0 for some reason.</p>
+
+ <p>
+ <p>Prompt format is ChatML (see the usage sketch below the recommended presets).</p><br>
+ <h3>Recommended sampler values:</h3>
+ <ul>
+ <li>Temperature: 1</li>
+ <li>Typical-P: 0.9</li>
+ <li>Min-P: 0.05</li>
+ <li>Top-A: 0.2</li>
+ <li>Repetition Penalty: 1.03</li>
+ </ul>
+
+ <h3>Recommended SillyTavern presets (via CalamitousFelicitousness):</h3>
+
+ - [Context](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Context.json)
+ - [Instruct and System Prompt](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Instruct.json)
+ </p>
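A minimal sketch of applying the ChatML format and the recommended sampler values above with the `transformers` library. The repo id below is a placeholder for wherever this checkpoint is hosted, and Top-A is omitted because stock `transformers` sampling does not expose it:

```python
# Minimal sketch: ChatML prompting plus the recommended sampler values via transformers.
# "EVA-UNIT-01/EVA-Qwen2.5-32B-v0.1" is a placeholder repo id; substitute the actual checkpoint path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EVA-UNIT-01/EVA-Qwen2.5-32B-v0.1"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Qwen2.5 tokenizers ship a ChatML chat template, so apply_chat_template produces
# <|im_start|>role ... <|im_end|> formatted prompts.
messages = [
    {"role": "system", "content": "You are a creative storytelling assistant."},
    {"role": "user", "content": "Write the opening scene of a heist set in a floating city."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,
    typical_p=0.9,
    min_p=0.05,
    repetition_penalty=1.03,
    # Top-A (0.2) is not available in stock transformers; use a backend that exposes it if desired.
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same values map directly onto SillyTavern's sampler panel when using the presets linked above; Top-A can be set there if the backend supports it.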
+
+ <p>
+ <br>
+ <h3>
+ Training data:
+ </h3>
+ <ul>
+ <li>Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's <a href=https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16>card</a> for details.</li>
+ <li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
+ <li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe</li>
+ <li>A subset (2k rows) of Sonnet3.5-Charcard-Roleplay by Gryphe</li>
+ <li>Synthstruct and SynthRP datasets by Epiculous</li>
+ <li>A subset from Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.</li>
+ </ul>
+ <h3>
+ Training time and hardware:
+ </h3>
+ <ul><li>7 hours on 8xH100 SXM, provided by <a href=https://featherless.ai/>FeatherlessAI</a></li></ul><br>
+ </p>
+ <p>Model was trained by Kearm and Auri.</p>
+ <h4>Special thanks:</h4><ul>
+ <li><b>to <a href=https://featherless.ai/>FeatherlessAI</a> for generously providing an 8xH100 SXM node for training this model</b></li>
+ <li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data</li>
+ <li>and to Allura-org for support and feedback on EVA models.</li></ul>

  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>
 
  # fsdp_mixed_precision: BF16 # Added
  ```

+ </details>