AuriAetherwiing committed on
Commit 991862d
1 Parent(s): c237fc9

Update README.md

Files changed (1)
  1. README.md +63 -65
README.md CHANGED
@@ -2,6 +2,17 @@
  library_name: transformers
  license: apache-2.0
  base_model: Qwen/Qwen2.5-14B
  tags:
  - generated_from_trainer
  model-index:
@@ -9,8 +20,57 @@ model-index:
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>
@@ -359,66 +419,4 @@ weight_decay: 0.1
  # fsdp_mixed_precision: BF16 # Added
  ```

- </details><br>
-
- # EVA-Qwen2.5-14B-SFFT-v0.2
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 3.0986
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 128
- - total_eval_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 20
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.4236 | 0.0170 | 1 | 2.6557 |
- | 1.2513 | 0.2553 | 15 | 3.4606 |
- | 1.1338 | 0.5106 | 30 | 3.5536 |
- | 1.0985 | 0.7660 | 45 | 3.1957 |
- | 0.8794 | 1.0170 | 60 | 3.0346 |
- | 0.8584 | 1.2718 | 75 | 3.0551 |
- | 0.8421 | 1.5265 | 90 | 3.0168 |
- | 0.8081 | 1.7813 | 105 | 3.0335 |
- | 0.8227 | 2.0361 | 120 | 3.0369 |
- | 0.7416 | 2.2909 | 135 | 3.0876 |
- | 0.7396 | 2.5456 | 150 | 3.1023 |
- | 0.7775 | 2.8004 | 165 | 3.0986 |
-
-
- ### Framework versions
-
- - Transformers 4.45.1
- - Pytorch 2.4.0+cu121
- - Datasets 2.21.0
- - Tokenizers 0.20.2

  library_name: transformers
  license: apache-2.0
  base_model: Qwen/Qwen2.5-14B
+ datasets:
+ - anthracite-org/kalo-opus-instruct-22k-no-refusal
+ - Nopm/Opus_WritingStruct
+ - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
+ - Gryphe/Sonnet3.5-Charcard-Roleplay
+ - Gryphe/ChatGPT-4o-Writing-Prompts
+ - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
+ - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
+ - nothingiisreal/Reddit-Dirty-And-WritingPrompts
+ - allura-org/Celeste-1.x-data-mixture
+ - cognitivecomputations/dolphin-2.9.3
  tags:
  - generated_from_trainer
  model-index:

  results: []
  ---

+
+ # EVA Qwen2.5-14B v0.2
+
+ <p>
+ An RP/storywriting specialist model: a full-parameter finetune of Qwen2.5-14B on a mixture of synthetic and natural data.<br>
+ It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity and "flavor" of the resulting model.<br>
+ </p>
+
+ <p><b>Version notes for 0.2</b>: Now using the refined dataset from 32B 0.2. Major improvements in coherence, instruction following and long-context comprehension over 14B v0.1.</p>
+
+ <p>
+ <p>Prompt format is ChatML.</p><br>
+ <h3>Recommended sampler values:</h3>
+ <ul>
+ <li>Temperature: 0.8</li>
+ <li>Min-P: 0.05</li>
+ <li>Top-A: 0.3</li>
+ <li>Repetition Penalty: 1.03</li>
+ </ul>
+
+ <h3>Recommended SillyTavern presets (via CalamitousFelicitousness):</h3>
+
+ - [Context](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Context.json)
+ - [Instruct and System Prompt](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Instruct.json)
+ </p>
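As an editorial illustration (not part of the original card or commit), here is a minimal Transformers sketch of ChatML-formatted generation with the recommended values above. The repository id is assumed from the model name; Min-P requires a recent `transformers` release, and Top-A is only exposed by frontends such as SillyTavern, so it is omitted here.

```python
# Illustrative sketch only: the repo id is assumed, adjust to the actual model repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a creative storywriting assistant."},
    {"role": "user", "content": "Write the opening scene of a slow-burn mystery."},
]
# apply_chat_template renders the ChatML format (<|im_start|>role ... <|im_end|>)
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,          # recommended sampler values from the card
    min_p=0.05,               # needs a recent transformers version
    repetition_penalty=1.03,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```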
+
+ <p>
+ <br>
+ <h3>
+ Training data:
+ </h3>
+ <ul>
+ <li>Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's <a href=https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16>card</a> for details.</li>
+ <li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
+ <li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe</li>
+ <li>A subset (2k rows) of Sonnet3.5-Charcard-Roleplay by Gryphe</li>
+ <li>Synthstruct and SynthRP datasets by Epiculous</li>
+ <li>A subset of Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.</li>
+ </ul>
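The concrete repositories behind this mixture are the ones listed in the `datasets:` front matter at the top of the card. As an unofficial sketch (not the authors' training pipeline), they could be fetched with the `datasets` library; split names and configs vary per dataset, and some repos may be gated.

```python
# Unofficial sketch: fetch the public components of the data mixture listed in the
# card's front matter. Some repos may be gated or need per-dataset config tweaks.
from datasets import load_dataset

mixture_ids = [
    "anthracite-org/kalo-opus-instruct-22k-no-refusal",
    "Nopm/Opus_WritingStruct",
    "Gryphe/Sonnet3.5-SlimOrcaDedupCleaned",
    "Gryphe/Sonnet3.5-Charcard-Roleplay",
    "Gryphe/ChatGPT-4o-Writing-Prompts",
    "Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned",
    "Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned",
    "nothingiisreal/Reddit-Dirty-And-WritingPrompts",
    "allura-org/Celeste-1.x-data-mixture",
    "cognitivecomputations/dolphin-2.9.3",
]

for repo_id in mixture_ids:
    ds = load_dataset(repo_id)  # returns a DatasetDict keyed by split
    sizes = {split: ds[split].num_rows for split in ds}
    print(repo_id, sizes)
```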
+ <h3>
+ Training time and hardware:
+ </h3>
+ <ul><li>3 hours on 8xH100 SXM, provided by <a href=https://featherless.ai/>FeatherlessAI</a></li></ul><br>
+ </p>
+ <p>Model was created by Kearm, Auri and Cahvay.</p>
+ <h4>Special thanks:</h4><ul>
+ <li><b>to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning.</b></li>
+ <li><b>to <a href=https://featherless.ai/>FeatherlessAI</a> for generously providing an 8xH100 SXM node for training this model</b></li>
+ <li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data</li>
+ <li>and to Allura-org for support, feedback, beta-testing and quality control of EVA models.</li></ul>
+

  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>

  # fsdp_mixed_precision: BF16 # Added
  ```

+ </details><br>