AuriAetherwiing committed • Commit a7c231b • Parent(s): 936bc4f

Update README.md

README.md CHANGED
@@ -2,15 +2,72 @@
 library_name: transformers
 license: apache-2.0
 base_model: Qwen/Qwen2.5-32B
 tags:
 - generated_from_trainer
 model-index:
 - name: EVA-Qwen2.5-32B-SFFT-v0.1
   results: []
 ---
-
-
-

 [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
 <details><summary>See axolotl config</summary>
@@ -368,66 +425,4 @@ weight_decay: 0.1
 # fsdp_mixed_precision: BF16 # Added
 ```

-</details
-
-# EVA-Qwen2.5-32B-SFFT-v0.1
-
-This model is a fine-tuned version of [Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.0347
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 1
-- eval_batch_size: 1
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 8
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 64
-- total_eval_batch_size: 8
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 20
-- num_epochs: 3
-
-### Training results
-
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 1.4186        | 0.0068 | 1    | 1.4219          |
-| 1.1535        | 0.2532 | 37   | 1.0748          |
-| 1.0027        | 0.5064 | 74   | 1.0227          |
-| 0.9742        | 0.7596 | 111  | 1.0011          |
-| 0.7284        | 1.0120 | 148  | 1.0106          |
-| 0.6779        | 1.2666 | 185  | 1.0126          |
-| 0.6424        | 1.5211 | 222  | 1.0009          |
-| 0.6818        | 1.7756 | 259  | 0.9927          |
-| 0.4779        | 2.0233 | 296  | 1.0260          |
-| 0.4423        | 2.2782 | 333  | 1.0372          |
-| 0.4613        | 2.5332 | 370  | 1.0365          |
-| 0.4332        | 2.7881 | 407  | 1.0347          |
-
-
-### Framework versions
-
-- Transformers 4.45.2
-- Pytorch 2.4.0+cu121
-- Datasets 3.0.1
-- Tokenizers 0.20.1

 library_name: transformers
 license: apache-2.0
 base_model: Qwen/Qwen2.5-32B
+datasets:
+- anthracite-org/kalo-opus-instruct-22k-no-refusal
+- Nopm/Opus_WritingStruct
+- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
+- Gryphe/Sonnet3.5-Charcard-Roleplay
+- Gryphe/ChatGPT-4o-Writing-Prompts
+- Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
+- Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
+- nothingiisreal/Reddit-Dirty-And-WritingPrompts
+- allura-org/Celeste-1.x-data-mixture
+- cognitivecomputations/dolphin-2.9.3
 tags:
 - generated_from_trainer
 model-index:
 - name: EVA-Qwen2.5-32B-SFFT-v0.1
   results: []
 ---
+# EVA Qwen2.5-32B v0.1
+
+<p>
+An RP/storywriting specialist model: a full-parameter finetune of Qwen2.5-32B on a mixture of synthetic and natural data.<br>
+It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity and "flavor" of the resulting model.<br>
+</p>
+
+<p>Version notes for 0.1: an additional round of cleaning for the datasets; new subsets of 4o-WritingPrompts and Charcards, picking the most diverse samples from them; a small added subset of SystemChat2.0 to improve instruction following; and a slightly increased sequence length. Additionally, the training config mistake from 32B 0.0 has been fixed: layernorm layers stay frozen this time. Unfreezing them caused a positivity bias to appear in 32B 0.0 for some reason.</p>
+
+<p>
+<p>Prompt format is ChatML.</p><br>
+<h3>Recommended sampler values:</h3>
+<ul>
+<li>Temperature: 1</li>
+<li>Typical-P: 0.9</li>
+<li>Min-P: 0.05</li>
+<li>Top-A: 0.2</li>
+<li>Repetition Penalty: 1.03</li>
+</ul>
+
+<h3>Recommended SillyTavern presets (via CalamitousFelicitousness):</h3>
+
+- [Context](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Context.json)
+- [Instruct and System Prompt](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Instruct.json)
+</p>
+
+<p>
+<br>
+<h3>
+Training data:
+</h3>
+<ul>
+<li>The Celeste 70B 0.1 data mixture, minus the Opus Instruct subset. See that model's <a href=https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16>card</a> for details.</li>
+<li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
+<li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe</li>
+<li>A subset (2k rows) of Sonnet3.5-Charcard-Roleplay by Gryphe</li>
+<li>Synthstruct and SynthRP datasets by Epiculous</li>
+<li>A subset of Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.</li>
+</ul>
+<h3>
+Training time and hardware:
+</h3>
+<ul><li>7 hours on 8xH100 SXM, provided by <a href=https://featherless.ai/>FeatherlessAI</a></li></ul><br>
+</p>
+<p>The model was trained by Kearm and Auri.</p>
+<h4>Special thanks:</h4><ul>
+<li><b>to <a href=https://featherless.ai/>FeatherlessAI</a> for generously providing the 8xH100 SXM node used to train this model</b></li>
+<li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data</li>
+<li>and to Allura-org for support and feedback on EVA models.</li></ul>
 
 [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
 <details><summary>See axolotl config</summary>

 # fsdp_mixed_precision: BF16 # Added
 ```

+</details>
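
For convenience, here is a minimal usage sketch (not part of the committed card) that applies the recommended sampler values above through Hugging Face transformers. The repository id below is an assumption inferred from the model name, and Top-A is omitted because transformers' `generate()` does not expose it; the Qwen2.5 tokenizer's chat template already produces ChatML, matching the card's prompt format.

```python
# Hypothetical usage sketch; the repo id is assumed, not taken from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EVA-UNIT-01/EVA-Qwen2.5-32B-v0.1"  # assumption: adjust to the actual repository

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The Qwen2.5 chat template emits ChatML, the prompt format recommended by the card.
messages = [
    {"role": "system", "content": "You are a skilled storyteller."},
    {"role": "user", "content": "Write the opening scene of a slow-burn mystery."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended sampler values from the card. Top-A (0.2) is not a transformers sampler;
# use a backend that supports it (e.g. SillyTavern with a compatible API) if needed.
output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,
    typical_p=0.9,
    min_p=0.05,
    repetition_penalty=1.03,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```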