---
datasets:
- jondurbin/gutenberg-dpo-v0.1
- Qwen/Qwen2.5-14B-Instruct
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- Qwen/Qwen2.5-14B-Instruct
- v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- tanliboy/lambda-qwen2.5-14b-dpo-test
library_name: transformers
tags:
- qwen
- qwen2.5
- finetune
- dpo
- orpo
- qwen2
- chat
- conversational
- instruct
- storywriting
- roleplay
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Qwen2.5-Lumen-14B

* *Qwen2.5, direct-preference-optimization finetuned for ~3 epochs.*

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/wCcJkdrVDUH6m0AN9Lv3B.png)

<b>A Qwen2.5 preference finetune targeting prompt adherence, storywriting, and roleplay.</b>

-------------------------------------------------------------------------------

## Training Notes

Trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 2 epochs on an NVIDIA A100 using the [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) dataset, saving several checkpoints along the way.

[Tanliboy](https://huggingface.co/tanliboy) trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 1 epoch on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized). (Credit to Tanliboy! *Check out his model [here](https://huggingface.co/tanliboy/lambda-qwen2.5-14b-dpo-test).*)

*All checkpoints were then mass-merged, with Qwen2.5-14B-Instruct as the base model.*
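
For illustration only, the Gutenberg DPO step above amounts to a preference-tuning run along these lines. This is a generic sketch, not the actual training config; the field names are assumptions and not tied to any particular trainer.

```yaml
# Illustrative sketch only - not the actual training configuration.
# Field names are generic assumptions rather than a specific trainer's schema.
base_model: Qwen/Qwen2.5-14B-Instruct
method: dpo                              # direct preference optimization
dataset: jondurbin/gutenberg-dpo-v0.1    # chosen/rejected preference pairs
num_epochs: 2
save_strategy: epoch                     # intermediate checkpoints kept for later merging
dtype: bfloat16
```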

## Merge

* Merged *"Ultrafeedback-Binarized DPO"* and *"Gutenberg DPO"* with sophosympatheia's <b>SLERP</b> gradient.

* Merged *"Qwen2.5-14B-Instruct"* and *"Gutenberg DPO"* with sophosympatheia's <b>SLERP</b> gradient (a sketch of this style of config follows this list).

* Merged all <b>DPO checkpoints</b> and <b>SLERP</b> variations with <b>MODEL_STOCK</b> to analyze geometric properties and combine the most performant aspects of all runs/merges. Model Stock was chosen because of the similarity between the merged models.
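
For reference, a SLERP gradient merge of two of these models in mergekit looks roughly like the sketch below. It is a minimal example: the filter list and gradient values are illustrative assumptions, not the exact settings used.

```yaml
# Minimal mergekit SLERP gradient sketch (t values are illustrative assumptions)
models:
  - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
  - model: tanliboy/lambda-qwen2.5-14b-dpo-test
merge_method: slerp
base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
parameters:
  t:
    # interpolation factor varies across layer depth and by module (the "gradient")
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```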

## Recipe

```yaml
models:
  - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
  - model: v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
  - model: v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
  - model: v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
  - model: v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
  - model: v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
  - model: v000000/Qwen2.5-14B-Gutenberg-1e-Theta
  - model: tanliboy/lambda-qwen2.5-14b-dpo-test
  - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
  - model: tanliboy/lambda-qwen2.5-14b-dpo-test
  - model: v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
  - model: v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
merge_method: model_stock
dtype: bfloat16
```
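
Assuming mergekit is installed, a recipe like this can be applied with the `mergekit-yaml` command, e.g. `mergekit-yaml recipe.yaml ./output-model` (the file and output names here are placeholders).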

### Finetune and merge

This is a merge and finetune of pre-trained language models.

### Models Merged

Merge method: Model Stock ([arXiv:2403.19522](https://arxiv.org/abs/2403.19522))

The following models were included in the merge:
* v000000/Qwen2.5-14B-Gutenberg-1e-Delta
* v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
* v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
* v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
* v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
* v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
* v000000/Qwen2.5-14B-Gutenberg-1e-Theta
* v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
* v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
* tanliboy/lambda-qwen2.5-14b-dpo-test

- Context length: full 131,072 tokens, with up to 8,192 tokens of generation