Add text-generation tag and link to code repository
#1 opened by nielsr (HF staff)

README.md CHANGED
@@ -1,26 +1,24 @@
 ---
-library_name: transformers
-license: apache-2.0
-datasets:
-- DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K-tokenized
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
+datasets:
+- DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K-tokenized
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---

 # LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

 This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization".

-
-
-
 <h5 align="left">

 [](http://arxiv.org/abs/2502.13922)
 [](https://huggingface.co/papers/2502.13922)
 </h5>

-
+[Code]: https://github.com/DAMO-NLP-SG/LongPO

 ## Highlights of LongPO

@@ -28,10 +26,8 @@ This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO
 - Extending context length while keeping aligned in one stage.
 - No degradation on short-context capabilities.

-
 <img width="1031" alt="image" src="https://github.com/user-attachments/assets/84f3c93f-909d-4ef7-a33a-107ca2deec42" />

-
 ## Models and Training Data

 | Models | Base Model | Training Data | # Data Samples |
@@ -43,10 +39,6 @@ This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO

 \* indicates an experimental version (for rebuttal purposes) that may not have been fully tuned or provided with sufficient data to achieve convergence.

-
-
-
-
 ## Training Process:

 1. Prompt a short-context instruct LLM (e.g., Mistral-7B-Instruct-v0.2) to self-generate short-to-long preference data as illustrated in [data_prepare](data_prepare/readme.md).
@@ -99,11 +91,8 @@ train/train_longpo.py \

 ## Evaluation

-
-
 ### InfiniteBench

-
 | Model | Train/Claimed Length | En.Sum | En.QA | En.MC | AVG. |
 | ---------------- | -------------------- | ------ | ------ | ------ | ------ |
 | GPT-4-128K | 128K | 14.73 | 22.44 | 67.25 | 34.81 |
@@ -126,10 +115,6 @@ train/train_longpo.py \
 - Our results are evaluated with greedy decoding.
 - Baseline results marked with ᵇ are evaluated by us, while unmarked baseline results are sourced from their official reports.

-
-
-
-
 ### RULER

 | Model | NIAH | VT | AGG | QA | AVG (13 tasks) |
@@ -141,10 +126,6 @@ train/train_longpo.py \
 | Mistral-7B-LongPO-256K-EXP | 96.80 | 97.00 | 69.14 | 64.87 | 87.65 |
 | Mistral-7B-LongPO-512K-EXP | 97.28 | 97.48 | 69.22 | 64.92 | 88.00 |

-
-
-
-
 ### Short Context

 | Model | MMLU | ARC-C | Hellaswag | Winogrande | Avg |
@@ -156,8 +137,6 @@ train/train_longpo.py \
 | Qwen2.5-7B-Instruct | 74.28 | 67.15 | 81.41 | 74.66 | 74.38 |
 | Qwen2.5-7B-LongPO-128K | 73.64 | 65.70 | 80.82 | 74.98 | 73.79 |

-
-
 ## Citation
 If you find our project useful, we hope you will star our repo and cite our paper as follows:
 ```
@@ -169,6 +148,4 @@ If you find our project useful, hope you can star our repo and cite our paper as
 year={2025},
 url={https://openreview.net/forum?id=qTrEq31Shm}
 }
-```
-
-
+```
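For context on what the new tag enables: with `library_name: transformers` and the added `pipeline_tag: text-generation` in the front matter, the Hub can route this checkpoint to the standard `transformers` text-generation pipeline. A minimal usage sketch, assuming the repo ID follows the model name (`DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K`) and a recent `transformers` release that accepts chat-style input:

```python
# Minimal sketch, not part of the PR: load the checkpoint via the text-generation
# pipeline advertised by the new `pipeline_tag`. The repo ID below is assumed from
# the model name; adjust it if the actual repository differs.
from transformers import pipeline

model_id = "DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K"  # assumed repo ID

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",   # let transformers pick a suitable dtype
    device_map="auto",    # place the 7B weights across available devices
)

messages = [
    {"role": "user", "content": "Summarize the idea behind LongPO in two sentences."}
]
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])  # assistant reply
```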