Text Generation · Transformers · Safetensors · qwen2 · conversational · text-generation-inference · Inference Endpoints

Add text-generation tag and link to code repository

#1 opened by nielsr (HF staff)

Files changed (1): README.md (+7 -30)
README.md CHANGED
@@ -1,26 +1,24 @@
 ---
-library_name: transformers
-license: apache-2.0
-datasets:
-- DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K-tokenized
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
+datasets:
+- DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K-tokenized
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 # LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
 
 This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization".
 
-
-
-
 <h5 align="left">
 
 [![arXiv](https://img.shields.io/badge/Arxiv-2502.13922-AD1C18.svg?logo=arXiv)](http://arxiv.org/abs/2502.13922)
 [![hf_paper](https://img.shields.io/badge/🤗-HF%20Daily-red.svg)](https://huggingface.co/papers/2502.13922)
 </h5>
 
-
+[Code]: https://github.com/DAMO-NLP-SG/LongPO
 
 ## Highlights of LongPO
 
@@ -28,10 +26,8 @@ This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO
 - Extending context length while keeping the model aligned, in one stage.
 - No degradation of short-context capabilities.
 
-
 <img width="1031" alt="image" src="https://github.com/user-attachments/assets/84f3c93f-909d-4ef7-a33a-107ca2deec42" />
 
-
 ## Models and Training Data
 
 | Models | Base Model | Training Data | # Data Samples |
@@ -43,10 +39,6 @@ This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO
 
 \* indicates an experimental version (for rebuttal purposes) that may not have been fully tuned or given sufficient data to reach convergence.
 
-
-
-
-
 ## Training Process
 
 1. Prompt a short-context instruct LLM (e.g., Mistral-7B-Instruct-v0.2) to self-generate short-to-long preference data as illustrated in [data_prepare](data_prepare/readme.md).
@@ -99,11 +91,8 @@ train/train_longpo.py \
 
 ## Evaluation
 
-
-
 ### InfiniteBench
 
-
 | Model | Train/Claimed Length | En.Sum | En.QA | En.MC | AVG. |
 | ---------------- | -------------------- | ------ | ------ | ------ | ------ |
 | GPT-4-128K | 128K | 14.73 | 22.44 | 67.25 | 34.81 |
@@ -126,10 +115,6 @@ train/train_longpo.py \
 - Our results are evaluated with greedy decoding.
 - Baseline results marked with ᵇ were evaluated by us; unmarked baseline results are taken from their official reports.
 
-
-
-
-
 ### RULER
 
 | Model | NIAH | VT | AGG | QA | AVG (13 tasks) |
@@ -141,10 +126,6 @@ train/train_longpo.py \
 | Mistral-7B-LongPO-256K-EXP | 96.80 | 97.00 | 69.14 | 64.87 | 87.65 |
 | Mistral-7B-LongPO-512K-EXP | 97.28 | 97.48 | 69.22 | 64.92 | 88.00 |
 
-
-
-
-
 ### Short Context
 
 | Model | MMLU | ARC-C | Hellaswag | Winogrande | Avg |
@@ -156,8 +137,6 @@ train/train_longpo.py \
 | Qwen2.5-7B-Instruct | 74.28 | 67.15 | 81.41 | 74.66 | 74.38 |
 | Qwen2.5-7B-LongPO-128K | 73.64 | 65.70 | 80.82 | 74.98 | 73.79 |
 
-
-
 ## Citation
 If you find our project useful, we hope you will star our repo and cite our paper as follows:
 ```
@@ -169,6 +148,4 @@ If you find our project useful, we hope you will star our repo and cite our pap
 year={2025},
 url={https://openreview.net/forum?id=qTrEq31Shm}
 }
-```
-
-
+```
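With the added `pipeline_tag: text-generation` and the existing `library_name: transformers` metadata, this checkpoint should load through the standard Transformers text-generation API. Below is a minimal sketch, not taken from the card: the repo id is inferred from the model name here and should be verified on the Hub, and the generation settings are illustrative (greedy decoding, matching the card's evaluation note).

```python
# Minimal sketch: load the checkpoint with the standard Transformers
# text-generation API. The repo id is inferred from this card's model name
# (an assumption); verify it on the Hub before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the checkpoint's native dtype
    device_map="auto",   # requires the `accelerate` package
)

# Qwen2.5-style chat prompt; the evaluation notes above used greedy decoding.
messages = [{"role": "user", "content": "Summarize the key points of the document below.\n\n<long document here>"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)  # greedy
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```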
 
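Step 1 of the Training Process in the diff above hinges on self-generated short-to-long preference data. The sketch below illustrates that idea only; it is not the repo's data_prepare code, and the excerpt size, prompt wording, and output fields are assumptions.

```python
# Schematic sketch of "short-to-long preference data" (Training Process,
# step 1). Illustrative only: the real pipeline lives in data_prepare/ in the
# LongPO repo; excerpt size, prompts, and field names here are assumptions.
def build_preference_pair(model, tokenizer, long_doc: str, instruction: str,
                          short_chars: int = 8000):
    def answer(context: str) -> str:
        messages = [{"role": "user", "content": f"{context}\n\n{instruction}"}]
        ids = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        out = model.generate(ids, max_new_tokens=512, do_sample=False)
        return tokenizer.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)

    # Chosen response: the short-context instruct model answering over a short
    # excerpt, where it is already well aligned. Rejected response: the same
    # model answering over the full long input, where its quality degrades.
    # This short-to-long gap is the preference signal LongPO optimizes.
    chosen = answer(long_doc[:short_chars])  # rough character-based excerpt
    rejected = answer(long_doc)
    return {"prompt": instruction, "chosen": chosen, "rejected": rejected}
```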