Text Generation · Transformers · Safetensors · qwen2 · conversational · text-generation-inference · Inference Endpoints

Add text-generation tag and link to code repository

#1 opened by nielsr (HF staff)

Files changed (1): README.md (+7 -30)
README.md CHANGED
@@ -1,26 +1,24 @@
 ---
-library_name: transformers
-license: apache-2.0
-datasets:
-- DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K-tokenized
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
+datasets:
+- DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K-tokenized
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 # LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
 
 This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization".
 
-
-
-
 <h5 align="left">
 
 [![arXiv](https://img.shields.io/badge/Arxiv-2502.13922-AD1C18.svg?logo=arXiv)](http://arxiv.org/abs/2502.13922)
 [![hf_paper](https://img.shields.io/badge/🤗-HF%20Daily-red.svg)](https://huggingface.co/papers/2502.13922)
 </h5>
 
-
+[Code]: https://github.com/DAMO-NLP-SG/LongPO
 
 ## Highlights of LongPO
 
@@ -28,10 +26,8 @@ This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO
 - Extending context length while keeping the model aligned, in one stage.
 - No degradation of short-context capabilities.
 
-
 <img width="1031" alt="image" src="https://github.com/user-attachments/assets/84f3c93f-909d-4ef7-a33a-107ca2deec42" />
 
-
 ## Models and Training Data
 
 | Models | Base Model | Training Data | # Data Samples |
@@ -43,10 +39,6 @@ This repo provides the checkpoint of Qwen2.5-7B-LongPO-128K in our paper "LongPO
 
 \* indicates an experimental version (for rebuttal purposes) that may not have been fully tuned or given sufficient data to reach convergence.
 
-
-
-
-
 ## Training Process
 
 1. Prompt a short-context instruct LLM (e.g., Mistral-7B-Instruct-v0.2) to self-generate short-to-long preference data as illustrated in [data_prepare](data_prepare/readme.md).
@@ -99,11 +91,8 @@ train/train_longpo.py \
 
 ## Evaluation
 
-
-
 ### InfiniteBench
 
-
 | Model | Train/Claimed Length | En.Sum | En.QA | En.MC | AVG. |
 | ---------------- | -------------------- | ------ | ------ | ------ | ------ |
 | GPT-4-128K | 128K | 14.73 | 22.44 | 67.25 | 34.81 |
@@ -126,10 +115,6 @@ train/train_longpo.py \
 - Our results are evaluated with greedy decoding.
 - Baseline results marked with ᵇ were evaluated by us; unmarked baseline results are taken from their official reports.
 
-
-
-
-
 ### RULER
 
 | Model | NIAH | VT | AGG | QA | AVG (13 tasks) |
@@ -141,10 +126,6 @@ train/train_longpo.py \
 | Mistral-7B-LongPO-256K-EXP | 96.80 | 97.00 | 69.14 | 64.87 | 87.65 |
 | Mistral-7B-LongPO-512K-EXP | 97.28 | 97.48 | 69.22 | 64.92 | 88.00 |
 
-
-
-
-
 ### Short Context
 
 | Model | MMLU | ARC-C | Hellaswag | Winogrande | Avg |
@@ -156,8 +137,6 @@ train/train_longpo.py \
 | Qwen2.5-7B-Instruct | 74.28 | 67.15 | 81.41 | 74.66 | 74.38 |
 | Qwen2.5-7B-LongPO-128K | 73.64 | 65.70 | 80.82 | 74.98 | 73.79 |
 
-
-
 ## Citation
 If you find our project useful, we hope you will star our repo and cite our paper as follows:
 ```
@@ -169,6 +148,4 @@ If you find our project useful, we hope you will star our repo and cite our pap
 year={2025},
 url={https://openreview.net/forum?id=qTrEq31Shm}
 }
-```
-
-
+```
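With the added `pipeline_tag: text-generation` and the existing `library_name: transformers` metadata, this checkpoint should load through the standard Transformers text-generation API. Below is a minimal sketch, not taken from the card: the repo id is inferred from the model name here and should be verified on the Hub, and the generation settings are illustrative (greedy decoding, matching the card's evaluation note).

```python
# Minimal sketch: load the checkpoint with the standard Transformers
# text-generation API. The repo id is inferred from this card's model name
# (an assumption); verify it on the Hub before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the checkpoint's native dtype
    device_map="auto",   # requires the `accelerate` package
)

# Qwen2.5-style chat prompt; the evaluation notes above used greedy decoding.
messages = [{"role": "user", "content": "Summarize the key points of the document below.\n\n<long document here>"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)  # greedy
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```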
 
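Step 1 of the Training Process in the diff above hinges on self-generated short-to-long preference data. The sketch below illustrates that idea only; it is not the repo's data_prepare code, and the excerpt size, prompt wording, and output fields are assumptions.

```python
# Schematic sketch of "short-to-long preference data" (Training Process,
# step 1). Illustrative only: the real pipeline lives in data_prepare/ in the
# LongPO repo; excerpt size, prompts, and field names here are assumptions.
def build_preference_pair(model, tokenizer, long_doc: str, instruction: str,
                          short_chars: int = 8000):
    def answer(context: str) -> str:
        messages = [{"role": "user", "content": f"{context}\n\n{instruction}"}]
        ids = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        out = model.generate(ids, max_new_tokens=512, do_sample=False)
        return tokenizer.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)

    # Chosen response: the short-context instruct model answering over a short
    # excerpt, where it is already well aligned. Rejected response: the same
    # model answering over the full long input, where its quality degrades.
    # This short-to-long gap is the preference signal LongPO optimizes.
    chosen = answer(long_doc[:short_chars])  # rough character-based excerpt
    rejected = answer(long_doc)
    return {"prompt": instruction, "chosen": chosen, "rejected": rejected}
```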