# LIMO: Less Is More for Reasoning
## Table of Contents
- [Overview](#overview)
- [Key Results](#key-results)
- [Model Zoo](#model-zoo)
- [Datasets](#datasets)
- [Quick Start](#quick-start)
- [License](#license)
- [Citation](#citation)
## Overview
LIMO challenges the conventional wisdom in mathematical reasoning by demonstrating that models can achieve superior performance with far less, but far higher-quality, training data. Our approach:
- 🎯 Achieves SOTA with only 817 carefully curated training samples
- 📈 Shows strong generalization across diverse problem types
- 🔬 Provides comprehensive evaluation on 10 benchmarks
- 📚 Releases high-quality datasets and evaluation tools
## Key Results
| Model | AIME24 | MATH500 | Training Samples |
|-------|--------|---------|-----------------|
| LIMO (Ours) | **57.1%** | **94.8%** | 817 |
| Previous SOTA | 6.5% | 59.2% | 100k+ |
<details>
<summary>Click to see more detailed results</summary>

| Benchmark | LIMO | Previous SOTA | Improvement |
|-----------|------|--------------------------|-------------|
| AIME24 | **57.1%** | 6.5% | +50.6% |
| MATH500 | **94.8%** | 59.2% | +35.6% |
| AMC23 | **92.0%** | 40.6% | +51.4% |
| OlympiadBench | **66.8%** | 36.7% | +30.1% |
| CHMath | **75.4%** | 11.2% | +64.2% |
| Gaokao | **81.0%** | 49.4% | +31.6% |
| Kaoyan | **73.4%** | 32.7% | +40.7% |
| GradeSchool | **76.2%** | 36.2% | +40.0% |
| Minerva | 44.9% | **47.1%** | -2.2% |
| GPQA | 66.7% | **73.3%** | -6.6% |
</details>
## Model Zoo
Our LIMO model is available on Hugging Face 🤗:
| Model | Backbone | Size | Link |
|-------|------|------|------|
| LIMO | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 32B | [🤗](https://huggingface.co/GAIR/LIMO) |
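To fetch the checkpoint ahead of time (e.g., for offline use or a shared cache), here is a minimal sketch using `huggingface_hub`; the `local_dir` value is just an example path:

```python
# Sketch: pre-download the LIMO weights from the Hugging Face Hub.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

# "./LIMO" is an arbitrary example path; omit local_dir to use the default HF cache.
snapshot_download(repo_id="GAIR/LIMO", local_dir="./LIMO")
```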
## Datasets
We release our datasets through Hugging Face 🤗:
| Dataset | Description | Size | Link |
|---------|-------------|------|------|
| LIMO | Training set used to train LIMO model | 817 | [🤗](https://huggingface.co/datasets/GAIR/LIMO) |
Note: We are gradually releasing additional datasets mentioned in our paper, including those used for comparative experiments, to facilitate reproducibility and further analysis by the research community. Stay tuned!
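To inspect the training set locally, you can load it with the 🤗 [datasets](https://github.com/huggingface/datasets) library. A minimal sketch, assuming a single `train` split; check the dataset card for the exact column schema:

```python
# Sketch: load the 817-sample LIMO training set from the Hugging Face Hub.
# Requires: pip install datasets
from datasets import load_dataset

limo = load_dataset("GAIR/LIMO", split="train")  # "train" split assumed
print(len(limo))        # expected: 817
print(limo[0].keys())   # inspect the actual column names
```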
## Quick Start
Our model is fine-tuned on [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) and is compatible with most mainstream frameworks, such as [HF Transformers](https://github.com/huggingface/transformers), [vLLM](https://github.com/vllm-project/vllm), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
<details>
<summary>Start with HF Transformers</summary>

```bash
# Install required packages (accelerate is needed for device_map="auto")
pip install transformers accelerate
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Initialize model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
"GAIR/LIMO",
torch_dtype="auto",
trust_remote_code=True,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO", trust_remote_code=True)
# Prepare input messages (We use the following template and system prompt during training and inference)
messages = [
{"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
{"role": "user", "content": "What is the result of 1+1?"}
]
# Format input using chat template
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
# Tokenize input
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Generate response
outputs = model.generate(
**inputs,
max_new_tokens=32768,
temperature=0.7,
top_p=0.95,
do_sample=True
)
# Decode and print response
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```
</details>
<details>
<summary>Start with vLLM</summary>

```bash
# Install required packages
pip install vllm
```
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
# Initialize the model
llm = LLM(
model="GAIR/LIMO",
tensor_parallel_size=4, # adjust based on available GPUs
trust_remote_code=True,
swap_space=60,
gpu_memory_utilization=0.96,
)
# Prepare input messages (We use the following template and system prompt during training and inference)
messages = [
{"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
{"role": "user", "content": "What is the result of 1+1?"}
]
# Setup tokenizer
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO", trust_remote_code=True)
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
# Configure generation parameters
sampling_params = SamplingParams(
temperature=0.7,
max_tokens=32768,
top_p=0.95,
)
# Generate response
output = llm.generate(text, sampling_params)
print(output[0].outputs[0].text)
```
</details>
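<details>
<summary>Serve with vLLM's OpenAI-compatible API (sketch)</summary>

Because LIMO is a standard Qwen2.5-based checkpoint, it should also work with vLLM's built-in OpenAI-compatible server (available in recent vLLM versions). This is a minimal sketch, not an official recipe: port 8000 is vLLM's default, and `--tensor-parallel-size` should match your GPU count.

```bash
# Launch the server (defaults to http://localhost:8000)
vllm serve GAIR/LIMO --tensor-parallel-size 4 --trust-remote-code
```

```python
# Query the server with the official openai client (pip install openai)
from openai import OpenAI

# Any api_key string works for a local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="GAIR/LIMO",
    messages=[
        {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
        {"role": "user", "content": "What is the result of 1+1?"},
    ],
    temperature=0.7,
    top_p=0.95,
    max_tokens=32768,
)
print(response.choices[0].message.content)
```
</details>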
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Citation
```bibtex
@misc{ye2025limoreasoning,
  title={LIMO: Less is More for Reasoning},
  author={Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu},
  year={2025},
  eprint={2502.03387},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.03387},
}
```