# LIMO: Less Is More for Reasoning

## Table of Contents

- [Overview](#overview)
- [Key Results](#key-results)
- [Model Zoo](#model-zoo)
- [Datasets](#datasets)
- [Quick Start](#quick-start)
- [Citation](#citation)

## Overview

LIMO challenges the conventional wisdom in mathematical reasoning by demonstrating that models can achieve superior performance with substantially less, but higher-quality, training data. Our approach:

- Achieves state-of-the-art (SOTA) performance with only 817 carefully curated training samples
- Shows strong generalization across diverse problem types
- Provides comprehensive evaluation on 10 benchmarks
- Releases high-quality datasets and evaluation tools

## Key Results

| Model | AIME24 | MATH500 | Training Samples |
|-------|--------|---------|------------------|
| LIMO (Ours) | **57.1%** | **94.8%** | 817 |
| Previous SOTA | 6.5% | 59.2% | 100k+ |

<details>
<summary>Click to see more detailed results</summary>

| Benchmark | LIMO | Previous SOTA | Improvement |
|-----------|------|---------------|-------------|
| AIME24 | **57.1%** | 6.5% | +50.6% |
| MATH500 | **94.8%** | 59.2% | +35.6% |
| AMC23 | **92.0%** | 40.6% | +51.4% |
| OlympiadBench | **66.8%** | 36.7% | +30.1% |
| CHMath | **75.4%** | 11.2% | +64.2% |
| Gaokao | **81.0%** | 49.4% | +31.6% |
| Kaoyan | **73.4%** | 32.7% | +40.7% |
| GradeSchool | **76.2%** | 36.2% | +40.0% |
| Minerva | 44.9% | **47.1%** | -2.2% |
| GPQA | 66.7% | **73.3%** | -6.6% |

Improvement values are absolute percentage-point differences.

</details>

## Model Zoo

Our LIMO model is available on Hugging Face 🤗:

| Model | Backbone | Size | Link |
|-------|----------|------|------|
| LIMO | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 32B | [🤗](https://huggingface.co/GAIR/LIMO) |

## Datasets

We release our datasets through Hugging Face 🤗:

| Dataset | Description | Samples | Link |
|---------|-------------|---------|------|
| LIMO | Training set used to train the LIMO model | 817 | [🤗](https://huggingface.co/datasets/GAIR/LIMO) |

Note: We are gradually releasing additional datasets mentioned in our paper, including those used for comparative experiments, to facilitate reproducibility and further analysis by the research community. Stay tuned!
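
To take a quick look at the training set, the standard 🤗 Datasets API works out of the box. The snippet below is a minimal sketch: the `train` split name is the usual default, and any specific column names should be checked against the dataset card before use.

```python
from datasets import load_dataset

# Load the 817-sample LIMO training set from the Hugging Face Hub
dataset = load_dataset("GAIR/LIMO", split="train")

print(len(dataset))          # expected: 817
print(dataset.column_names)  # inspect the actual schema before use
print(dataset[0])            # first curated sample
```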

## Quick Start

Our model is fine-tuned on [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) and is compatible with most mainstream frameworks, such as [HF Transformers](https://github.com/huggingface/transformers), [vLLM](https://github.com/vllm-project/vllm), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).

<details>
<summary>Start with HF Transformers</summary>

```bash
# Install required packages (accelerate is needed for device_map="auto")
pip install transformers torch accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Initialize model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "GAIR/LIMO",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO", trust_remote_code=True)

# Prepare input messages (we use the following template and system prompt during training and inference)
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is the result of 1+1?"}
]

# Format input using the chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize input and move it to the model's device
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=32768,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Decode only the newly generated tokens and print the response
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

</details>

<details>
<summary>Start with vLLM</summary>

```bash
# Install required packages
pip install vllm
```

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Initialize the model
llm = LLM(
    model="GAIR/LIMO",
    tensor_parallel_size=4,  # adjust based on available GPUs
    trust_remote_code=True,
    swap_space=60,
    gpu_memory_utilization=0.96,
)

# Prepare input messages (we use the following template and system prompt during training and inference)
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is the result of 1+1?"}
]

# Set up the tokenizer and format the input using the chat template
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO", trust_remote_code=True)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Configure generation parameters
sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=32768,
    top_p=0.95,
)

# Generate and print the response
output = llm.generate(text, sampling_params)
print(output[0].outputs[0].text)
```

</details>
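
<details>
<summary>Serve with vLLM's OpenAI-compatible API (sketch)</summary>

vLLM can also expose the model behind an OpenAI-compatible endpoint, which is convenient for evaluation harnesses and existing client code. The snippet below is a minimal sketch under stated assumptions: a server launched separately (e.g. with `vllm serve GAIR/LIMO --tensor-parallel-size 4`, listening on the default port 8000) and the official `openai` Python client; adjust the command, flags, and base URL for your vLLM version.

```bash
# Install the client
pip install openai
```

```python
from openai import OpenAI

# Point the client at a local vLLM server (assumed to be running already,
# e.g. via: vllm serve GAIR/LIMO --tensor-parallel-size 4)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Same system prompt as used during training and inference
response = client.chat.completions.create(
    model="GAIR/LIMO",
    messages=[
        {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
        {"role": "user", "content": "What is the result of 1+1?"}
    ],
    temperature=0.7,
    top_p=0.95,
    max_tokens=32768,
)
print(response.choices[0].message.content)
```

</details>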
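
Because the system prompt asks the model to put its final answer within `\boxed{}`, downstream scripts usually need to pull that answer out of the generated text. The helper below is a minimal heuristic sketch, not part of our released evaluation tools: `extract_boxed` is a hypothetical name, and the brace matching handles nesting but not every LaTeX edge case.

```python
# Minimal heuristic sketch (hypothetical helper, not the released evaluator):
# extract the contents of the last \boxed{...} in a model response.
def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in `text`, or None."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    body_start = start + len(marker)
    depth = 1
    for i in range(body_start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[body_start:i]
    return None  # unbalanced braces


print(extract_boxed(r"1 + 1 = 2, so the answer is \boxed{2}."))  # -> 2
```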

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

## Citation

```bibtex
@misc{ye2025limoreasoning,
  title={LIMO: Less is More for Reasoning},
  author={Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu},
  year={2025},
  eprint={2502.03387},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.03387},
}
```