AutoThink: Adaptive Reasoning for Large Language Models
TL;DR
We introduce AutoThink, a technique that improves reasoning in large language models through adaptive resource allocation and steering-vector guidance. By classifying query complexity and dynamically allocating computational resources, AutoThink achieves up to a 43% relative improvement on reasoning benchmarks while using fewer tokens than baseline approaches.
Key contributions:
- Novel query complexity classification for adaptive token budgeting
- Steering vectors derived from Pivotal Token Search for guided reasoning
- Significant performance improvements on GPQA-Diamond and MMLU-Pro benchmarks
- Full open-source implementation compatible with any local reasoning model
Introduction
Large Language Models have shown remarkable capabilities in reasoning tasks, but current approaches often use fixed computational budgets regardless of query complexity. A simple arithmetic problem receives the same "thinking time" as a complex multi-step proof, leading to inefficient resource usage and suboptimal performance.
Consider these two queries:
- "What is 15 + 27?"
- "Prove that the square root of 2 is irrational using contradiction."
Intuitively, the second query requires significantly more reasoning steps and computational resources. Yet most current systems allocate similar computational budgets to both.
AutoThink addresses this challenge through three key innovations:
- Query Complexity Classification: Automatically classify queries as HIGH or LOW complexity
- Dynamic Token Budgets: Allocate thinking tokens based on classified complexity
- Steering Vector Guidance: Guide reasoning patterns using activation-level steering
Methodology
Query Complexity Classification
AutoThink begins by classifying each query using our adaptive classifier framework. This classifier determines whether a query requires HIGH or LOW complexity reasoning:
- HIGH complexity: Multi-step reasoning, complex mathematics, logical proofs
- LOW complexity: Simple factual questions, basic arithmetic, straightforward tasks
The classifier uses the adaptive-classifier/llm-router model and falls back to a heuristic based on linguistic indicators when the model isn't available.
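For illustration, here is a minimal sketch of what the classification step might look like, assuming the adaptive-classifier package's `from_pretrained` and `predict` interface; the exact method names and label strings are assumptions, not the shipped AutoThink internals:

```python
# Minimal sketch of the classification step. Assumes adaptive-classifier
# exposes from_pretrained() and predict() as shown (an assumption, not the
# exact AutoThink internals).
from adaptive_classifier import AdaptiveClassifier

classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/llm-router")

def classify_complexity(query: str) -> str:
    # predict() is assumed to return (label, confidence) pairs, best first
    predictions = classifier.predict(query)
    label, confidence = predictions[0]
    return "HIGH" if label.upper() == "HIGH" else "LOW"
```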
Dynamic Token Budget Allocation
Based on the complexity classification, AutoThink dynamically allocates thinking tokens:
- HIGH complexity queries: 70-90% of available tokens for thinking
- LOW complexity queries: 20-40% of available tokens for thinking
This adaptive allocation ensures that complex problems receive adequate computational resources while simple queries complete efficiently.
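As a sketch, the allocation rule can be as simple as picking a fraction of the generation budget per class; the percentages below mirror the ranges above, though the shipped implementation may compute this differently:

```python
# Illustrative budget rule mirroring the ranges above; the exact fractions
# AutoThink uses may differ.
def thinking_budget(complexity: str, max_tokens: int) -> int:
    if complexity == "HIGH":
        return int(0.8 * max_tokens)  # within the 70-90% band
    return int(0.3 * max_tokens)      # within the 20-40% band

print(thinking_budget("HIGH", 4096))  # 3276 thinking tokens
print(thinking_budget("LOW", 4096))   # 1228 thinking tokens
```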
Steering Vector Guidance
Perhaps the most novel aspect of AutoThink is its use of steering vectors to guide reasoning patterns. These vectors are derived from Pivotal Token Search (PTS), a technique introduced in Microsoft's Phi-4 technical report that we implemented and enhanced.
Steering vectors represent different reasoning patterns:
- `depth_and_thoroughness`: Encourages detailed, step-by-step reasoning
- `numerical_accuracy`: Promotes precise calculations and verification
- `self_correction`: Facilitates error detection and correction
- `exploration`: Supports considering multiple approaches
- `organization`: Improves logical structure in responses
During generation, these vectors modify the model's internal activations at a target layer, effectively "steering" the model toward desired reasoning behaviors.
Implementation
Installation and Setup
```bash
# Install optillm
pip install optillm

# Or install from source
git clone https://github.com/codelion/optillm.git
cd optillm
pip install -r requirements.txt
```
Basic Usage
```python
import torch

from optillm.autothink import autothink_decode
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load your model
model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Configure AutoThink
config = {
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
    "target_layer": 19,  # Adjust based on your model
    "pattern_strengths": {
        "depth_and_thoroughness": 2.5,
        "numerical_accuracy": 2.0,
        "self_correction": 3.0,
        "exploration": 2.0,
        "organization": 1.5
    }
}

# Create messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing principles and their applications."}
]

# Process with AutoThink
response = autothink_decode(model, tokenizer, messages, config)
print(response)
```
Advanced Configuration
You can customize AutoThink's behavior through various configuration options:
```python
# Advanced configuration
advanced_config = {
    # Classification settings
    "classifier_model": "adaptive-classifier/llm-router",
    "complexity_threshold": 0.6,

    # Token budget settings
    "high_complexity_min_tokens": 1024,
    "high_complexity_max_tokens": 4096,
    "low_complexity_min_tokens": 256,
    "low_complexity_max_tokens": 1024,

    # Steering vector settings
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
    "target_layer": 19,

    # Thinking control
    "start_think_token": "<think>",
    "end_think_token": "</think>",
    "max_thoughts": 64
}
```
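The advanced configuration drops into the same entry point shown earlier:

```python
# Pass the advanced configuration to the same decode call as before
response = autothink_decode(model, tokenizer, messages, advanced_config)
```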
Working with Different Models
AutoThink is designed to work with various model architectures. Here's how to adapt it for different models:
```python
# For DeepSeek models
deepseek_config = {
    "target_layer": 19,  # Middle layer works well
    "steering_dataset": "codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-steering-vectors"
}

# For Qwen models
qwen_config = {
    "target_layer": 19,
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors"
}
```
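For an architecture not listed above, a reasonable starting point (a heuristic assumption, not a tuned value) is the middle decoder layer, which you can read off the model config:

```python
# Heuristic starting point for an unlisted architecture: steer at roughly
# the middle decoder layer, then tune empirically.
num_layers = model.config.num_hidden_layers
custom_config = {
    "target_layer": num_layers // 2,
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors"
}
```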
Technical Deep Dive
Pivotal Token Search Implementation
Our steering vectors are derived from Pivotal Token Search, which identifies the tokens that most strongly influence model behavior. At a high level, it works in four steps (a conceptual sketch follows the list):
- Token Analysis: For each token in a context, measure its influence on model outputs
- Pivotal Identification: Identify tokens that, when modified, create the largest changes in model behavior
- Vector Extraction: Extract activation vectors associated with these pivotal tokens
- Pattern Classification: Group vectors by reasoning patterns they encourage
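In PTS terms, a token is pivotal when appending it sharply shifts the estimated probability of eventually reaching a correct answer. The sketch below is conceptual, not the actual PTS code; `check_answer` is a hypothetical verifier for the task at hand:

```python
import torch

def estimate_success_prob(model, tokenizer, prefix_ids, check_answer, n=8):
    """Monte Carlo estimate of p(correct | prefix) by sampling completions."""
    hits = 0
    for _ in range(n):
        out = model.generate(prefix_ids, do_sample=True, max_new_tokens=512,
                             pad_token_id=tokenizer.eos_token_id)
        text = tokenizer.decode(out[0], skip_special_tokens=True)
        hits += int(check_answer(text))
    return hits / n

# A token t is pivotal if |p(correct | prefix + t) - p(correct | prefix)|
# exceeds a threshold; activations at such tokens seed the steering vectors.
```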
You can explore our PTS implementation:
- Code: https://github.com/codelion/pts
- Technical Blog: https://huggingface.co/blog/codelion/pts
Adaptive Classification System
The complexity classifier is built on our adaptive classification framework, which offers several advantages:
- Dynamic Classes: Add new complexity categories without retraining
- Continuous Learning: Learn from new examples incrementally
- Flexible Architecture: Adapt to different domains and use cases
Learn more about adaptive classification: https://github.com/codelion/adaptive-classifier
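For example, incremental learning might look like the following, assuming the `add_examples(texts, labels)` interface from the adaptive-classifier repository (see the link above for the authoritative API):

```python
# Assumes adaptive-classifier's add_examples(texts, labels) interface;
# new labeled queries are folded in without a full retrain.
classifier.add_examples(
    ["Prove that there are infinitely many primes.",
     "What year did World War II end?"],
    ["HIGH", "LOW"],
)
```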
Steering Vector Application
During generation, steering vectors are applied through forward hooks on the target layer:
```python
def steering_hook(module, input_tensors, output):
    """Apply steering vector to model activations."""
    # Decoder layers may return a tuple; the hidden states come first
    hidden = output[0] if isinstance(output, tuple) else output
    if steering_vector is not None:
        # Get current reasoning pattern and strength
        pattern = current_vector.get("reasoning_pattern", "unknown")
        strength = get_steering_strength(pattern)
        # Add the scaled vector to each sequence's last-token representation
        vector = torch.tensor(steering_vector, device=hidden.device, dtype=hidden.dtype)
        hidden[:, -1, :] += strength * vector
    return output
```
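Wiring the hook up looks like the sketch below. The `model.model.layers` attribute path is typical for Llama/Qwen-style models but is an assumption; adjust it for your architecture:

```python
# Tokenize a prompt and register the hook on the configured target layer
input_ids = tokenizer("Prove that sqrt(2) is irrational.", return_tensors="pt").input_ids
layer = model.model.layers[19]
handle = layer.register_forward_hook(steering_hook)
try:
    output_ids = model.generate(input_ids, max_new_tokens=512)
finally:
    handle.remove()  # Always detach the hook when done
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```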
Evaluation and Results
Benchmark Performance
We evaluated AutoThink on several reasoning benchmarks using DeepSeek-R1-Distill-Qwen-1.5B:
| Benchmark | Baseline | AutoThink | Improvement |
|---|---|---|---|
| GPQA-Diamond | 21.72% | 31.06% | +9.34 points |
| MMLU-Pro | 25.58% | 26.38% | +0.80 points |
Efficiency Analysis
AutoThink not only improves performance but also enhances efficiency:
- Token Usage: 15-25% reduction in average tokens per query
- Latency: Minimal overhead from classification (~10ms)
- Memory: Negligible additional memory usage for steering vectors
Integration with optillm
AutoThink is part of the broader optillm project, which provides multiple reasoning enhancement techniques:
```python
# AutoThink within the optillm ecosystem
from optillm import create_inference_client

# Create client with AutoThink
client = create_inference_client()
response = client.chat.completions.create(
    model="autothink-deepseek-r1-llama-8b",
    messages=messages
)
```
Community and Contributions
AutoThink is fully open source and we welcome community contributions:
- Steering Vector Datasets: Help create domain-specific steering vectors
- Model Support: Test and optimize for new model architectures
- Evaluation: Run AutoThink on new benchmarks and share results
- Applications: Build interesting applications using AutoThink
Getting Involved
- Star the repository: https://github.com/codelion/optillm
- Report issues or suggest features
- Share your evaluation results
- Contribute to documentation
- Submit pull requests
Conclusion
AutoThink represents a significant step forward in adaptive reasoning for large language models. By intelligently allocating computational resources and guiding reasoning patterns through steering vectors, we can achieve substantial performance improvements while maintaining efficiency.
The technique's model-agnostic design and open-source implementation make it accessible to the broader research community. We believe that adaptive reasoning approaches like AutoThink will be crucial for the next generation of AI systems that can reason more effectively while using resources more efficiently.
We look forward to seeing how the community builds upon this work and adapts it for new applications and domains.
References and Resources
- Research Paper: AutoThink: Efficient Inference for Reasoning LLMs
- AutoThink Implementation: https://github.com/codelion/optillm/tree/main/optillm/autothink
- Pivotal Token Search: https://github.com/codelion/pts
- PTS Technical Blog: https://huggingface.co/blog/codelion/pts
- Adaptive Classifier: https://github.com/codelion/adaptive-classifier
- optillm Project: https://github.com/codelion/optillm
Have you tried AutoThink with your models? We'd love to hear about your experiences and results! Share them in the comments below or reach out to us on GitHub.