AutoThink: Adaptive Reasoning for Large Language Models

Community Article · Published May 27, 2025

TL;DR

We introduce AutoThink, a novel approach that significantly improves reasoning capabilities in large language models through adaptive resource allocation and steering vector guidance. By classifying query complexity and dynamically allocating computational resources, AutoThink achieves up to a 43% relative improvement on reasoning benchmarks (GPQA-Diamond) while using fewer tokens than baseline approaches.

Key contributions:

  • Novel query complexity classification for adaptive token budgeting
  • Steering vectors derived from Pivotal Token Search for guided reasoning
  • Significant performance improvements on GPQA-Diamond and MMLU-Pro benchmarks
  • Full open-source implementation compatible with any local reasoning model

Introduction

Large Language Models have shown remarkable capabilities in reasoning tasks, but current approaches often use fixed computational budgets regardless of query complexity. A simple arithmetic problem receives the same "thinking time" as a complex multi-step proof, leading to inefficient resource usage and suboptimal performance.

Consider these two queries:

  1. "What is 15 + 27?"
  2. "Prove that the square root of 2 is irrational using contradiction."

Intuitively, the second query requires significantly more reasoning steps and computational resources. Yet most current systems allocate similar computational budgets to both.

AutoThink addresses this challenge through three key innovations:

  1. Query Complexity Classification: Automatically classify queries as HIGH or LOW complexity
  2. Dynamic Token Budgets: Allocate thinking tokens based on classified complexity
  3. Steering Vector Guidance: Guide reasoning patterns using activation-level steering

Methodology


Query Complexity Classification

AutoThink begins by classifying each query using our adaptive classifier framework. This classifier determines whether a query requires HIGH or LOW complexity reasoning:

  • HIGH complexity: Multi-step reasoning, complex mathematics, logical proofs
  • LOW complexity: Simple factual questions, basic arithmetic, straightforward tasks

The classifier uses the adaptive-classifier/llm-router model and includes a fallback heuristic based on linguistic indicators when the model isn't available.
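
To make this step concrete, here is a minimal sketch. It assumes the adaptive-classifier package's from_pretrained and predict API, and the keyword fallback shown is purely illustrative, not AutoThink's actual heuristic:

from adaptive_classifier import AdaptiveClassifier

try:
    classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/llm-router")
except Exception:
    classifier = None  # model unavailable, fall back to the heuristic

def classify_complexity(query: str) -> str:
    """Return "HIGH" or "LOW" for a query."""
    if classifier is not None:
        label, _score = classifier.predict(query)[0]  # [(label, score), ...]
        return label
    # Illustrative fallback based on linguistic indicators
    indicators = ("prove", "derive", "step by step", "explain why", "optimize")
    return "HIGH" if any(k in query.lower() for k in indicators) else "LOW"

print(classify_complexity("What is 15 + 27?"))  # expected: LOW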

Dynamic Token Budget Allocation

Based on the complexity classification, AutoThink dynamically allocates thinking tokens:

  • HIGH complexity queries: 70-90% of available tokens for thinking
  • LOW complexity queries: 20-40% of available tokens for thinking

This adaptive allocation ensures that complex problems receive adequate computational resources while simple queries complete efficiently.
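
The exact fractions within these ranges are chosen internally; the toy function below illustrates the idea with fixed values that are our assumption, and real budgets are further clamped by the min/max token settings shown later in the advanced configuration:

def allocate_thinking_budget(complexity: str, max_new_tokens: int) -> int:
    """Pick a thinking-token budget from the classified complexity."""
    fraction = 0.8 if complexity == "HIGH" else 0.3  # within 70-90% / 20-40%
    return int(max_new_tokens * fraction)

print(allocate_thinking_budget("HIGH", 4096))  # 3276 thinking tokens
print(allocate_thinking_budget("LOW", 4096))   # 1228 thinking tokens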

Steering Vector Guidance

Perhaps the most novel aspect of AutoThink is its use of steering vectors to guide reasoning patterns. These vectors are derived from Pivotal Token Search (PTS), a technique introduced in Microsoft's Phi-4 technical report that we implemented and enhanced.

Steering vectors represent different reasoning patterns:

  • depth_and_thoroughness: Encourages detailed, step-by-step reasoning
  • numerical_accuracy: Promotes precise calculations and verification
  • self_correction: Facilitates error detection and correction
  • exploration: Supports considering multiple approaches
  • organization: Improves logical structure in responses

During generation, these vectors modify the model's internal activations at a target layer, effectively "steering" the model toward desired reasoning behaviors.
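
At its core, the steering update is a scaled vector addition to a hidden state. The snippet below illustrates just that operation, with a made-up width and a unit-normalization convention we assume for clarity (the full hook appears in the Technical Deep Dive):

import torch

hidden_dim = 1536                 # illustrative model width
h = torch.randn(hidden_dim)       # last-token activation at the target layer
v = torch.randn(hidden_dim)       # steering vector for, e.g., self_correction
v = v / v.norm()                  # unit-normalize (an assumed convention)
alpha = 3.0                       # pattern strength from the config

h_steered = h + alpha * v         # nudge the activation toward the pattern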

Implementation

Installation and Setup

# Install optillm
pip install optillm

# Or install from source
git clone https://github.com/codelion/optillm.git
cd optillm
pip install -r requirements.txt

Basic Usage

import torch
from optillm.autothink import autothink_decode
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load your model
model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Configure AutoThink
config = {
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
    "target_layer": 19,  # Adjust based on your model
    "pattern_strengths": {
        "depth_and_thoroughness": 2.5,
        "numerical_accuracy": 2.0,
        "self_correction": 3.0,
        "exploration": 2.0,
        "organization": 1.5
    }
}

# Create messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing principles and their applications."}
]

# Process with AutoThink
response = autothink_decode(model, tokenizer, messages, config)
print(response)

Advanced Configuration

You can customize AutoThink's behavior through various configuration options:

# Advanced configuration
advanced_config = {
    # Classification settings
    "classifier_model": "adaptive-classifier/llm-router",
    "complexity_threshold": 0.6,
    
    # Token budget settings
    "high_complexity_min_tokens": 1024,
    "high_complexity_max_tokens": 4096,
    "low_complexity_min_tokens": 256,
    "low_complexity_max_tokens": 1024,
    
    # Steering vector settings
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
    "target_layer": 19,
    
    # Thinking control
    "start_think_token": "<think>",
    "end_think_token": "</think>",
    "max_thoughts": 64
}
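
The thinking-control settings delimit the reasoning segment with the start and end think tokens and cap it at the allocated budget. Below is a simplified stand-in for that decoding loop, not optillm's actual implementation: it force-inserts the end think token once the budget is exhausted:

import torch

def generate_with_budget(model, tokenizer, prompt, thinking_budget, max_new_tokens=2048):
    """Greedy decoding that force-closes <think> once the budget is spent."""
    end_ids = tokenizer.encode("</think>", add_special_tokens=False)
    ids = tokenizer(prompt + "<think>", return_tensors="pt").input_ids.to(model.device)
    thinking = True
    for step in range(max_new_tokens):
        next_id = model(ids).logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if thinking and ids[0, -len(end_ids):].tolist() == end_ids:
            thinking = False  # the model closed the thinking segment itself
        elif thinking and step >= thinking_budget:
            forced = torch.tensor([end_ids], device=model.device)
            ids = torch.cat([ids, forced], dim=-1)  # force the transition
            thinking = False
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)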

Working with Different Models

AutoThink is designed to work with various model architectures. Here's how to adapt it for different models:

# For DeepSeek models
deepseek_config = {
    "target_layer": 19,  # Middle layer works well
    "steering_dataset": "codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-steering-vectors"
}

# For Qwen models
qwen_config = {
    "target_layer": 19,  
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors"
}
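
If you are adapting a model that is not listed here, a middle decoder layer is a sensible starting point for target_layer (our suggestion, not something optillm enforces):

# Start from the middle layer and tune empirically from there
num_layers = model.config.num_hidden_layers
custom_config = {
    "target_layer": num_layers // 2,
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
}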

Technical Deep Dive

Pivotal Token Search Implementation

Our steering vectors are derived from Pivotal Token Search, which identifies the tokens that most strongly influence a model's behavior. Here's how it works (a simplified sketch follows the steps):

  1. Token Analysis: For each token in a context, measure its influence on model outputs
  2. Pivotal Identification: Identify tokens that, when modified, create the largest changes in model behavior
  3. Vector Extraction: Extract activation vectors associated with these pivotal tokens
  4. Pattern Classification: Group vectors by reasoning patterns they encourage
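
The sketch below makes the procedure concrete; every helper name here is illustrative rather than the actual optillm implementation, and success probabilities are estimated by simple Monte-Carlo sampling:

import torch

def estimate_success_prob(model, tokenizer, prefix_ids, is_correct, n_samples=8):
    """Monte-Carlo estimate of p(correct completion | prefix)."""
    hits = 0
    for _ in range(n_samples):
        out = model.generate(
            torch.tensor([prefix_ids], device=model.device),
            do_sample=True,
            max_new_tokens=256,
        )
        hits += int(is_correct(tokenizer.decode(out[0], skip_special_tokens=True)))
    return hits / n_samples

def find_pivotal_tokens(model, tokenizer, prompt_ids, solution_ids, is_correct, threshold=0.2):
    """Scan a known solution and flag tokens that sharply shift success probability."""
    pivotal = []
    prefix = list(prompt_ids)
    p_before = estimate_success_prob(model, tokenizer, prefix, is_correct)
    for tok in solution_ids:
        prefix.append(tok)
        p_after = estimate_success_prob(model, tokenizer, prefix, is_correct)
        if abs(p_after - p_before) >= threshold:
            pivotal.append((tok, p_after - p_before))  # a pivotal token
        p_before = p_after
    return pivotal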

Our PTS implementation is fully open source, and you can explore it alongside the rest of the AutoThink code on GitHub.

Adaptive Classification System

The complexity classifier is built on our adaptive classification framework, which offers several advantages:

  • Dynamic Classes: Add new complexity categories without retraining
  • Continuous Learning: Learn from new examples incrementally
  • Flexible Architecture: Adapt to different domains and use cases

Learn more about adaptive classification: https://github.com/codelion/adaptive-classifier
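
A short example of that workflow, based on the adaptive-classifier README (the example texts and the extra MEDIUM class are illustrative):

from adaptive_classifier import AdaptiveClassifier

clf = AdaptiveClassifier("bert-base-uncased")

# Learn incrementally from a handful of labeled queries
clf.add_examples(
    ["Prove that sqrt(2) is irrational.", "What is the capital of France?"],
    ["HIGH", "LOW"],
)

# New classes can be introduced later without retraining from scratch
clf.add_examples(["Summarize this paragraph in one line."], ["MEDIUM"])

print(clf.predict("Derive the closed-form solution of the recurrence"))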

Steering Vector Application

During generation, steering vectors are applied through forward hooks on the target layer:

def steering_hook(module, input_tensors, output):
    """Apply the active steering vector to the target layer's activations."""
    # `steering_vector`, `current_vector`, and `get_steering_strength`
    # come from the enclosing AutoThink state.
    if steering_vector is None:
        return output

    # Decoder layers may return a tuple; the hidden states come first
    hidden_states = output[0] if isinstance(output, tuple) else output

    # Scale the vector by the strength configured for the active pattern
    pattern = current_vector.get("reasoning_pattern", "unknown")
    strength = get_steering_strength(pattern)
    vector = torch.tensor(steering_vector, device=hidden_states.device,
                          dtype=hidden_states.dtype)

    # Steer only the last generated token's representation
    hidden_states[:, -1, :] += strength * vector

    return (hidden_states,) + output[1:] if isinstance(output, tuple) else hidden_states
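
AutoThink manages this hook during decoding; the wiring below only makes the mechanism concrete, and it assumes a LLaMA-style model.model.layers attribute path, which varies by architecture:

layer = model.model.layers[config["target_layer"]]
handle = layer.register_forward_hook(steering_hook)
try:
    response = autothink_decode(model, tokenizer, messages, config)
finally:
    handle.remove()  # always detach the hook when generation finishes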

Evaluation and Results

Benchmark Performance

We evaluated AutoThink on several reasoning benchmarks using DeepSeek-R1-Distill-Qwen-1.5B:

| Benchmark     | Baseline | AutoThink | Improvement  |
|---------------|----------|-----------|--------------|
| GPQA-Diamond  | 21.72%   | 31.06%    | +9.34 points |
| MMLU-Pro      | 25.58%   | 26.38%    | +0.80 points |

Efficiency Analysis

AutoThink not only improves performance but also enhances efficiency:

  • Token Usage: 15-25% reduction in average tokens per query
  • Latency: Minimal overhead from classification (~10ms)
  • Memory: Negligible additional memory usage for steering vectors

Integration with optillm

AutoThink is part of the broader optillm project, which provides multiple reasoning enhancement techniques:

# AutoThink within optillm ecosystem
from optillm import create_inference_client

# Create client with AutoThink
client = create_inference_client()
response = client.chat.completions.create(
    model="autothink-deepseek-r1-llama-8b",
    messages=messages
)

Community and Contributions

AutoThink is fully open source and we welcome community contributions:

  • Steering Vector Datasets: Help create domain-specific steering vectors
  • Model Support: Test and optimize for new model architectures
  • Evaluation: Run AutoThink on new benchmarks and share results
  • Applications: Build interesting applications using AutoThink

Getting Involved

  • 🌟 Star the repository: https://github.com/codelion/optillm
  • 🐛 Report issues or suggest features
  • 🔬 Share your evaluation results
  • 📖 Contribute to documentation
  • 💻 Submit pull requests

Conclusion

AutoThink represents a significant step forward in adaptive reasoning for large language models. By intelligently allocating computational resources and guiding reasoning patterns through steering vectors, we can achieve substantial performance improvements while maintaining efficiency.

The technique's model-agnostic design and open-source implementation make it accessible to the broader research community. We believe that adaptive reasoning approaches like AutoThink will be crucial for the next generation of AI systems that can reason more effectively while using resources more efficiently.

We look forward to seeing how the community builds upon this work and adapts it for new applications and domains.

Have you tried AutoThink with your models? We'd love to hear about your experiences and results! Share them in the comments below or reach out to us on GitHub.
