⚠️ Experimental Release Notice:
This model is in an experimental phase on Hugging Face and is still undergoing training. Expect further enhancements and updates in the coming week.

NeuraLake iSA-02 Series: Advanced Small-Scale Reasoning Models

Overview

The NeuraLake iSA-02 Series comprises compact reasoning models optimized for efficient logical processing in resource-constrained environments. Designed for applications requiring nuanced decision-making and complex problem-solving, these models balance performance with computational efficiency.

Release Information

Model weights for each variant (1B, 2B, 3B, and 7B parameters) will be released after comprehensive training and optimization, to ensure they meet high performance and safety standards.

iSA-02-Nano-1B-Preview v1.1 (No Structured Tags Variant)

The iSA-02-Nano-1B-Preview is the latest addition to the iSA-02 series, trained with additional synthetic data that prioritizes "thinking before speaking." This focus strengthens its reasoning capabilities, making it well suited to applications that require thoughtful, logical text generation within a compact framework.

What is a Reasoning Model?

A reasoning model simulates human-like logical thinking, enabling it to analyze information, draw inferences, and make decisions based on data. Unlike traditional language models that generate text purely from learned patterns, reasoning models excel at understanding, planning, and executing multi-step processes.


Name and Inspiration

  • iSA: Stands for Intelligent, Small, Autonomous, reflecting the mission to create compact AI systems with adaptive and intelligent behavior.
  • Development: Initiated in January 2024, the series emerged from experiments combining diverse datasets, revealing initial reasoning capabilities in the base model. Unlike models derived from OpenAI, iSA-02 emphasizes unique reasoning enhancements through innovative synthetic data and contextual refinement.

Lineage

Based on meta-llama/Llama-3.2-1B-Instruct and refined with synthetic datasets from NeuraLake, the iSA-02-Nano-1B-Preview targets improvements in reasoning, long-context handling, and adaptive behaviors.
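
Because the preview follows the standard Llama 3.2 architecture, it should load with the usual transformers workflow. The snippet below is a minimal sketch: the repository id, prompt, and generation settings are illustrative assumptions rather than confirmed values.

```python
# Minimal sketch: loading the preview weights with Hugging Face transformers.
# The repository id is a placeholder; substitute the actual NeuraLake
# repository name once the weights are published.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NeuraLake/iSA-02-Nano-1B-Preview"  # assumed repo id, not confirmed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # matches the fp16 mixed precision noted below
    device_map="auto",
)

prompt = "Explain, step by step, why 17 is a prime number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```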

Key Features

  • Extended Context Window: Supports up to 256K tokens for complex reasoning and Retrieval-Augmented Generation (RAG).
  • Adaptive Reasoning: Adjusts reasoning depth based on context size: concise for contexts under 8K tokens, more detailed for contexts over 16K tokens (see the sketch after this list).
  • Efficiency Optimized: Balances advanced reasoning with low computational demands, suitable for resource-limited settings.
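
To make the adaptive-reasoning behavior concrete, here is a minimal sketch of how a caller might scale its generation budget with context size. The 8K/16K thresholds mirror the feature description above; the specific token budgets and the helper name are illustrative assumptions, not official guidance.

```python
# Hypothetical helper illustrating the adaptive-reasoning idea: allow a
# longer reasoning budget when the supplied context is large.
def reasoning_budget(num_context_tokens: int) -> int:
    if num_context_tokens < 8_000:       # short context: concise answers
        return 1_024
    if num_context_tokens <= 16_000:     # mid-range context
        return 4_096
    return 16_000                        # long context: detailed multi-step reasoning

# Example: pick a generation budget for a 24K-token RAG prompt.
print(reasoning_budget(24_000))  # -> 16000
```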

Model Specifications

Architecture

  • Type: Transformer-based (see the configuration sketch after this list)
  • Layers: 16
  • Hidden Size: 2048
  • Attention Heads: 32
  • Feed-Forward Size: 8192
  • Vocabulary Size: 128,256
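
For illustration, the specifications above roughly correspond to the following Hugging Face LlamaConfig. This is a hedged reconstruction: only the fields listed above are set, and anything not listed (for example grouped-query attention or RoPE settings) is left at the library default.

```python
# Hedged reconstruction of the configuration implied by the specifications above.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=128_256,
    hidden_size=2_048,
    intermediate_size=8_192,   # feed-forward size
    num_hidden_layers=16,
    num_attention_heads=32,
)
print(config.hidden_size // config.num_attention_heads)  # per-head dimension: 64
```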

Training Parameters

  • Precision: Mixed Precision (fp16)
  • Context Window:
    • Text Generation: 1,024–4,096 tokens
    • Logical Reasoning: 16,000–64,000 tokens

Quantization Versions

Version  Format             Bits   Parameters  Download
F32      Custom Llama 3.2   FP32   1.24B       Download
F16      Custom Llama 3.2   FP16   1.24B       Download
Q4_0     Custom Llama 3.2   4-bit  1.24B       Download
Q4_K_M   Custom Llama 3.2   4-bit  1.24B       Download
Q5_K_M   Custom Llama 3.2   5-bit  1.24B       Download
Q8_0     Custom Llama 3.2   8-bit  1.24B       Download
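
A quantized GGUF build can be run with llama.cpp bindings. The sketch below uses llama-cpp-python and assumes a Q4_K_M file name; the actual file name in the repository may differ.

```python
# Minimal sketch of running a quantized GGUF build with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="iSA-02-Nano-1B-Preview-Q4_K_M.gguf",  # hypothetical file name
    n_ctx=16_384,      # raise towards the 256K limit only if memory allows
    n_threads=8,
)

result = llm(
    "List three checks for validating OCR output:",
    max_tokens=512,
    temperature=0.2,
    top_p=0.85,
)
print(result["choices"][0]["text"])
```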

Hardware Requirements

Version  Quantization  Size     Memory (RAM/vRAM)
F32      FP32          4.95 GB  9.9 GB
F16      FP16          2.48 GB  4.96 GB
Q4_0     4-bit         771 MB   1.56 GB
Q4_K_M   4-bit         808 MB   1.62 GB
Q5_K_M   5-bit         893 MB   1.84 GB
Q8_0     8-bit         1.32 GB  2.64 GB
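
As a rough sanity check on the table above, weight size scales with parameter count times effective bits per parameter. The effective bit counts below are approximations (quantized GGUF builds store scale metadata and keep some tensors at higher precision), so the figures only roughly match the published sizes, and actual RAM/vRAM usage is higher because of the KV cache and runtime overhead.

```python
# Back-of-the-envelope weight-size estimate: params * effective bits / 8.
PARAMS = 1.24e9  # 1.24B parameters

def approx_size_gb(effective_bits: float) -> float:
    return PARAMS * effective_bits / 8 / 1e9

# Effective bits per parameter are rough estimates, not exact format widths.
for name, bits in [("F32", 32.0), ("F16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 5.2)]:
    print(f"{name}: ~{approx_size_gb(bits):.2f} GB")
# F32 -> ~4.96 GB and F16 -> ~2.48 GB, in line with the table above.
```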

Training and Fine-Tuning

Trained on synthetic datasets tailored to enhance logical reasoning, multi-step task execution, and contextual tool usage, the iSA-02 series delivers robust performance in complex scenarios and supports adaptive behavior.

Use Cases

Applications

  • Logical Reasoning & Decision-Making: Generate analytical reports from system logs.
  • Dynamic Tool Integration: Ideal for long-context RAG tasks such as querying large databases (a minimal sketch follows this list).
  • Structured Content Generation: Perfect for correcting OCR outputs and filling in missing data.
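
As a rough illustration of the long-context RAG pattern, the sketch below stuffs retrieved passages into a single prompt. The retrieval step is out of scope, and the passage contents and question are placeholders.

```python
# Hypothetical long-context RAG sketch: concatenate retrieved passages into a
# single prompt and ask the model to reason over them. `retrieved_chunks`
# stands in for whatever your retriever returns.
retrieved_chunks = [
    "Passage 1: ...",
    "Passage 2: ...",
    "Passage 3: ...",
]

context = "\n\n".join(retrieved_chunks)
prompt = (
    "Using only the passages below, answer the question and cite the passage "
    "numbers you relied on.\n\n"
    f"{context}\n\n"
    "Question: Which passage describes the database schema change?"
)
# `prompt` can now be passed to the model via the loading snippets shown earlier.
```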

Limitations

  • Unsuitable for:
    • High-throughput text generation.
    • Latency-sensitive applications.
  • Challenges:
    • Potential biases from synthetic data.
    • Redundant or verbose reasoning.

Improvements in Version 1.1

  • Enhanced Reasoning: Faster processing with reduced overthinking.
  • Better Tool Utilization: More effective use of external tools.
  • Improved Context Understanding: Aligns actions with user intentions.
  • Reduced Redundancy: More concise responses.
  • Less Task Aversion: Fewer refusals of routine tasks.
  • Optimized Context Management: Efficient handling of the 256K context window.

Best Practices

Configuration Recommendations

  • max_tokens:
    • Simple Tasks: 1,024–4,096 tokens
    • Complex Tasks: 8,000–16,000 tokens
  • temperature:
    • Objective Responses: 0.1–0.3
    • Creative Reasoning: 0.7–1.0
  • top_p:
    • Focused Outputs: 0.85
    • Precision Tasks: 0.1
  • stop_sequences:
    • Use specific stop sequences such as "Therefore, the answer is" to minimize redundancy (see the example call after this list).
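
The recommendations above map directly onto common inference APIs. The sketch below assumes the model is served behind an OpenAI-compatible endpoint (for example vLLM or the llama.cpp server); the endpoint URL, API key, and model name are placeholders.

```python
# Hypothetical call against a local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.completions.create(
    model="iSA-02-Nano-1B-Preview",            # placeholder model name
    prompt="Summarize the failures recorded in the log excerpt below:\n...",
    max_tokens=1_024,                          # simple-task budget
    temperature=0.2,                           # objective response
    top_p=0.85,                                # focused output
    stop=["Therefore, the answer is"],         # stop sequence to curb redundancy
)
print(response.choices[0].text)
```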

Prompt Engineering

  • Simple Tasks:
    • Example: "You are a helpful assistant."
  • Complex Tasks:
    • Example: "Transform OCR outputs into valid JSON, return only the JSON data as output."
    • Structured Reasoning: "Not apply in "No Structured Tags", as it is not necessary or supported."
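
For the complex-task case, the example system prompt above can be combined with the OCR text to correct in a standard chat message list. The OCR fragment below is invented for demonstration.

```python
# Illustrative prompt for the OCR-correction use case; the OCR fragment is invented.
messages = [
    {
        "role": "system",
        "content": "Transform OCR outputs into valid JSON, return only the JSON data as output.",
    },
    {
        "role": "user",
        "content": "Invoice N0: 1O42  Date: 2O24-O3-15  Total: $l,250.00",
    },
]

# With a transformers tokenizer, the Llama 3.2 chat template can be applied as:
# prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```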

Supervision and Monitoring

  • Clear Prompts: Ensure instructions are specific and unambiguous to reduce errors and redundancies.

Known Issues (Addressed in V1.1)

  • Task Management: Improved handling of complex tasks and function calls.
  • Unusual Behavior: Reduced instances of unsolicited online searches or autonomous interactions.
  • Conversational Redirection: Enhanced stability in maintaining topic focus.
  • Function Call Execution: Ensured simulated function calls are actionable.

Citation

@misc{isa02,
  author       = {NeuraLake},
  title        = {iSA-02: The First Small Reasoning Model with Context-Dynamic Behavior},
  year         = {2024},
  license      = {Apache 2.0},
  url          = {https://huggingface.co/NeuraLake/iSA-02},
}

Note: This model card is under development and will be updated with additional details, evaluation metrics, and the final model name.
