⚠️ Experimental Release Notice:
This model is in an experimental phase on Hugging Face and is still undergoing training. Expect further enhancements and updates in the coming weeks.
NeuraLake iSA-02 Series: Advanced Small-Scale Reasoning Models
Overview
The NeuraLake iSA-02 Series comprises compact reasoning models optimized for efficient logical processing in resource-constrained environments. Designed for applications requiring nuanced decision-making and complex problem-solving, these models balance performance with computational efficiency.
Release Information
Model weights for each variant (1B, 2B, 3B, and 7B parameters) will be released after comprehensive training and optimization, to ensure they meet high performance and safety standards.
iSA-02-Nano-1B-Preview v1.1 (No Structured Tags Variant)
The iSA-02-Nano-1B-Preview is the latest addition to the iSA-02 series, enhanced with synthetic data to prioritize "thinking before speaking." This focus strengthens its reasoning capabilities, making it ideal for applications requiring thoughtful, logical text generation within a compact framework.
What is a Reasoning Model?
A reasoning model simulates human-like logical thinking, enabling the analysis of information, inference drawing, and decision-making based on data. Unlike traditional language models that generate text from patterns, reasoning models excel in understanding, planning, and executing multi-step processes.
Name and Inspiration
- iSA: Stands for Intelligent, Small, Autonomous, reflecting the mission to create compact AI systems with adaptive and intelligent behavior.
- Development: Initiated in January 2024, the series emerged from experiments combining diverse datasets, revealing initial reasoning capabilities in the base model. Unlike models derived from OpenAI, iSA-02 emphasizes unique reasoning enhancements through innovative synthetic data and contextual refinement.
Lineage
Based on meta-llama/Llama-3.2-1B-Instruct and refined with synthetic datasets from NeuraLake, the iSA-02-Nano-1B-Preview targets improvements in reasoning, long-context handling, and adaptive behaviors.
Key Features
- Extended Context Window: Supports up to 256K tokens for complex reasoning and Retrieval-Augmented Generation (RAG).
- Adaptive Reasoning: Adjusts reasoning depth based on context size, producing concise reasoning below 8K tokens and detailed reasoning above 16K tokens.
- Efficiency Optimized: Balances advanced reasoning with low computational demands, suitable for resource-limited settings.
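The adaptive-reasoning behavior above can be expressed as a simple rule. This is an illustrative sketch only: the card does not specify what happens between 8K and 16K tokens, so the "intermediate" label for that band is an assumption.

```python
# Illustrative sketch of the adaptive-reasoning rule described above.
# The behavior between 8K and 16K tokens is not documented; "intermediate"
# is an assumed placeholder for that band.
def reasoning_mode(context_tokens: int) -> str:
    if context_tokens < 8_000:
        return "concise"
    if context_tokens > 16_000:
        return "detailed"
    return "intermediate"  # assumption: unspecified in the model card

print(reasoning_mode(4_000))   # concise
print(reasoning_mode(32_000))  # detailed
```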
Model Specifications
Architecture
- Type: Transformer-based
- Layers: 16
- Hidden Size: 2048
- Attention Heads: 32
- Feed-Forward Size: 8192
- Vocabulary Size: 128,256
Training Parameters
- Precision: Mixed Precision (fp16)
- Context Window:
- Text Generation: 1,024β4,096 tokens
- Logical Reasoning: 16,000β64,000 tokens
Quantization Versions
| Version | Format | Bits | Parameters | Download |
|---|---|---|---|---|
| F32 | Custom Llama 3.2 | FP32 | 1.24B | Download |
| F16 | Custom Llama 3.2 | FP16 | 1.24B | Download |
| Q4_0 | Custom Llama 3.2 | 4-bit | 1.24B | Download |
| Q4_K_M | Custom Llama 3.2 | 4-bit | 1.24B | Download |
| Q5_K_M | Custom Llama 3.2 | 5-bit | 1.24B | Download |
| Q8_0 | Custom Llama 3.2 | 8-bit | 1.24B | Download |
Hardware Requirements
| Version | Quantization | Size | Memory (RAM/VRAM) |
|---|---|---|---|
| F32 | FP32 | 4.95 GB | 9.9 GB |
| F16 | FP16 | 2.48 GB | 4.96 GB |
| Q4_0 | 4-bit | 771 MB | 1.56 GB |
| Q4_K_M | 4-bit | 808 MB | 1.62 GB |
| Q5_K_M | 5-bit | 893 MB | 1.84 GB |
| Q8_0 | 8-bit | 1.32 GB | 2.64 GB |
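The sizes in the table follow from a simple rule of thumb: parameters multiplied by bits per weight, divided by 8 bytes. As a sanity check (approximate only: real quantized files carry extra metadata and mixed-precision layers, so the low-bit rows in the table come out slightly larger than this estimate):

```python
def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: params * bits / 8 bits-per-byte."""
    return num_params * bits_per_weight / 8 / 1e9

# 1.24B parameters at FP16 comes to ~2.48 GB, matching the table above.
print(round(model_size_gb(1.24e9, 16), 2))  # 2.48
```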
Training and Fine-Tuning
Trained on synthetic datasets tailored to enhance logical reasoning, multi-step task execution, and contextual tool usage, the iSA-02 series delivers robust performance in complex scenarios and supports adaptive behaviors.
Use Cases
Applications
- Logical Reasoning & Decision-Making: Generate analytical reports from system logs.
- Dynamic Tool Integration: Ideal for long-context RAG tasks like querying large databases.
- Structured Content Generation: Perfect for correcting OCR outputs and filling in missing data.
Limitations
- Unsuitable for:
  - High-throughput text generation.
  - Latency-sensitive applications.
- Challenges:
  - Potential biases from synthetic data.
  - Redundant or verbose reasoning.
Improvements in Version 1.1
- Enhanced Reasoning: Faster processing with reduced overthinking.
- Better Tool Utilization: More effective use of external tools.
- Improved Context Understanding: Aligns actions with user intentions.
- Reduced Redundancy: More concise responses.
- Less Task Aversion: Fewer refusals of routine tasks.
- Optimized Context Management: Efficient handling of the 256K context window.
Best Practices
Configuration Recommendations
- max_tokens:
- Simple Tasks: 1,024β4,096 tokens
- Complex Tasks: 8,000β16,000 tokens
- temperature:
- Objective Responses: 0.1β0.3
- Creative Reasoning: 0.7β1.0
- top_p:
- Focused Outputs: 0.85
- Precision Tasks: 0.1
- stop_sequences:
- Use specific sequences like "Therefore, the answer is" to minimize redundancy.
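The recommendations above can be collected into a single request payload. This sketch uses parameter names common to OpenAI-compatible inference servers; the exact names and the chosen values within each recommended range are assumptions to adjust per task.

```python
# Sketch: the tuning guidance above expressed as a generation config.
# Values are picked from within the recommended ranges; adjust per task.
def generation_config(task: str) -> dict:
    if task == "objective":
        return {
            "max_tokens": 4096,    # simple tasks: 1,024-4,096
            "temperature": 0.2,    # objective responses: 0.1-0.3
            "top_p": 0.85,         # focused outputs
            "stop": ["Therefore, the answer is"],
        }
    if task == "creative":
        return {
            "max_tokens": 16000,   # complex tasks: 8,000-16,000
            "temperature": 0.8,    # creative reasoning: 0.7-1.0
            "top_p": 0.85,
        }
    raise ValueError(f"unknown task: {task!r}")

print(generation_config("objective")["temperature"])  # 0.2
```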
Prompt Engineering
- Simple Tasks:
  - Example: "You are a helpful assistant."
- Complex Tasks:
  - Example: "Transform OCR outputs into valid JSON, return only the JSON data as output."
- Structured Reasoning: Not applicable to the No Structured Tags variant, as structured tags are neither necessary nor supported.
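The complex-task prompt above can be wired into a standard chat-message list. The OCR sample text below is invented for illustration; the message format is the common chat convention, not an API specific to this model.

```python
# Hypothetical usage of the OCR-to-JSON prompt with the common chat format.
messages = [
    {"role": "system",
     "content": "Transform OCR outputs into valid JSON, "
                "return only the JSON data as output."},
    {"role": "user",
     "content": "Invoice N0. 1O23  Total: 45.9O USD"},  # invented OCR sample
]
print(messages[0]["role"])  # system
```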
Supervision and Monitoring
- Clear Prompts: Ensure instructions are specific and unambiguous to reduce errors and redundancies.
Known Issues (Addressed in V1.1)
- Task Management: Improved handling of complex tasks and function calls.
- Unusual Behavior: Reduced instances of unsolicited online searches or autonomous interactions.
- Conversational Redirection: Enhanced stability in maintaining topic focus.
- Function Call Execution: Ensured simulated function calls are actionable.
Citation
```bibtex
@misc{isa02,
  author  = {NeuraLake},
  title   = {iSA-02: The First Small Reasoning Model with Context-Dynamic Behavior},
  year    = {2024},
  license = {Apache 2.0},
  url     = {https://huggingface.co/NeuraLake/iSA-02},
}
```
Note: This model card is under development and will be updated with additional details, evaluation metrics, and the final model name.