NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B (v1.2)
Overview
The iSA-02-Nano-Llama-3.2-1B is a Base Model designed for text generation, optimized for reasoning tasks. Based on meta-llama/Llama-3.2-1B, this model has been deeply customized by NeuraLake and stands out for its ability to work with an extended context window of 1,048,576 tokens. It was created to allow businesses and developers to fine-tune it for specific tasks that require processing large volumes of information. Designed by NeuraLake using synthetic datasets, the model embodies the philosophy of "think before you speak," enhancing reasoning capabilities for small-scale models.
Extended Context Window: The iSA-02-Nano-Llama-3.2-1B features an unprecedented context window of 1,048,576 tokens, enabling the analysis and generation of extremely long and complex texts. This sets a new standard for small yet powerful reasoning models.
Key Features
- Extended Context: Supports up to 1,048,576 tokens, enabling the analysis and generation of long, complex texts.
- Advanced Reasoning: Integrates sophisticated reasoning chains for handling complex tasks.
- Customization: Ideal for businesses seeking to tailor the model to specific tasks, with a robust framework for further fine-tuning and training.
- Compact Yet Powerful: 1 billion parameters.
  - What does this mean? Think of the model as a digital brain that learns from many examples. "Parameters" are like the connections in this brain, and 1 billion parameters indicate a compact model that is still powerful enough to process and generate information intelligently. Even though it is considered small compared to giant models, it is highly optimized for reasoning tasks.
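As a quick sanity check of these two numbers, the minimal sketch below reads them from the published checkpoint; it assumes the extended window is exposed through the standard max_position_embeddings field of the Hugging Face config.

from transformers import AutoConfig, AutoModelForCausalLM

# Assumption: the 1,048,576-token window is reported via max_position_embeddings.
config = AutoConfig.from_pretrained("NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B")
print("Context window:", config.max_position_embeddings)

# Count parameters to confirm the ~1B size.
model = AutoModelForCausalLM.from_pretrained("NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B")
print("Parameters:", sum(p.numel() for p in model.parameters()))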
Architecture and Training
Base Model: Built on the meta-llama/Llama-3.2-1B architecture from Meta, optimized using advanced agent mixing techniques in AAA (AI aligning AI) mode.
Training and Data Generation Process:
The training process leveraged advanced synthetic data generation techniques to create a diverse and extensive dataset comprising billions of tokens. This was achieved through a multi-stage process involving data generation, reasoning chain creation, and translation to ensure high-quality training data. The resulting dataset enabled robust and diverse training for the entire iSA-02 series by NeuraLake, enhancing the model's ability to perform complex reasoning.
Context Window: The extension to 1,048,576 tokens allows the model to handle large amounts of text or information, benefiting applications that require deep analysis.
Intended Use
- Corporate Customization: Fine-tune the model to address specific challenges and tasks within various business domains.
- Text Generation Applications: Suitable for content creation, customer support automation, long-form text analysis with Retrieval-Augmented Generation (RAG), and answering intricate queries (see the sketch after this list).
- Research and Development: An excellent tool for exploring innovative approaches in natural language processing (NLP) that leverage large context windows for enhanced understanding and reasoning.
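As a rough illustration of the long-context RAG use mentioned above, the sketch below simply concatenates retrieved passages into one prompt and counts the tokens; the passages, question, and retrieval step are placeholder assumptions, not part of this model card.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B")

# Hypothetical retrieved passages; in a real pipeline these would come from
# a vector store or search index.
passages = ["First retrieved document ...", "Second retrieved document ..."]
question = "Summarize the key points across all documents."

# With a 1,048,576-token window, whole documents can go into the prompt
# rather than aggressively truncated snippets.
prompt = "\n\n".join(passages) + "\n\nQuestion: " + question
print("Prompt length in tokens:", len(tokenizer(prompt)["input_ids"]))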
Limitations and Recommendations
- Fine-Tuning Recommended: While the iSA-02-Nano-Llama-3.2-1B has a 1,048,576-token context window, it is strongly recommended to fine-tune the model for specific tasks to achieve optimal performance and avoid token repetition.
- Challenges with Large Contexts: Utilizing such large context windows may require significant computational resources and meticulous fine-tuning to maintain response quality.
- Continuous Feedback: Users are encouraged to report issues and suggest improvements to continuously enhance the model.
Simplified Explanation
Think of the model as a super reader and writer.
- Context Window: Imagine it as the number of pages in a book the model can read at once. With 1,048,576 tokens, it can "read" a massive chunk of information simultaneously, allowing for a deep understanding of the topic.
- 1 Billion Parameters: These are the "buttons" or "connectors" in the model's digital brain. The more parameters, the more details it can learn and understand. Even as a small model, it is optimized for performing complex reasoning, ensuring smart and coherent responses.
Initial Idea: Why We Are Doing This
The journey towards the iSA-02 series (with more to follow) began with an unexpected experiment in January 2024. By combining two datasets that were initially thought to be flawed and unusable, and guided by the belief that 'AI is so new that every approach is worth exploring', we stumbled upon the first signs of reasoning abilities in a base model we were testing.
This discovery allowed us to unlock hidden insights and behaviors within the models by tapping into the already existing, but previously hidden, reasoning capabilities. We leveraged the model itself to guide us, allowing it to reflect on its own process. From there, we pushed the boundaries, generating new data that led to more extrapolated and refined outcomes.
Contributions and Feedback
The NeuraLake synthetic data platform was the foundation for creating this model, and we are open to questions, suggestions, and collaborations. If you have feedback or want to contribute to the development and improvement of the iSA-02-Nano-Llama-3.2-1B, feel free to leave a comment in the community tab.
Your feedback is essential for us to evolve and reach an even more robust final version!
License
This model is distributed under the Apache-2.0 license.
Ethical Considerations
While the iSA-02-Nano-Llama-3.2-1B is optimized for advanced reasoning tasks, users should be aware of potential biases present in the training data. We recommend thorough evaluation and fine-tuning to mitigate unintended biases and ensure fair and ethical use of the model.
Frequently Asked Questions (FAQ)
Q1: How does the extended context window benefit text generation tasks?
A: The extended context window allows the model to maintain coherence and context over much longer passages of text and reasoning, so it performs better than the standard base model on tasks that require understanding and generating large documents.
Q2: What computational resources are required to run the iSA-02-Nano-Llama-3.2-1B?
A: Due to its large context window, running the model efficiently requires significant memory and processing power. We recommend using GPUs with ample VRAM and optimized configurations for optimal performance. Using vLLM with max_model_len set to 100,000 tokens, the model uses between 9 GB and 12 GB of VRAM.
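For reference, here is a minimal vLLM sketch matching that configuration; the 100,000-token limit is the value quoted above, everything else is left at vLLM defaults (an assumption, adjust to your hardware).

from vllm import LLM, SamplingParams

# Cap the context at 100,000 tokens to stay within the 9-12 GB VRAM range
# mentioned above.
llm = LLM(model="NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B", max_model_len=100000)

params = SamplingParams(max_tokens=512)
outputs = llm.generate(["Explain the benefits of long context windows."], params)
print(outputs[0].outputs[0].text)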
Q3: Can the model be fine-tuned on proprietary datasets?
A: Yes, the model is designed to be fine-tuned on specific datasets to tailor its performance to particular applications or domains. Format your training examples with the following structural tags, as the model uses them to guide reasoning:
<User_Prompt>
User prompt
</User_Prompt>
<Reasoning>
The model's chain of thought
</Reasoning>
<Answer>
Here is the final answer
</Answer>
NeuraLake will provide a comprehensive guide on how to fine-tune the model, along with a small sample dataset available under the MIT license.
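Until that guide is published, here is a hedged sketch of how one training record could be assembled with these tags; the helper name and the example content are illustrative assumptions, not the official NeuraLake format.

def format_example(prompt, reasoning, answer):
    # Wrap one training record in the structural tags described above.
    return (
        f"<User_Prompt>\n{prompt}\n</User_Prompt>\n"
        f"<Reasoning>\n{reasoning}\n</Reasoning>\n"
        f"<Answer>\n{answer}\n</Answer>"
    )

record = format_example(
    "What is 17 * 24?",
    "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
    "17 * 24 = 408.",
)
print(record)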
Usage Example
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B")
model = AutoModelForCausalLM.from_pretrained("NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B")

input_text = "Explain the significance of the extended context window in modern NLP models."
inputs = tokenizer(input_text, return_tensors="pt")

# Generate up to 500 tokens (prompt included) and decode the result.
outputs = model.generate(**inputs, max_length=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
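Because the model is trained around the structural tags shown in the FAQ, a small post-processing step can isolate the final answer. The regex below is an assumption about the output shape, not a guaranteed format, and it reuses the tokenizer and outputs from the example above.

import re

def extract_answer(generated_text):
    # Return the <Answer>...</Answer> block if present, otherwise the raw text.
    match = re.search(r"<Answer>(.*?)</Answer>", generated_text, re.DOTALL)
    return match.group(1).strip() if match else generated_text

print(extract_answer(tokenizer.decode(outputs[0], skip_special_tokens=True)))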
OpenAI-compatible API (this example assumes an OpenAI-compatible server, such as vLLM, is already running locally at http://localhost:8000/v1):
from openai import OpenAI

# Point the client at the local OpenAI-compatible endpoint; the API key is not checked.
client = OpenAI(
    api_key="any",
    base_url="http://localhost:8000/v1"
)

prompt = input("Prompt: ")

completion = client.chat.completions.create(
    model="NeuraLakeAi/iSA-02-Nano-Llama-3.2-1B",
    messages=[
        {"role": "system", "content": " "},
        {"role": "user", "content": prompt}
    ],
    stream=True,
    max_tokens=90000,
)

# Stream the response token by token as it arrives.
for chunk in completion:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # Print a final newline after the streamed answer
References
**Card under development.**