Fino1-8B Quantized Models
This repository contains Q4_KM and Q5_KM quantized versions of TheFinAI/Fino1-8B, a financial reasoning model based on Llama 3.1 8B Instruct. These quantized variants preserve the model's financial reasoning capabilities while significantly reducing memory footprint and improving inference speed.
Discover our full range of quantized language models on our SandLogic Lexicon Hugging Face page. To learn more about our company and services, visit our website, SandLogic.
Model Details
Base Information
- Original Model: Fino1-8B
- Quantized Versions:
- Q4_KM (4-bit quantization)
- Q5_KM (5-bit quantization)
- Base Architecture: Llama 3.1 8B Instruct
- Primary Focus: Financial reasoning tasks
- Paper: arxiv.org/abs/2502.08127
Financial Capabilities
Both quantized versions maintain the original model's strengths in:
- Financial mathematical reasoning
- Structured financial question answering
- FinQA dataset-based problems
- Step-by-step financial calculations
- Financial document analysis
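As a concrete illustration of the kind of step-by-step calculation the model is trained to produce, here is a minimal Python sketch that works through a percentage-growth computation by hand (the figures are illustrative, not taken from the dataset):

```python
def percentage_growth(old: float, new: float) -> float:
    """Step-by-step percentage growth: (new - old) / old * 100."""
    change = new - old   # absolute change
    rate = change / old  # relative change
    return rate * 100    # expressed as a percentage

# Revenue grows from $100,000 to $150,000:
print(percentage_growth(100_000, 150_000))  # 50.0
```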
Quantization Benefits
Q4_KM Version
- Model size: 4.92 GB (75% reduction)
- Optimal for resource-constrained environments
- Faster inference speed
- Suitable for rapid financial calculations
Q5_KM Version
- Model size: 5.73 GB (69% reduction)
- Better quality preservation
- Balanced performance-size trade-off
- Recommended for precision-critical financial applications
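As a rough sanity check on these file sizes, one can estimate the effective bits stored per weight. This back-of-the-envelope sketch assumes the commonly cited ~8.03B parameter count for Llama 3.1 8B; it is illustrative, not an official figure:

```python
def bits_per_weight(file_size_gb: float, n_params: float = 8.03e9) -> float:
    """Estimate the average number of bits stored per model weight."""
    return file_size_gb * 1e9 * 8 / n_params

print(round(bits_per_weight(4.92), 2))  # Q4_KM: ~4.9 bits/weight
print(round(bits_per_weight(5.73), 2))  # Q5_KM: ~5.71 bits/weight
```

The estimates land close to the nominal 4-bit and 5-bit levels; K-quants mix precisions across tensors, so the average sits slightly above the headline bit width.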
Usage
```bash
pip install llama-cpp-python
```

Please refer to the llama-cpp-python documentation to install with GPU support.
```python
from llama_cpp import Llama

llm = Llama(
    model_path="model/path/",  # path to the downloaded GGUF file
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,       # Uncomment to increase the context window
)

# Example of a reasoning task
output = llm(
    """Q: A company's revenue grew from $100,000 to $150,000 in one year.
Calculate the percentage growth rate. A: """,
    max_tokens=256,
    stop=["Q:", "\n\n"],
    echo=False,
)
print(output["choices"][0]["text"])
```
Training Details
Original Model Training
- Dataset: TheFinAI/Fino1_Reasoning_Path_FinQA
- Methods: SFT (Supervised Fine-Tuning) and RF
- Hardware: 4xH100 GPUs
- Configuration:
- Batch Size: 16
- Learning Rate: 2e-5
- Epochs: 3
- Optimizer: AdamW
Model tree for SandLogicTechnologies/Fino1-8B-GGUF
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct
- Finetuned: TheFinAI/Fino1-8B