FinSight AI - Financial Advisory Chatbot

A fine-tuned version of SmolLM2-1.7B optimized for financial advice and discussion.

Read Model Paper 📄

Model Details

Base Model: HuggingFaceTB/SmolLM2-1.7B-Instruct
Task: Financial Advisory and Discussion
Training Data: Curated dataset of ~11,000 financial conversations (~16.5M tokens)
Training Method: QLoRA (4-bit quantization with LoRA)
Language: English
License: MIT

Check out the training repo here: Finsight AI

Model Description

FinSight AI is a specialized financial advisory assistant built by fine-tuning SmolLM2-1.7B-Instruct using QLoRA (Quantized Low-Rank Adaptation). The model has been trained on a comprehensive dataset of financial conversations to provide accurate, concise, and helpful information across various financial domains including personal finance, investing, market analysis, and financial planning.

Our evaluation shows significant improvements on all four reported metrics (ROUGE-1, ROUGE-2, ROUGE-L, and BLEU), supporting the effectiveness of our domain-specific training approach. The model uses richer financial terminology, gives more precise responses, handles numerical data better, and is more technically accurate, all while keeping a compact, resource-efficient architecture suitable for deployment on consumer hardware.
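For readers unfamiliar with these metrics, the following is a minimal sketch of how a ROUGE-1 score can be computed as clipped unigram overlap between a reference answer and a model response. It is a simplified illustration (whitespace tokenization, no stemming) and not the evaluation code used for FinSight AI; the function name `rouge_1` and the example strings are hypothetical.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) based on clipped unigram overlap."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count per token (clipping)
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reference answer and model response
reference = "diversify your portfolio across asset classes"
candidate = "diversify across asset classes"

precision, recall, f1 = rouge_1(reference, candidate)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=1.00 recall=0.67 f1=0.80
```

ROUGE-2 follows the same pattern with bigrams instead of single words, while ROUGE-L is based on the longest common subsequence.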

Usage

Streaming function

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextIteratorStreamer
import torch
from peft import PeftModel
import threading

# For 4-bit quantized inference (recommended)
bnb_config = BitsAndBytesConfig(
  load_in_4bit=True,
  bnb_4bit_use_double_quant=True,
  bnb_4bit_quant_type="nf4",
  bnb_4bit_compute_dtype=torch.bfloat16
)

# First load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
  "HuggingFaceTB/SmolLM2-1.7B-Instruct",
  quantization_config=bnb_config,
  device_map="auto"
)

# Then load the adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

device = model.device  # follow the placement chosen by device_map="auto" so inputs and weights match
system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
prompt = "What's your name, and what're you good at?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt}
]

formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize the formatted prompt
inputs = tokenizer(formatted_prompt, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}  # Move all tensors to device

# Create a streamer
streamer = TextIteratorStreamer(tokenizer, timeout=20.0, skip_prompt=True, skip_special_tokens=True)

# Adjust generation parameters for more controlled responses
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.6,
    "top_p": 0.95,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": tokenizer.eos_token_id,
    "repetition_penalty": 1.2,
    "no_repeat_ngram_size": 4,
    "num_beams": 1,
    "early_stopping": False,
    "length_penalty": 1.0,
}

# Combine inputs and generation config for the generate function
generation_kwargs = {**generation_config, **inputs, "streamer": streamer}  # **inputs passes input_ids and attention_mask

# Start generation in a separate thread
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

# Iterate over the generated text
print("Response: ", end="")
for text in streamer:
    print(text, end="", flush=True)
thread.join()  # Wait for the generation thread to finish

Simple Non-Streaming Usage

If you prefer a simpler approach without streaming:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from peft import PeftModel

# For 4-bit quantized inference
bnb_config = BitsAndBytesConfig(
  load_in_4bit=True,
  bnb_4bit_use_double_quant=True,
  bnb_4bit_quant_type="nf4",
  bnb_4bit_compute_dtype=torch.bfloat16
)

# Load base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
  "HuggingFaceTB/SmolLM2-1.7B-Instruct",
  quantization_config=bnb_config,
  device_map="auto"
)

# Load adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Prepare input
system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
user_prompt = "What's a good strategy for long-term investing?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

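# Generate response (**inputs passes both input_ids and attention_mask)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.2
)

# Decode only the newly generated tokens so the prompt is not echoed back
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print("Response:\n", response.strip())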

Training Details

The model was trained using the following configuration:

QLoRA Parameters:
- Rank (r): 64
- Alpha: 16
- Target modules: Query, Key, Value projections, MLP layers
- 4-bit NF4 quantization with double quantization
Training Hyperparameters:
- Learning rate: 2e-4
- Epochs: 2
- Batch size: 2 (with 4 gradient accumulation steps, for an effective batch size of 8)
- Weight decay: 0.05
- Scheduler: Cosine with restarts
- Warmup ratio: 0.15
Hardware: Consumer-grade NVIDIA RTX 3050 GPU with 6GB VRAM
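To make the configuration above more concrete, here is a rough sketch of a few quantities that follow from it. It assumes a single GPU (as stated under Hardware), the standard LoRA scaling factor of alpha / r (no rank-stabilized variant), and that all ~11,000 conversations are used for training with no held-out split; the last assumption is not stated in this card, so the step counts are only indicative.

```python
import math

# Values taken from the configuration listed above
lora_rank = 64
lora_alpha = 16
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_epochs = 2
warmup_ratio = 0.15

# Assumption: every one of the ~11,000 conversations is a training example
num_examples = 11_000

# Standard LoRA scales the low-rank update by alpha / r
lora_scaling = lora_alpha / lora_rank

# Gradients are accumulated over several small batches before each optimizer step
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"LoRA scaling (alpha / r): {lora_scaling}")
print(f"Effective batch size:     {effective_batch_size}")
print(f"Optimizer steps (approx): {total_steps}")
print(f"Warmup steps (approx):    {warmup_steps}")
```

With an alpha smaller than the rank, the adapter update is scaled down (here by 0.25), and gradient accumulation is what lets a batch size of 2 fit in 6GB of VRAM while still averaging gradients over 8 examples per optimizer step.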

More details can be found in the paper linked above.

Limitations

Information Currency: The model's financial data and knowledge are limited to the training data cutoff date. Market conditions, regulations, and financial instruments may have changed since then.
No Real-time Information: The model operates without internet connectivity and cannot access current market data, breaking news, or recent economic developments.
Not Financial Advice: Responses should not be considered personalized financial advice. The model cannot account for individual financial situations, risk tolerances, or specific circumstances required for proper financial planning.
Language Limitations: While optimized for English financial terminology, the model may have reduced performance with non-English financial terms or concepts specific to regional markets.
Regulatory Compliance: The model is not updated with the latest financial regulations across different jurisdictions and cannot ensure compliance with local financial laws.
Complexity Handling: The model may struggle with highly complex or niche financial scenarios that were underrepresented in the training data.
Size of Dataset: The size of the dataset appears to be a significant bottleneck in the fine-tuning process, as we observed that the model struggles to generate useful content for niche or extremely specific topics.

Future Improvements

Retrieval Augmented Generation (RAG): Implementing RAG would allow the model to reference current financial data, market statistics, and regulatory information before generating responses, significantly improving accuracy and relevance.
Domain-Specific Fine-tuning: Additional training on specialized financial domains like cryptocurrency, derivatives trading, and international tax regulations.
Multilingual Support: Expanding capabilities to handle financial terminology and concepts across multiple languages and markets.
Personalization Framework: Developing mechanisms to better contextualize responses based on stated user preferences while maintaining privacy.
A Larger, Higher-Quality Dataset: The model already shows promising results on the relatively small dataset it was trained on (~16.5M tokens). This suggests that a larger, high-quality dataset would yield further gains in future fine-tuning runs. We plan to address this in a future version of the model.
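As a rough illustration of the retrieval-augmented generation idea listed above, the sketch below picks the most relevant snippets for a question and places them in the user turn before the model is called. It uses toy word-overlap retrieval, whereas a real system would more likely use embedding-based search over a live data source. The function names (`word_set`, `retrieve`, `build_messages`) and the snippets are hypothetical and not part of FinSight AI.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents that share the most words with the query."""
    query_terms = word_set(query)
    scored = [(len(query_terms & word_set(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_messages(query: str, documents: list[str], system_prompt: str) -> list[dict]:
    """Build chat messages with retrieved snippets prepended to the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    user_content = f"Context:\n{context}\n\nQuestion: {query}" if context else query
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


# Hypothetical knowledge snippets standing in for an up-to-date data source
snippets = [
    "The federal funds rate target is set by the FOMC.",
    "Index funds track a market index.",
    "Bond prices fall when yields rise.",
]

messages = build_messages(
    "What is the current federal funds rate?",
    snippets,
    "You are Finsight, a finance bot trained to assist users with financial insights",
)
print(messages[1]["content"])
```

The resulting `messages` list has the same shape as the one passed to `tokenizer.apply_chat_template` in the usage examples, so the retrieved context would reach the model as part of the prompt.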

Citation

If you use FinSight AI in your research, please cite:


@misc{FinSightAI2025,
  author = {Zahemen, FinsightAI Team},
  title = {FinSight AI: Enhancing Financial Domain Performance of Small Language Models Through QLoRA Fine-tuning},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/zahemen9900/FinsightAI}}
}
