---
language:
- en
license: mit
library_name: transformers
tags:
- financial-analysis
- conversational
- finance
- qlora
- financial-advice
- text-generation
- peft
- lora
- adapter
inference: false
model-index:
- name: FinSight AI
  results:
  - task:
      type: text-generation
      name: Financial Advisory Generation
    dataset:
      type: custom
      name: Financial Conversations
    metrics:
    - type: rouge1
      value: 12.57%
      name: ROUGE-1 Improvement
    - type: rouge2
      value: 79.48%
      name: ROUGE-2 Improvement
    - type: rougeL
      value: 24.00%
      name: ROUGE-L Improvement
    - type: bleu
      value: 135.36%
      name: BLEU Improvement
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
---
<div align="center">
<h1>FinSight AI - Financial Advisory Chatbot</h1>
<p>A fine-tuned version of SmolLM2-1.7B optimized for financial advice and discussion.</p>
</div>
<div align="center">
<a href="https://pytorch.org/" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white" alt="PyTorch"></a>
<a href="https://huggingface.co/transformers/" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/🤗%20Transformers-FFAE33?style=for-the-badge&logoColor=white" alt="Transformers"></a>
<a href="https://huggingface.co/" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-0050C5?style=for-the-badge&logoColor=white" alt="Hugging Face"></a>
<a href="https://github.com/microsoft/LoRA" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/LoRA-2088FF?style=for-the-badge&logo=github&logoColor=white" alt="LoRA"></a>
<a href="https://github.com/TimDettmers/bitsandbytes" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/BitsAndBytes-4D4D4D?style=for-the-badge&logo=github&logoColor=white" alt="BitsAndBytes"></a>
</div>
<div align="center">
<h3><a href="https://github.com/zahemen9900/Datasets-for-Finsight/blob/97d7cacfff62e7b6099ef3bb0af9cf3d044a5b35/metrics/model_paper.md">Read Model Paper 📄</a></h3>
</div>
## Model Details
- **Base Model**: HuggingFaceTB/SmolLM2-1.7B-Instruct
- **Task**: Financial Advisory and Discussion
- **Training Data**: Curated dataset of ~11,000 financial conversations (~16.5M tokens)
- **Training Method**: QLoRA (4-bit quantization with LoRA)
- **Language**: English
- **License**: MIT
The training code is available in the [Finsight AI](https://github.com/zahemen9900/FinsightAI.git) repository.
## Model Description
FinSight AI is a specialized financial advisory assistant built by fine-tuning SmolLM2-1.7B-Instruct using QLoRA (Quantized Low-Rank Adaptation). The model has been trained on a comprehensive dataset of financial conversations to provide accurate, concise, and helpful information across various financial domains including personal finance, investing, market analysis, and financial planning.
Our evaluation shows improvements on all four reported metrics: **ROUGE-1 (+12.57%), ROUGE-2 (+79.48%), ROUGE-L (+24.00%), and BLEU (+135.36%)**, which supports the effectiveness of our domain-specific training approach. The model uses richer financial terminology, gives more precise responses, and handles numerical data with greater technical accuracy, all while keeping a compact, resource-efficient architecture suitable for deployment on consumer hardware.
## Usage
### Streaming function
```python
import threading

import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

# For 4-bit quantized inference (recommended)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# First load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Then load the adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
prompt = "What's your name, and what're you good at?"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize the formatted prompt and move all tensors to the model's device
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Create a streamer that skips the prompt and yields only newly generated text
streamer = TextIteratorStreamer(tokenizer, timeout=20.0, skip_prompt=True, skip_special_tokens=True)

# Adjust generation parameters for more controlled responses
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.6,
    "top_p": 0.95,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": tokenizer.eos_token_id,
    "repetition_penalty": 1.2,
    "no_repeat_ngram_size": 4,
    "num_beams": 1,
    "early_stopping": False,
    "length_penalty": 1.0,
}

# Combine the tokenized inputs (input_ids and attention_mask) with the generation config
generation_kwargs = {**generation_config, **inputs, "streamer": streamer}

# Start generation in a separate thread so the main thread can consume the stream
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

# Iterate over the generated text as it arrives
print("Response: ", end="")
for text in streamer:
    print(text, end="", flush=True)
print()

# Wait for the generation thread to finish
thread.join()
```
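The streaming example relies on a producer/consumer pattern: `model.generate` runs in a background thread and pushes decoded text into the streamer, while the main thread iterates over it. The dependency-free sketch below mimics that pattern with a `queue.Queue`. Note that `fake_generate`, `stream`, and the `_STOP` sentinel are hypothetical stand-ins for illustration, not part of the Transformers API.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel marking the end of generation


def fake_generate(chunks, out_queue):
    """Stand-in for model.generate: pushes text chunks, then the stop sentinel."""
    for chunk in chunks:
        out_queue.put(chunk)
    out_queue.put(_STOP)


def stream(chunks, timeout=20.0):
    """Yield chunks as the background thread produces them."""
    out_queue = queue.Queue()
    worker = threading.Thread(target=fake_generate, args=(chunks, out_queue))
    worker.start()
    while True:
        # Raises queue.Empty if the producer stalls longer than `timeout` seconds
        item = out_queue.get(timeout=timeout)
        if item is _STOP:
            break
        yield item
    worker.join()


pieces = list(stream(["Diversify ", "your ", "portfolio."]))
print("".join(pieces))
```

The `timeout` argument plays the same role as `timeout=20.0` in the example above: it keeps the consumer from blocking forever if generation fails in the background thread.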
### Simple Non-Streaming Usage
If you prefer a simpler approach without streaming:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# For 4-bit quantized inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Load adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Prepare input
system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
user_prompt = "What's a good strategy for long-term investing?"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Generate response (passing **inputs forwards both input_ids and attention_mask)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.2,
)

# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print("Response:\n", response.strip())
```
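Both examples send a single user turn. For a multi-turn chat, the `messages` list passed to `apply_chat_template` grows with alternating `user` and `assistant` entries. The helper below is a hypothetical sketch: the name `build_messages`, the history format, and the sample turns are assumptions for illustration and are not part of this repository. It only builds the list and needs no model.

```python
SYSTEM_PROMPT = "You are Finsight, a finance bot trained to assist users with financial insights"


def build_messages(history, user_prompt, system_prompt=SYSTEM_PROMPT):
    """Return a chat-template-ready message list.

    history: list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user turn always goes last so the model answers it
    messages.append({"role": "user", "content": user_prompt})
    return messages


# Placeholder conversation used only to show the resulting structure
history = [("What is an index fund?", "An index fund tracks a market index.")]
messages = build_messages(history, "How does it differ from an ETF?")
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` in the same way as the single-turn `messages` list.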
## Training Details
The model was trained using the following configuration:
- **QLoRA Parameters**:
  - Rank (r): 64
  - Alpha: 16
  - Target modules: Query, Key, Value projections, MLP layers
  - 4-bit NF4 quantization with double quantization
- **Training Hyperparameters**:
  - Learning rate: 2e-4
  - Epochs: 2
  - Batch size: 2 (with gradient accumulation steps of 4)
  - Weight decay: 0.05
  - Scheduler: Cosine with restarts
  - Warmup ratio: 0.15
- **Hardware**: Consumer-grade NVIDIA RTX 3050 GPU with 6GB VRAM
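A few derived quantities follow from the configuration above. The sketch below works them out under stated assumptions: exactly 11,000 training examples (the card only says ~11,000), a single GPU, one example per conversation, and the standard LoRA scaling of `alpha / r`. The step counts are therefore estimates, not values reported by the authors.

```python
import math

# Values taken from the training configuration listed above
lora_rank = 64
lora_alpha = 16
per_device_batch_size = 2
gradient_accumulation_steps = 4
epochs = 2
warmup_ratio = 0.15

# Assumption: the card states ~11,000 conversations; 11,000 is used for illustration
num_examples = 11_000

# Standard LoRA scales the low-rank update by alpha / r
lora_scaling = lora_alpha / lora_rank

# Gradients are accumulated over several small batches before each optimizer step
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"LoRA scaling (alpha / r): {lora_scaling}")
print(f"Effective batch size:     {effective_batch_size}")
print(f"Optimizer steps per epoch: {steps_per_epoch}")
print(f"Total optimizer steps:     {total_steps}")
print(f"Warmup steps:              {warmup_steps}")
```

The small per-device batch with gradient accumulation is what lets an effective batch of 8 fit on a 6GB GPU: only 2 examples are held in memory at a time.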
#### **More details can be found in the paper linked above.**
## Limitations
- **Information Currency**: The model's financial data and knowledge are limited to the training data cutoff date. Market conditions, regulations, and financial instruments may have changed since then.
- **No Real-time Information**: The model operates without internet connectivity and cannot access current market data, breaking news, or recent economic developments.
- **Not Financial Advice**: Responses should not be considered personalized financial advice. The model cannot account for individual financial situations, risk tolerances, or specific circumstances required for proper financial planning.
- **Language Limitations**: While optimized for English financial terminology, the model may have reduced performance with non-English financial terms or concepts specific to regional markets.
- **Regulatory Compliance**: The model is not updated with the latest financial regulations across different jurisdictions and cannot ensure compliance with local financial laws.
- **Complexity Handling**: May struggle with highly complex or niche financial scenarios that were underrepresented in the training data.
- **Size of Dataset**: The size of the dataset appears to be a significant bottleneck in the fine-tuning process: we observed that the model struggles to generate useful content for niche or extremely specific topics.
## Future Improvements
- **Retrieval Augmented Generation (RAG)**: Implementing RAG would allow the model to reference current financial data, market statistics, and regulatory information before generating responses, significantly improving accuracy and relevance.
- **Domain-Specific Fine-tuning**: Additional training on specialized financial domains like cryptocurrency, derivatives trading, and international tax regulations.
- **Multilingual Support**: Expanding capabilities to handle financial terminology and concepts across multiple languages and markets.
- **Personalization Framework**: Developing mechanisms to better contextualize responses based on stated user preferences while maintaining privacy.
- **A larger, higher-quality dataset**: The model already shows promising results on the relatively small training dataset (~16.5M tokens), which suggests that a larger high-quality dataset would yield further gains in future fine-tuning runs. We plan to address this in a future version of the model.
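To make the RAG item above more concrete, the sketch below shows the retrieval and prompt-assembly step in isolation. Everything in it is hypothetical: the `DOCUMENTS` snippets are placeholders, `retrieve` uses naive word overlap instead of a real embedding index, and none of this is implemented in the current model.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


# Placeholder knowledge base; a real system would index current financial data
DOCUMENTS = [
    "Placeholder note about bond yields and interest rates.",
    "Placeholder note about index funds and diversification.",
    "Placeholder note about retirement account contribution rules.",
]


def retrieve(query, documents, top_k=1):
    """Rank documents by the number of words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_messages(query, documents, top_k=1):
    """Prepend the retrieved context to the system prompt."""
    context = "\n".join(retrieve(query, documents, top_k))
    system_prompt = (
        "You are Finsight, a finance bot trained to assist users with financial insights. "
        "Use the following context when answering:\n" + context
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ]


messages = build_rag_messages("Are index funds good for diversification?", DOCUMENTS)
print(messages[0]["content"])
```

Keeping retrieval separate from generation means the knowledge base can be refreshed without retraining the adapter, which is the main appeal of RAG for fast-changing financial data.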
## Citation
If you use FinSight AI in your research, please cite:
```bibtex
@misc{FinSightAI2025,
author = {Zahemen, FinsightAI Team},
title = {FinSight AI: Enhancing Financial Domain Performance of Small Language Models Through QLoRA Fine-tuning},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/zahemen9900/FinsightAI}}
}
```