O1-Llama 3.2 3B Model Card
Important Disclaimer
This is a proof-of-concept research model designed to demonstrate the feasibility of inducing structured reasoning behaviors in smaller language models. It serves primarily as a demonstration of training methodology, is not intended for production use or real-world deployment, and should be used for research purposes only.
Model Overview
O1-Llama 3.2 3B is a fine-tuned version of Llama 3.2 3B, trained to produce explicit reasoning patterns similar to those observed in OpenAI's o1 model. It is trained on ReasonSet, a dataset of worked solutions focused on mathematical and logical problem-solving.
Key Capabilities
- Explicit brainstorming and strategy enumeration
- Step-by-step working of solutions
- Self-correction attempts
- Verification steps in problem-solving
Limitations
- Significantly lower performance than larger models
- Can get stuck in circular reasoning
- May fail to find correct solutions despite showing reasoning behavior
- Limited to simpler problems
- Not suitable for production use or critical applications
Training
- Base Model: Llama 3.2 3B
- Dataset: ReasonSet (2,000 worked solutions)
- Problem sources: AIME, GPQA, and MATH dataset problems
- Method: Fine-tuning on worked solutions generated through REL (Reasoning Enhancement Loop)
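The training recipe above fine-tunes on worked solutions. As a rough illustration only, the sketch below shows one way a single worked solution could be flattened into a prompt/completion pair for supervised fine-tuning, with the brainstorming, step-by-step working, and verification stages made explicit in the target text. The `WorkedSolution` fields, the section labels ("Possible approaches:", "Working:", "Verification:"), and the `to_sft_record` helper are all hypothetical; they are not ReasonSet's actual schema or code from the REL repository.

```python
from dataclasses import dataclass, field


@dataclass
class WorkedSolution:
    """One worked solution. Hypothetical field names, not ReasonSet's real schema."""
    problem: str
    source: str  # problem source, e.g. "AIME", "GPQA", or "MATH"
    strategies: list[str] = field(default_factory=list)  # brainstormed approaches
    steps: list[str] = field(default_factory=list)       # step-by-step working
    verification: str = ""                               # check of the final answer
    answer: str = ""


def to_sft_record(sol: WorkedSolution) -> dict[str, str]:
    """Render a worked solution as a prompt/completion pair (hypothetical format)."""
    parts = ["Possible approaches:"]
    parts += [f"- {s}" for s in sol.strategies]
    parts.append("Working:")
    parts += [f"Step {i}: {s}" for i, s in enumerate(sol.steps, start=1)]
    if sol.verification:
        parts.append(f"Verification: {sol.verification}")
    parts.append(f"Final answer: {sol.answer}")
    # The source is kept as metadata; only prompt/completion would be used for training.
    return {"prompt": sol.problem, "completion": "\n".join(parts), "source": sol.source}


# Usage: a small MATH-style problem with an explicit verification step.
example = WorkedSolution(
    problem="What is the remainder when 2**10 is divided by 7?",
    source="MATH",
    strategies=["Compute 2**10 directly", "Use the cycle of powers of 2 mod 7"],
    steps=["2**3 = 8 ≡ 1 (mod 7)", "2**10 = (2**3)**3 * 2 ≡ 1 * 2 = 2 (mod 7)"],
    verification="2**10 = 1024 = 7 * 146 + 2, so the remainder is 2.",
    answer="2",
)
record = to_sft_record(example)
print(record["completion"])
```

Keeping every reasoning stage in the completion text, rather than only the final answer, is what lets a fine-tuned model learn to emit the stages itself; the exact labels are a design choice of this sketch.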
Intended Use
- Research into reasoning capabilities of smaller language models
- Study of explicit problem-solving behaviors
- Academic investigation of model training methodologies
Repository
Available at: https://github.com/tamassimonds/REL
Citation
[Include paper citation when published]