# O1-Llama 3.2 3B Model Card
## Important Disclaimer
This is a **proof-of-concept research model** designed to demonstrate the feasibility of inducing structured reasoning behaviors in smaller language models. It is not intended for production use or deployment in real-world applications. The model serves primarily as a demonstration of training methodology and should be used only for research purposes.
## Model Overview
O1-Llama 3.2 3B is a fine-tuned version of Llama 3.2 3B, trained to exhibit explicit reasoning patterns similar to those observed in OpenAI's o1 model. It was trained on ReasonSet, a dataset of worked solutions to mathematical and logical problems.
## Key Capabilities
- Explicit brainstorming and strategy enumeration
- Step-by-step working of solutions
- Self-correction attempts
- Verification steps in problem-solving
## Limitations
- Performs significantly worse than larger models
- Can get stuck in circular reasoning
- May fail to find correct solutions despite showing reasoning behavior
- Limited to simpler problems
- Not suitable for production use or critical applications
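When studying the circular-reasoning limitation, it can help to flag outputs that repeat themselves. The following is a minimal sketch of one possible heuristic, not part of the model or its repository: the function names, the n-gram size, and the threshold are all illustrative assumptions.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 8) -> float:
    """Return the fraction of word n-grams in `text` that occur more than once."""
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Count every occurrence of any n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


def looks_circular(text: str, n: int = 8, threshold: float = 0.3) -> bool:
    """Hypothetical heuristic: flag outputs dominated by repeated n-grams.

    The default threshold of 0.3 is an arbitrary assumption, not a tuned value.
    """
    return repeated_ngram_ratio(text, n) >= threshold
```

A high ratio only indicates surface-level repetition. A model can also reason in circles while rephrasing each pass, which this check would miss.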
## Training
- Base Model: Llama 3.2 3B
- Dataset: ReasonSet (2,000 worked solutions)
- Problem sources: AIME, GPQA, and the MATH dataset
- Method: Fine-tuning on worked solutions generated through REL (Reasoning Enhancement Loop)
## Intended Use
- Research into reasoning capabilities of smaller language models
- Study of explicit problem-solving behaviors
- Academic investigation of model training methodologies
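For research experiments, one way to sample a reasoning trace is sketched below. It assumes the checkpoint loads through the Hugging Face `transformers` causal-LM classes and that the raw problem text is an acceptable prompt; this card confirms neither. The function name `generate_reasoning` is hypothetical, and `model_id` must be supplied by the caller.

```python
def generate_reasoning(model_id: str, problem: str, max_new_tokens: int = 512) -> str:
    """Generate a reasoning trace for `problem` from the checkpoint at `model_id`.

    Assumption: the checkpoint is compatible with AutoModelForCausalLM and
    needs no special prompt template.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(problem, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Capping `max_new_tokens` is a practical safeguard here, since the model can get stuck in circular reasoning and might otherwise generate until it reaches the context limit.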
## Repository
Available at: https://github.com/tamassimonds/REL
## Citation
[Include paper citation when published]