# O1-Llama 3.2 3B Model Card

## Important Disclaimer

This is a **proof-of-concept research model** designed to demonstrate the feasibility of inducing structured reasoning behaviors in smaller language models. It is not intended for production use or deployment in real-world applications. The model serves primarily as a demonstration of training methodology and should be used only for research purposes.

## Model Overview

O1-Llama 3.2 3B is a fine-tuned version of Llama 3.2 3B, trained to produce explicit reasoning patterns similar to those observed in OpenAI's o1 model. It was fine-tuned on ReasonSet, a dataset of worked solutions to mathematical and logical problems.

## Key Capabilities

- Explicit brainstorming and enumeration of candidate strategies
- Step-by-step working of solutions
- Self-correction attempts
- Verification steps in problem-solving

## Limitations

- Performs significantly worse than larger models
- Can get stuck in circular reasoning
- May fail to find a correct solution even when its output shows reasoning behavior
- Limited to simpler problems
- Not suitable for production use or critical applications
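When studying the circular-reasoning limitation above, a crude text-level check can flag outputs that keep restating the same step. The sketch below is only an illustrative heuristic and is not part of the released code: the function names, the line-based notion of a "step", and the repeat threshold are all assumptions.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    """Lowercase a line and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", line.strip().lower())


def repeated_steps(text: str, min_repeats: int = 3) -> list[str]:
    """Return normalized lines that occur at least `min_repeats` times.

    Assumption: each non-empty line of the model output is one reasoning step.
    """
    counts = Counter(normalize(ln) for ln in text.splitlines() if ln.strip())
    return [step for step, n in counts.items() if n >= min_repeats]


def looks_circular(text: str, min_repeats: int = 3) -> bool:
    """Heuristic flag: True if any step is repeated `min_repeats` or more times."""
    return bool(repeated_steps(text, min_repeats))
```

Exact-match counting will miss paraphrased loops, so a flagged output is a prompt for manual inspection rather than a measurement.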

## Training

- Base Model: Llama 3.2 3B
- Dataset: ReasonSet (2,000 worked solutions)
- Domains: Problems drawn from AIME, GPQA, and the MATH dataset
- Method: Fine-tuning on worked solutions generated through REL (Reasoning Enhancement Loop)
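To make "fine-tuning on worked solutions" concrete, the sketch below shows one plausible way to turn a worked solution into a prompt/completion record for supervised fine-tuning. It does not reproduce the ReasonSet schema or the REL pipeline, neither of which is specified here: the field names, the `Step N:` layout, and the function names are hypothetical.

```python
import json


def format_worked_solution(problem: str, steps: list[str], answer: str) -> dict[str, str]:
    """Build a prompt/completion pair from a problem and its worked steps.

    Hypothetical layout: numbered steps followed by a final-answer line.
    """
    prompt = f"Problem: {problem.strip()}\nSolution:"
    body = "\n".join(f"Step {i}: {s.strip()}" for i, s in enumerate(steps, start=1))
    completion = f"{body}\nFinal answer: {answer.strip()}"
    return {"prompt": prompt, "completion": completion}


def to_jsonl_line(example: dict[str, str]) -> str:
    """Serialize one example as a single JSON Lines record."""
    return json.dumps(example, ensure_ascii=False)


record = format_worked_solution(
    "What is 2 + 3?",
    ["Add the numbers: 2 + 3 = 5.", "Verify: 5 - 3 = 2."],
    "5",
)
line = to_jsonl_line(record)
```

Keeping the verification step inside the completion means the fine-tuning target includes the checking behavior listed under Key Capabilities, not only the final answer.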

## Intended Use

- Research into reasoning capabilities of smaller language models
- Study of explicit problem-solving behaviors
- Academic investigation of model training methodologies

## Repository

Available at: https://github.com/tamassimonds/REL

## Citation

[Include paper citation when published]