O1-Llama 3.2 3B Model Card

Important Disclaimer

This is a proof-of-concept research model that demonstrates the feasibility of inducing structured reasoning behaviors in smaller language models. It is not intended for production use or deployment in real-world applications; it serves primarily as a demonstration of the training methodology and should be used for research purposes only.

Model Overview

O1-Llama 3.2 3B is a fine-tuned version of Llama 3.2 3B, trained on ReasonSet, a dataset of worked solutions focused on mathematical and logical problem-solving, to exhibit explicit reasoning patterns similar to those observed in OpenAI's O1 model.
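
Below is a minimal inference sketch, assuming the standard Hugging Face transformers API and the repository id TamasSimonds/O1-Llama-3.2-3B; the prompt and generation settings are illustrative, not a prescribed format.

```python
# Hypothetical usage sketch: standard transformers causal-LM loading and greedy decoding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TamasSimonds/O1-Llama-3.2-3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

prompt = "Solve step by step: what is the sum of the first 20 positive odd integers?"
inputs = tokenizer(prompt, return_tensors="pt")

# Leave plenty of room for the model's brainstorming, self-correction,
# and verification steps before the final answer.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```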

Key Capabilities

  • Explicit brainstorming and strategy enumeration
  • Step-by-step working out of solutions
  • Self-correction attempts
  • Verification steps in problem-solving

Limitations

  • Significantly lower performance compared to larger models
  • Can get stuck in circular reasoning
  • May fail to find correct solutions despite showing reasoning behavior
  • Limited to simpler problems
  • Not suitable for production use or critical applications

Training

  • Base Model: Llama 3.2 3B
  • Dataset: ReasonSet (2,000 worked solutions)
  • Domains: problems drawn from AIME, GPQA, and the MATH dataset
  • Method: Fine-tuning on worked solutions generated through REL (Reasoning Enhancement Loop); a rough sketch of this kind of supervised fine-tuning is shown below
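
The following sketch illustrates supervised fine-tuning on worked solutions in the general style described above. It assumes the ReasonSet examples are available locally as JSONL with a "text" field holding each full worked solution; the file name, field name, base-model id, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Hypothetical SFT sketch: causal-LM fine-tuning of Llama 3.2 3B on worked solutions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "meta-llama/Llama-3.2-3B"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Each record is assumed to hold one worked solution
# (brainstorming, step-by-step working, self-correction, verification).
dataset = load_dataset("json", data_files="reasonset.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="o1-llama-3.2-3b",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=2,
        learning_rate=2e-5,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # Standard causal-LM objective: the model learns to reproduce the worked solutions.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```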

Intended Use

  • Research into reasoning capabilities of smaller language models
  • Study of explicit problem-solving behaviors
  • Academic investigation of model training methodologies

Repository

Available at: https://github.com/tamassimonds/REL

Citation

[Include paper citation when published]
