# O1-Llama 3.2 3B Model Card

## Important Disclaimer

This is a **proof-of-concept research model** designed to demonstrate the feasibility of inducing structured reasoning behaviors in smaller language models. It is not intended for production use or deployment in real-world applications. It serves primarily as a demonstration of a training methodology and should be used for research purposes only.

## Model Overview

O1-Llama 3.2 3B is a fine-tuned version of Llama 3.2 3B, trained to exhibit explicit reasoning patterns similar to those observed in OpenAI's o1 model. It is trained on ReasonSet, a dataset of worked solutions to mathematical and logical problems.

## Key Capabilities

- Explicit brainstorming and enumeration of strategies
- Step-by-step working of solutions
- Self-correction attempts
- Verification steps during problem-solving

## Limitations

- Significantly lower performance than larger models
- Can get stuck in circular reasoning
- May fail to find the correct solution even while showing reasoning behavior
- Limited to simpler problems
- Not suitable for production use or critical applications

## Training

- Base Model: Llama 3.2 3B
- Dataset: ReasonSet (2,000 worked solutions)
- Domains: problems from AIME, GPQA, and the MATH dataset
- Method: Fine-tuning on worked solutions generated through REL (Reasoning Enhancement Loop)

## Intended Use

- Research into the reasoning capabilities of smaller language models
- Study of explicit problem-solving behaviors
- Academic investigation of model training methodologies

## Repository

Available at: https://github.com/tamassimonds/REL

## Citation

[Include paper citation when published]