TamasSimonds/O1-Llama-3.2-3B


	# O1-Llama 3.2 3B Model Card

	## Important Disclaimer
	This is a proof-of-concept research model designed to demonstrate the feasibility of inducing structured reasoning behaviors in smaller language models. It is not intended for production use or deployment in real-world applications. The model serves primarily as a demonstration of training methodology and should be used only for research purposes.

	## Model Overview
	O1-Llama 3.2 3B is a fine-tuned version of Llama 3.2 3B, trained to demonstrate explicit reasoning patterns similar to those observed in OpenAI's o1 model. It was trained on ReasonSet, a dataset of worked solutions to mathematical and logical problems.

	## Key Capabilities
	- Explicit brainstorming and strategy enumeration
	- Step-by-step working of solutions
	- Self-correction attempts
	- Verification steps in problem-solving

	## Limitations
	- Performs significantly worse than larger models
	- Can get stuck in circular reasoning
	- May fail to find correct solutions despite showing reasoning behavior
	- Limited to simpler problems
	- Not suitable for production use or critical applications
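
When studying the circular-reasoning failure mode listed above, it can help to flag suspect outputs automatically. The following is a minimal sketch, not part of the model or its repository: the function names, the line-based heuristic, and the thresholds are all assumptions chosen for illustration.

```python
from collections import Counter


def repeated_lines(text: str, min_repeats: int = 3, min_length: int = 20) -> list[str]:
    """Return normalized lines that occur at least `min_repeats` times.

    Heuristic only (an assumption, not the author's method): lines are
    lowercased and whitespace-collapsed, and short lines are ignored so
    that trivial fragments such as "so" or "x = 2" do not trigger a match.
    """
    normalized = (" ".join(line.lower().split()) for line in text.splitlines())
    counts = Counter(line for line in normalized if len(line) >= min_length)
    return [line for line, n in counts.items() if n >= min_repeats]


def looks_circular(text: str, min_repeats: int = 3, min_length: int = 20) -> bool:
    """Flag a reasoning trace that repeats the same long line several times."""
    return bool(repeated_lines(text, min_repeats, min_length))
```

A flagged trace is only a candidate for manual inspection; legitimate solutions can also repeat a line, so the thresholds would need tuning on real outputs.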

	## Training
	- Base Model: Llama 3.2 3B
	- Dataset: ReasonSet (2,000 worked solutions)
	- Domains: problems drawn from AIME, GPQA, and the MATH dataset
	- Method: Fine-tuning on worked solutions generated through REL (Reasoning Enhancement Loop)
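
To make "fine-tuning on worked solutions" concrete, the sketch below turns one worked solution into a prompt/completion pair of the kind commonly used for supervised fine-tuning. This card does not describe the ReasonSet schema or the prompt format used in training, so the field names (`problem`, `solution`), the template, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkedExample:
    # Hypothetical field names; the real ReasonSet schema is not documented here.
    problem: str
    solution: str


# Assumed plain-text template, chosen only for illustration.
PROMPT_TEMPLATE = "Problem:\n{problem}\n\nSolution:\n"


def to_training_pair(example: WorkedExample, eos_token: str) -> tuple[str, str]:
    """Split one worked solution into (prompt, completion) strings.

    The completion ends with `eos_token` (for example, the tokenizer's own
    end-of-sequence token) so that the model learns where a solution stops.
    """
    prompt = PROMPT_TEMPLATE.format(problem=example.problem.strip())
    completion = example.solution.strip() + eos_token
    return prompt, completion
```

In a typical setup the loss would be computed on the completion tokens only, so the model is trained to produce the reasoning rather than to reproduce the problem statement; whether the original training did this is not stated.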

	## Intended Use
	- Research into reasoning capabilities of smaller language models
	- Study of explicit problem-solving behaviors
	- Academic investigation of model training methodologies
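
For research experiments, the model can presumably be loaded with the Hugging Face `transformers` library like any other causal language model. The sketch below assumes the repository ID `TamasSimonds/O1-Llama-3.2-3B` and a plain-text prompt format; the prompt layout and the helper names `build_prompt` and `generate_solution` are illustrative assumptions, not a documented interface of this model.

```python
MODEL_ID = "TamasSimonds/O1-Llama-3.2-3B"  # assumed Hugging Face repository ID


def build_prompt(problem: str) -> str:
    """Wrap a problem in a simple plain-text prompt (assumed format)."""
    return f"Problem: {problem.strip()}\nSolution:"


def generate_solution(problem: str, max_new_tokens: int = 512) -> str:
    """Load the model and greedily decode a reasoning trace for one problem."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_solution("What is the sum of the first 10 positive integers?")` would download the weights on first use. Greedy decoding is used here only to make runs repeatable; sampling settings are a separate experimental choice.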

	## Repository
	Available at: https://github.com/tamassimonds/REL

	## Citation
	[Include paper citation when published]