TamasSimonds committed
Commit 8348875 · verified · 1 parent: cd5ac9b

Update README.md

Files changed (1)
  1. README.md +37 -28
README.md CHANGED
@@ -1,28 +1,37 @@
- ---
- library_name: transformers
- tags: []
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** Toby Simonds
- - **Model type:** Llama 3.2 3B finetune
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** https://github.com/tamassimonds/REL/tree/main
- - **Paper [optional]:** [More Information Needed]
+ # O1-Llama 3.2 3B Model Card
+
+ ## Important Disclaimer
+ This is a **proof-of-concept research model** designed to demonstrate the feasibility of inducing structured reasoning behaviors in smaller language models. It is not intended for production use or deployment in real-world applications. The model serves primarily as a demonstration of training methodology and should be used only for research purposes.
+
+ ## Model Overview
+ O1-Llama 3.2 3B is a fine-tuned version of Llama 3.2 3B, trained to exhibit explicit reasoning patterns similar to those observed in OpenAI's O1 model. The model is trained on ReasonSet, a dataset of worked solutions focused on mathematical and logical problem-solving.
+
+ ## Key Capabilities
+ - Explicit brainstorming and strategy enumeration
+ - Step-by-step solution development
+ - Self-correction attempts
+ - Verification steps in problem-solving
+
+ ## Limitations
+ - Significantly lower performance than larger models
+ - Can get stuck in circular reasoning
+ - May fail to find correct solutions despite exhibiting reasoning behavior
+ - Limited to simpler problems
+ - Not suitable for production use or critical applications
+
+ ## Training
+ - Base model: Llama 3.2 3B
+ - Dataset: ReasonSet (2,000 worked solutions)
+ - Domains: AIME, GPQA, and MATH dataset problems
+ - Method: Fine-tuning on worked solutions generated through REL (Reasoning Enhancement Loop)
+
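The Training section describes fine-tuning on worked solutions from ReasonSet. As a rough sketch of how one such record might be serialized for supervised fine-tuning — the field names, prompt template, and brainstorm/work/verify structure here are assumptions inferred from the capabilities listed in this card, not the actual ReasonSet schema:

```python
import json

# Hypothetical ReasonSet-style record. Field names and the reasoning
# structure (brainstorm / work / verify) are assumptions, not the
# actual dataset schema.
example = {
    "problem": "What is the sum of the first 10 positive integers?",
    "solution": (
        "Brainstorm: add the terms directly, or use the formula n(n+1)/2.\n"
        "Work: 10 * 11 / 2 = 55.\n"
        "Verify: pairing 1+10, 2+9, ... gives 5 pairs summing to 11, so 55.\n"
        "Answer: 55"
    ),
}

def to_sft_text(record):
    """Join a problem and its worked solution into one training string."""
    return f"Problem: {record['problem']}\nSolution: {record['solution']}"

# Round-trip through JSON, as in a typical JSONL dataset file.
line = json.dumps(example)
text = to_sft_text(json.loads(line))
print(text.splitlines()[0])  # → Problem: What is the sum of the first 10 positive integers?
```

A model fine-tuned on strings of this shape learns to emit the brainstorm, working, and verification steps before the final answer, which matches the behaviors listed under Key Capabilities.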
+ ## Intended Use
+ - Research into reasoning capabilities of smaller language models
+ - Study of explicit problem-solving behaviors
+ - Academic investigation of model training methodologies
+
+ ## Repository
+ Available at: https://github.com/tamassimonds/REL
+
+ ## Citation
+ [Include paper citation when published]