TamasSimonds committed
Commit 8348875 · verified · 1 parent: cd5ac9b

Update README.md

Files changed (1)
  1. README.md +37 -28
README.md CHANGED
@@ -1,28 +1,37 @@
- ---
- library_name: transformers
- tags: []
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** Toby Simonds
- - **Model type:** Llama 3.2 3B finetune
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** https://github.com/tamassimonds/REL/tree/main
- - **Paper [optional]:** [More Information Needed]
+ # O1-Llama 3.2 3B Model Card
+
+ ## Important Disclaimer
+ This is a **proof-of-concept research model** designed to demonstrate the feasibility of inducing structured reasoning behaviors in smaller language models. It is not intended for production use or deployment in real-world applications. The model serves primarily as a demonstration of training methodology and should be used only for research purposes.
+
+ ## Model Overview
+ O1-Llama 3.2 3B is a fine-tuned version of Llama 3.2 3B, trained to exhibit explicit reasoning patterns similar to those observed in OpenAI's O1 model. The model is trained on ReasonSet, a dataset of worked solutions focused on mathematical and logical problem-solving.
+
+ ## Key Capabilities
+ - Explicit brainstorming and strategy enumeration
+ - Step-by-step solution development
+ - Self-correction attempts
+ - Verification steps in problem-solving
+
+ ## Limitations
+ - Significantly lower performance than larger models
+ - Can get stuck in circular reasoning
+ - May fail to find correct solutions despite exhibiting reasoning behavior
+ - Limited to simpler problems
+ - Not suitable for production use or critical applications
+
+ ## Training
+ - Base model: Llama 3.2 3B
+ - Dataset: ReasonSet (2,000 worked solutions)
+ - Domains: AIME, GPQA, and MATH dataset problems
+ - Method: Fine-tuning on worked solutions generated through REL (Reasoning Enhancement Loop)
+
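The Training section describes fine-tuning on worked solutions from ReasonSet. As a rough sketch of how one such record might be serialized for supervised fine-tuning — the field names, prompt template, and brainstorm/work/verify structure here are assumptions inferred from the capabilities listed in this card, not the actual ReasonSet schema:

```python
import json

# Hypothetical ReasonSet-style record. Field names and the reasoning
# structure (brainstorm / work / verify) are assumptions, not the
# actual dataset schema.
example = {
    "problem": "What is the sum of the first 10 positive integers?",
    "solution": (
        "Brainstorm: add the terms directly, or use the formula n(n+1)/2.\n"
        "Work: 10 * 11 / 2 = 55.\n"
        "Verify: pairing 1+10, 2+9, ... gives 5 pairs summing to 11, so 55.\n"
        "Answer: 55"
    ),
}

def to_sft_text(record):
    """Join a problem and its worked solution into one training string."""
    return f"Problem: {record['problem']}\nSolution: {record['solution']}"

# Round-trip through JSON, as in a typical JSONL dataset file.
line = json.dumps(example)
text = to_sft_text(json.loads(line))
print(text.splitlines()[0])  # → Problem: What is the sum of the first 10 positive integers?
```

A model fine-tuned on strings of this shape learns to emit the brainstorm, working, and verification steps before the final answer, which matches the behaviors listed under Key Capabilities.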
+ ## Intended Use
+ - Research into reasoning capabilities of smaller language models
+ - Study of explicit problem-solving behaviors
+ - Academic investigation of model training methodologies
+
+ ## Repository
+ Available at: https://github.com/tamassimonds/REL
+
+ ## Citation
+ [Include paper citation when published]