ericflo committed (verified) · Commit e89d270 · Parent: 0b8f964

Update README.md

Files changed (1): README.md (+137 −1)
---
license: apache-2.0
base_model:
- meta-llama/Llama-3.2-3B
tags:
- llama-3.2
- thought-chain
- instruction-finetuning
- transformers
library_name: transformers
pipeline_tag: text-generation
---

# Thought-Ranked Llama 3.2 3B

## Model Description

This model is a fine-tuned version of Meta's Llama 3.2 3B base model, trained to generate an explicit thought process before producing an answer. It underwent four rounds of specialized fine-tuning using a thought-chain ranking approach.

### Training Process

1. **Thought Generation**: For each training sample, the model generates 62 candidate thought chains, one for each prefix `<thought>{char}</thought>` where `{char}` ranges over `[a-zA-Z0-9]`. Each thought chain is capped at 128 tokens.

2. **Answer Generation**: Following each thought chain, the model generates a complete answer of up to 2048 tokens.

3. **Ranking & Selection**: An external LLM ranking system evaluates the quality of the answers without seeing the thought processes, producing a ranking of the most effective thought patterns.

4. **Final Training**: The model is then trained on the highest-ranked thought-answer pairs, learning to generate the most effective thought patterns autonomously.

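Step 1 above can be sketched as follows. This is a minimal illustration, not the actual training script; the exact prompt layout is an assumption, since the card only specifies the `<thought>{char}</thought>` prefix format and the `[a-zA-Z0-9]` seed set:

```python
import string

# The 62 single-character thought seeds described above: a-z, A-Z, 0-9.
THOUGHT_SEEDS = string.ascii_lowercase + string.ascii_uppercase + string.digits

def build_thought_prompts(question: str) -> list[str]:
    """Build one prompt per candidate thought seed for a training sample.

    Each prompt primes the model with a different thought prefix, yielding
    62 candidate thought chains per sample (how the prompt is laid out
    around the prefix is an assumption).
    """
    return [f"{question}\n<thought>{seed}</thought>" for seed in THOUGHT_SEEDS]

prompts = build_thought_prompts("Solve this math problem: 2x + 3 = 7")
```

Each of these 62 prompts is then completed by the model to produce the candidate thought chains and answers ranked in steps 2–3.
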
### Key Features

- **Thought Chain Generation**: The model produces an explicit thought process before its answer
- **Greedy Decoding**: Greedy decoding is used for both thought generation and final answers
- **Length Limits**:
  - Thought chains: up to 128 tokens
  - Final answers: up to 2048 tokens

### Model Architecture

- Base model: Llama 3.2 3B (Base)
- Architecture: Transformer-based language model
- Parameters: ~3.2 billion
- Training strategy: Supervised Fine-Tuning (SFT) with thought-chain ranking

## Intended Use

This model is designed for tasks that benefit from explicit reasoning chains, including but not limited to:

- Problem-solving
- Mathematical reasoning
- Logical deduction
- Step-by-step explanations
- Complex decision making

### Out-of-Scope Uses

- Direct deployment without safety measures
- Applications requiring guaranteed accuracy
- Critical decision-making without human oversight
- Tasks requiring capabilities beyond the base Llama 3.2 3B model

## Training Details

### Training Data

The model was trained using:

- Sample questions paired with multiple thought variations
- Thought chains generated using systematic character prefixes
- Rankings derived from LLM evaluation of answer quality
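A single record produced by this pipeline might look like the following. The field names are illustrative only; the card does not publish the actual dataset schema:

```python
# Hypothetical training record; field names are illustrative, not the
# actual dataset schema.
record = {
    "question": "Solve this math problem: 2x + 3 = 7",
    "thought": "<thought>a</thought> Isolate x: subtract 3, then divide by 2.",
    "answer": "2x = 4, so x = 2.",
    "rank": 1,  # position assigned by the external LLM judge
}
```
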

### Training Procedure

1. **Thought Generation Phase**
   - Generated 62 thought variations per sample (a-z, A-Z, 0-9)
   - Sampled with temperature=0.0 (greedy)
   - Maximum thought length: 128 tokens

2. **Answer Generation Phase**
   - Generated completions following each thought chain
   - Maximum answer length: 2048 tokens
   - Sampled with temperature=0.0 (greedy)

3. **Ranking Phase**
   - An external LLM evaluated answer quality
   - Ranking performed without access to thought chains
   - Selected the highest-performing thought-answer pairs

4. **Final Training Phase**
   - Fine-tuned on the best-performing thought-answer combinations
   - 4 complete rounds of training
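The ranking phase can be sketched as follows. `score_answer` stands in for the external LLM judge (a hypothetical callable, not part of this repository); the key property shown is that only the answers are scored, never the thought chains that produced them:

```python
from typing import Callable

def select_best_pair(
    question: str,
    thought_answer_pairs: list[tuple[str, str]],
    score_answer: Callable[[str, str], float],
) -> tuple[str, str]:
    """Pick the (thought, answer) pair whose *answer* scores highest.

    The judge sees only question and answer text, so the ranking cannot
    be biased by the thought chain itself.
    """
    return max(thought_answer_pairs, key=lambda pair: score_answer(question, pair[1]))

# Toy judge for illustration: prefer longer answers (the real system
# uses an external LLM ranker).
pairs = [
    ("<thought>a</thought>", "x = 2"),
    ("<thought>b</thought>", "2x = 4, so x = 2"),
]
best = select_best_pair("Solve 2x + 3 = 7", pairs, lambda q, a: len(a))
```

The selected pairs then become the supervised fine-tuning targets for the final training phase.
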

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ericflo/Llama-3.2-3B-COT")
tokenizer = AutoTokenizer.from_pretrained("ericflo/Llama-3.2-3B-COT")

# Example usage
prompt = "Solve this math problem: 2x + 3 = 7"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
)

# Generate a response with its thought chain (greedy decoding,
# matching the sampling strategy used during training)
output = model.generate(
    input_ids,
    do_sample=False,
    max_new_tokens=2048,
)

response = tokenizer.decode(output[0])
```
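Since the model emits its reasoning inside `<thought>...</thought>` tags, the thought chain can be separated from the final answer after decoding. A sketch, assuming a single leading thought block:

```python
import re

def split_thought(response: str) -> tuple[str, str]:
    """Split a decoded response into (thought_chain, answer).

    Assumes the model emits one <thought>...</thought> block before the
    answer; returns an empty thought if no tag is found.
    """
    match = re.search(r"<thought>(.*?)</thought>\s*", response, flags=re.DOTALL)
    if match is None:
        return "", response
    return match.group(1), response[match.end():]

thought, answer = split_thought("<thought>Subtract 3, then halve.</thought> x = 2")
```
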

## Limitations

- Limited to the capabilities of the base Llama 3.2 3B model
- May generate thought chains that are not always optimal
- Performance depends on the quality of the LLM ranking system used during training
- The training process may not capture all possible effective thought patterns
- Limited by the context window of the base model

## Ethical Considerations

- The model inherits biases from the base Llama 3.2 3B model
- Generated thought chains should be reviewed for accuracy and appropriateness
- The model's reasoning process should not be relied upon for critical decisions without human verification
- Users should implement appropriate content filtering and safety measures

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{thought-ranked-llama,
  title={Thought-Ranked Llama 3.2: Fine-tuning Language Models with Ranked Thought Chains},
  author={Eric Florenzano},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/ericflo/Llama-3.2-3B-COT}}
}
```