---
license: apache-2.0
base_model:
- meta-llama/Llama-3.2-3B
tags:
- llama-3.2
- thought-chain
- instruction-finetuning
- transformers
library_name: transformers
pipeline_tag: text-generation
---

# Thought-Ranked Llama 3.2 3B

## Model Description

This model is a fine-tuned version of Meta's Llama 3.2 3B (Base) that has been trained to write out an explicit thought process before producing its answer. It underwent 4 rounds of fine-tuning using a thought-chain ranking approach, in which candidate thoughts are judged by the quality of the answers they lead to.

### Training Process

1. **Initial Generation**: For each training sample, the model generates multiple thought chains, each seeded with a different thought prefix of the form `<thought>{char}</thought>`, one for every character in `[a-zA-Z0-9]`. Each thought chain is limited to 128 tokens.

2. **Answer Generation**: After each thought chain, the model generates a complete answer of up to 2048 tokens.

3. **Ranking & Selection**: An external LLM ranks the answers by quality without seeing the thought chains that produced them; this ranking identifies the most effective thought patterns.

4. **Final Training**: The model is then trained on the highest-ranked thought-answer pairs, learning to generate the most effective thought patterns on its own.
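
The seeding step can be sketched in plain Python. This is a minimal illustration, not the author's training code: the 62 seed strings follow the `<thought>{char}</thought>` template given above, while `candidate_prompts` and its newline separator are hypothetical, since the README does not say how the seed is joined to the prompt.

```python
import string

# The 62 seed characters named in the README: a-z, A-Z, 0-9.
SEED_CHARS = string.ascii_lowercase + string.ascii_uppercase + string.digits


def thought_seeds(template: str = "<thought>{char}</thought>") -> list[str]:
    """Return one seed string per character, using the README's thought template."""
    return [template.format(char=c) for c in SEED_CHARS]


def candidate_prompts(question: str) -> list[str]:
    """Hypothetical: one prompt per seed, so each variation starts differently.

    The README does not specify how the seed is joined to the prompt;
    a newline separator is assumed here purely for illustration.
    """
    return [f"{question}\n{seed}" for seed in thought_seeds()]


prompts = candidate_prompts("Solve this math problem: 2x + 3 = 7")
print(len(prompts))  # 62
```

Each of these prompts would then be decoded greedily, with up to 128 tokens for the thought chain and up to 2048 tokens for the answer.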

### Key Features

- **Thought Chain Generation**: The model has learned to generate an explicit thought process before providing an answer
- **Greedy Decoding**: Uses greedy decoding for both thought generation and final answers
- **Length Parameters**:
  - Thought chains: up to 128 tokens
  - Final answers: up to 2048 tokens

### Model Architecture

- Base model: Llama 3.2 3B (Base)
- Architecture: Decoder-only Transformer language model
- Parameters: ~3.2 billion
- Training Strategy: Supervised Fine-Tuning (SFT) with thought-chain ranking

## Intended Use

This model is designed for tasks that benefit from explicit reasoning chains, including but not limited to:

- Problem-solving
- Mathematical reasoning
- Logical deduction
- Step-by-step explanations
- Complex decision-making

### Out-of-Scope Uses

- Direct deployment without safety measures
- Applications requiring guaranteed accuracy
- Critical decision-making without human oversight
- Tasks requiring capabilities beyond those of the base Llama 3.2 3B model

## Training Details

### Training Data

The model was trained using:

- Sample questions paired with multiple thought variations
- Thought chains generated using systematic character prefixes
- Rankings derived from LLM evaluation of answer quality

### Training Procedure

1. **Thought Generation Phase**
   - Generated 62 thought variations per sample, one per seed character (a-z, A-Z, 0-9)
   - Greedy decoding (temperature=0.0)
   - Maximum thought length: 128 tokens

2. **Answer Generation Phase**
   - Generated a completion following each thought chain
   - Maximum answer length: 2048 tokens
   - Greedy decoding (temperature=0.0)

3. **Ranking Phase**
   - An external LLM evaluated answer quality
   - Ranking was performed without access to the thought chains
   - The highest-performing thought-answer pairs were selected

4. **Final Training Phase**
   - Fine-tuned on the best-performing thought-answer combinations
   - 4 complete rounds of training
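
The ranking and final-training phases can be sketched as follows. This is a hypothetical illustration, not the author's code: `Candidate`, `select_best`, `to_sft_example`, and `toy_judge` are names introduced here, the toy judge merely stands in for the external LLM ranker, and the `<thought>...</thought>` layout of the training record is an assumption. The property it shows is that the judge receives only the answers, never the thoughts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    seed_char: str  # which of the 62 seed characters started this chain
    thought: str    # generated thought chain (up to 128 tokens)
    answer: str     # generated answer (up to 2048 tokens)


def select_best(
    candidates: list[Candidate],
    rank_answers: Callable[[list[str]], list[int]],
    top_k: int = 1,
) -> list[Candidate]:
    """Keep the top_k candidates, ranked by looking at the answers only.

    `rank_answers` stands in for the external LLM judge: it receives the
    answers (never the thoughts) and returns their indices from best to worst.
    """
    order = rank_answers([c.answer for c in candidates])
    return [candidates[i] for i in order[:top_k]]


def to_sft_example(question: str, best: Candidate) -> dict[str, str]:
    """Hypothetical SFT record: the winning thought precedes the answer."""
    return {
        "prompt": question,
        "completion": f"<thought>{best.thought}</thought>\n{best.answer}",
    }


def toy_judge(answers: list[str]) -> list[int]:
    """Toy stand-in for the LLM ranker: correct answers first, then shorter ones."""
    return sorted(range(len(answers)), key=lambda i: ("x = 2" not in answers[i], len(answers[i])))


candidates = [
    Candidate("a", "a guess: try x = 5.", "x = 5"),
    Candidate("S", "Subtract 3 from both sides, then divide by 2.", "x = 2"),
]
best = select_best(candidates, toy_judge)
print(to_sft_example("Solve this math problem: 2x + 3 = 7", best[0]))
```

In a real pipeline the toy judge would be replaced by calls to the external ranking LLM, and the resulting records would form the fine-tuning set.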

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ericflo/Llama-3.2-3B-COT"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example usage
prompt = "Solve this math problem: 2x + 3 = 7"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,  # open the assistant turn so the model starts its reply
    return_tensors="pt",
)

# Generate the thought chain and answer with greedy decoding, as in training
output = model.generate(
    input_ids,
    do_sample=False,
    max_new_tokens=128 + 2048,  # thought budget + answer budget used in training
)

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(output[0][input_ids.shape[-1]:])
print(response)
```
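
To work with the thought chain and the answer separately, the decoded text can be split on the thought tags. This sketch assumes the model wraps its reasoning in `<thought>...</thought>` and writes the answer after the closing tag, mirroring the training template; `split_thought` is a hypothetical helper, not part of `transformers`.

```python
import re

# Assumption: reasoning is wrapped in <thought>...</thought>, as in the training
# template, and everything after the closing tag is the answer.
THOUGHT_RE = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)


def split_thought(text: str) -> tuple[str, str]:
    """Split decoded model output into (thought, answer).

    If no thought tags are found, the thought is empty and the whole text
    is treated as the answer.
    """
    match = THOUGHT_RE.search(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


thought, answer = split_thought(
    "<thought>Subtract 3 from both sides, then divide by 2.</thought>\nx = 2"
)
print(thought)  # Subtract 3 from both sides, then divide by 2.
print(answer)   # x = 2
```

Applied to `response` from the usage example, this separates the reasoning from the final answer.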

## Limitations

- Limited to the capabilities of the base Llama 3.2 3B model
- May generate suboptimal thought chains
- Performance depends on the quality of the LLM ranking system used during training
- The training process may not capture all effective thought patterns
- Limited by the context window of the base model

## Ethical Considerations

- The model inherits biases from the base Llama 3.2 3B model
- Generated thought chains should be reviewed for accuracy and appropriateness
- The model's reasoning process should not be relied upon for critical decisions without human verification
- Users should implement appropriate content filtering and safety measures

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{thought-ranked-llama,
  title={Thought-Ranked Llama 3.2: Fine-tuning Language Models with Ranked Thought Chains},
  author={Eric Florenzano},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/ericflo/Llama-3.2-3B-COT}}
}
```