---
license: apache-2.0
base_model:
- meta-llama/Llama-3.2-3B
tags:
- llama-3.2
- thought-chain
- instruction-finetuning
- transformers
library_name: transformers
pipeline_tag: text-generation
---
# Thought-Ranked Llama 3.2 3B
## Model Description
This model is a fine-tuned version of Meta's Llama 3.2 3B (Base) that has been specially trained to generate high-quality thought processes before producing answers. The model underwent 4 rounds of specialized fine-tuning using a thought-chain ranking approach.
(Weekend project, just a few hundred steps of training)
### Training Process
1. **Initial Generation**: For each training sample, the model generates multiple thought chains by prefixing different thought tokens: `<thought>{char}</thought>` for each character in `[a-zA-Z0-9]`. Each thought chain is allowed up to 128 tokens.
2. **Answer Generation**: Following each thought chain, the model generates a complete answer with up to 2048 tokens.
3. **Ranking & Selection**: An external LLM ranking system evaluates the quality of answers without seeing the thought processes, creating a ranking of the most effective thought patterns.
4. **Final Training**: The model is then trained on the highest-ranked thought-answer pairs, learning to generate the most effective thought patterns autonomously.
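
As a rough illustration of step 1, the sketch below builds the 62 seeded prompts. It assumes the seed is an opening `<thought>` tag followed by one character, which the model then continues; the exact string format used in training is not specified in this card, and `build_thought_seeds` is a hypothetical helper, not part of any released code.

```python
import string

# Assumption: the 62 seed characters are a-z, A-Z, 0-9, in that order.
SEED_CHARS = string.ascii_lowercase + string.ascii_uppercase + string.digits


def build_thought_seeds(prompt: str) -> list:
    """Return one seeded prompt per character (hypothetical helper)."""
    # Assumption: each seed is "<thought>" plus a single character, and the
    # model continues the thought from there (up to 128 tokens in training).
    return [f"{prompt}<thought>{char}" for char in SEED_CHARS]


seeds = build_thought_seeds("What is 2 + 2?\n")
print(len(seeds))  # 26 + 26 + 10 = 62 seeded prompts
```

Each seeded prompt would then be passed to the model twice: once to finish the thought, and once more to produce the answer that follows it.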
### Key Features
- **Thought Chain Generation**: The model has learned to generate explicit thought processes before providing answers
- **Greedy Decoding**: Uses greedy decoding (temperature 0) for both thought generation and final answers
- **Length Parameters**:
  - Thought chains: Up to 128 tokens
  - Final answers: Up to 2048 tokens
### Model Architecture
- Base model: Llama 3.2 3B (Base)
- Architecture: Transformer-based language model
- Parameters: ~3.2 billion
- Training Strategy: Supervised Fine-Tuning (SFT) with thought-chain ranking
## Intended Use
This model is designed for tasks that benefit from explicit reasoning chains, including but not limited to:
- Problem-solving
- Mathematical reasoning
- Logical deduction
- Step-by-step explanations
- Complex decision making
### Out-of-Scope Uses
- Direct deployment without safety measures
- Applications requiring guaranteed accuracy
- Critical decision-making without human oversight
- Tasks requiring capabilities beyond the base Llama 3.2 3B model
## Training Details
### Training Data
The model was trained using:
- Sample questions paired with multiple thought variations
- Thought chains generated using systematic character prefixes
- Rankings derived from LLM evaluation of answer quality
### Training Procedure
1. **Thought Generation Phase**
   - Generated 62 variations of thoughts per sample (a-z, A-Z, 0-9)
   - Sampled with temperature=0.0
   - Maximum thought length: 128 tokens
2. **Answer Generation Phase**
   - Generated completions following each thought chain
   - Maximum answer length: 2048 tokens
   - Sampled with temperature=0.0
3. **Ranking Phase**
   - External LLM evaluated answer quality
   - Ranking performed without access to thought chains
   - Selected highest-performing thought-answer pairs
4. **Final Training Phase**
   - Fine-tuned on best-performing thought-answer combinations
   - 4 complete rounds of training
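
The ranking phase can be sketched as follows, assuming a scoring function stands in for the external LLM judge. `Candidate`, `select_best`, and `toy_score` are hypothetical names used for illustration only; the actual ranking prompt and selection rule are not documented in this card.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    seed_char: str  # character that seeded the thought
    thought: str    # generated thought chain
    answer: str     # answer generated after the thought


def select_best(
    candidates: List[Candidate],
    score_answer: Callable[[str], float],
    keep: int = 1,
) -> List[Candidate]:
    """Rank candidates by answer quality and keep the top `keep`."""
    # The scorer receives only the answer text, never the thought,
    # mirroring the "ranking without access to thought chains" step.
    ranked = sorted(candidates, key=lambda c: score_answer(c.answer), reverse=True)
    return ranked[:keep]


def toy_score(answer: str) -> float:
    """Toy stand-in for the external LLM judge (assumption)."""
    return 1.0 if "x = 2" in answer else 0.0


candidates = [
    Candidate("a", "Just guess.", "x = 5"),
    Candidate("b", "Subtract 3, then divide by 2.", "x = 2"),
]
best = select_best(candidates, toy_score)
print(best[0].seed_char, "|", best[0].thought)
```

The selected thought-answer pairs would then form the supervised fine-tuning data for the next round.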
## Usage
```python
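from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("ericflo/Llama-3.2-3B-COT")
tokenizer = AutoTokenizer.from_pretrained("ericflo/Llama-3.2-3B-COT")

# Example usage
prompt = "Solve this math problem: 2x + 3 = 7"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,  # end the prompt where the assistant turn begins
    return_tensors="pt",
)

# Generate response with thought chain
output = model.generate(
    input_ids,
    do_sample=False,  # greedy decoding, matching the training setup described above
    max_new_tokens=2176,  # 128 thought tokens + 2048 answer tokens
)
response = tokenizer.decode(output[0])
print(response)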
```
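
To separate the thought from the answer in `response`, something like the following may work. It assumes the model wraps its thought in `<thought>...</thought>` tags before the answer, as the training format suggests; `split_thought` is a hypothetical helper, and any chat-template special tokens in the decoded text may need stripping first.

```python
import re
from typing import Optional, Tuple

# Assumption: the thought is enclosed in <thought>...</thought> tags.
THOUGHT_RE = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)


def split_thought(text: str) -> Tuple[Optional[str], str]:
    """Split generated text into (thought, answer); thought is None if no tags are found."""
    match = THOUGHT_RE.search(text)
    if match is None:
        return None, text.strip()
    # Everything after the closing tag is treated as the answer.
    return match.group(1).strip(), text[match.end():].strip()


thought, answer = split_thought(
    "<thought>Subtract 3, then divide by 2.</thought>x = 2"
)
print("Thought:", thought)
print("Answer:", answer)
```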
## Limitations
- Limited to the capabilities of the base Llama 3.2 3B model
- May generate thought chains that are not always optimal
- Performance depends on the quality of the LLM ranking system used during training
- Training process may not capture all possible effective thought patterns
- Limited by the context window of the base model
## Ethical Considerations
- The model inherits biases from the base Llama 3.2 3B model
- Generated thought chains should be reviewed for accuracy and appropriateness
- The model's reasoning process should not be relied upon for critical decisions without human verification
- Users should implement appropriate content filtering and safety measures
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{thought-ranked-llama,
  title={Thought-Ranked Llama 3.2: Fine-tuning Language Models with Ranked Thought Chains},
  author={Eric Florenzano},
  year={2024},
  howpublished={\url{https://huggingface.co/ericflo/Llama-3.2-3B-COT}}
}
```