---
license: apache-2.0
base_model:
- meta-llama/Llama-3.2-3B
tags:
- llama-3.2
- thought-chain
- instruction-finetuning
- transformers
library_name: transformers
pipeline_tag: text-generation
---

# Thought-Ranked Llama 3.2 3B

## Model Description

This model is a fine-tuned version of Meta's Llama 3.2 3B (Base) trained to generate a high-quality thought process before producing an answer. The model underwent 4 rounds of fine-tuning using a thought-chain ranking approach.

(Weekend project: just a few hundred training steps.)

### Training Process

1. **Initial Generation**: For each training sample, the model generates multiple thought chains by prefixing different thought tokens: `<thought>{char}</thought>` for each character in `[a-zA-Z0-9]`. Each thought chain is allowed up to 128 tokens (see the sketch after this list).

2. **Answer Generation**: Following each thought chain, the model generates a complete answer of up to 2048 tokens.

3. **Ranking & Selection**: An external LLM ranks the answers without seeing the thought processes, yielding a ranking of the most effective thought patterns.

4. **Final Training**: The model is then trained on the highest-ranked thought-answer pairs, learning to generate the most effective thought patterns autonomously.
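
To make steps 1 and 2 concrete, here is a minimal sketch of the generation loop, assuming a standard `transformers` greedy-decoding setup; the prompt layout and the helper `generate_candidates` are illustrative assumptions, not the actual training code.

```python
import string
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B"  # the base model this card starts from
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

SEED_CHARS = string.ascii_letters + string.digits  # 62 seeds: a-z, A-Z, 0-9

@torch.no_grad()
def generate_candidates(question: str) -> list[tuple[str, str]]:
    """Generate one (thought, answer) candidate per seed character."""
    candidates = []
    for char in SEED_CHARS:
        # Step 1: seed a distinct thought chain with a one-character prefix.
        prompt = f"{question}\n<thought>{char}"
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        # Greedy decoding throughout (temperature 0). A real run would stop
        # the thought at </thought>; the 128-token cap stands in for that here.
        with_thought = model.generate(input_ids, do_sample=False, max_new_tokens=128)
        full = model.generate(with_thought, do_sample=False, max_new_tokens=2048)
        thought = tokenizer.decode(with_thought[0, input_ids.shape[1]:])
        answer = tokenizer.decode(full[0, with_thought.shape[1]:])
        candidates.append((thought, answer))
    return candidates
```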

### Key Features

- **Thought Chain Generation**: The model has learned to generate explicit thought processes before providing answers
- **Greedy Sampling**: Uses greedy sampling for both thought generation and final answers
- **Length Parameters** (expressed as generation configs below):
  - Thought chains: up to 128 tokens
  - Final answers: up to 2048 tokens
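
Expressed as `transformers` generation configs, the stated decoding parameters would be; splitting them into two configs is an assumption about how the two phases are wired up:

```python
from transformers import GenerationConfig

# Decoding settings matching the card: greedy sampling with a 128-token cap
# for thoughts and a 2048-token cap for answers.
thought_config = GenerationConfig(do_sample=False, max_new_tokens=128)
answer_config = GenerationConfig(do_sample=False, max_new_tokens=2048)
```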

### Model Architecture

- Base model: Llama 3.2 3B (Base)
- Architecture: Transformer-based language model
- Parameters: ~3.2 billion
- Training strategy: Supervised Fine-Tuning (SFT) with thought-chain ranking

## Intended Use

This model is designed for tasks that benefit from explicit reasoning chains, including but not limited to:

- Problem-solving
- Mathematical reasoning
- Logical deduction
- Step-by-step explanations
- Complex decision-making

### Out-of-Scope Uses

- Direct deployment without safety measures
- Applications requiring guaranteed accuracy
- Critical decision-making without human oversight
- Tasks requiring capabilities beyond the base Llama 3.2 3B model

## Training Details

### Training Data

The model was trained using:

- Sample questions paired with multiple thought variations
- Thought chains generated using systematic character prefixes
- Rankings derived from LLM evaluation of answer quality
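
For a concrete picture of the data, a single record in the resulting SFT set might look like the following; all field names are assumptions, since the card does not publish the dataset schema.

```python
# Hypothetical structure of one ranked training record (field names assumed,
# not taken from a released dataset).
record = {
    "question": "Solve this math problem: 2x + 3 = 7",
    "thought": "<thought>Isolate x: subtract 3 from both sides, then divide by 2.</thought>",
    "answer": "2x + 3 = 7  =>  2x = 4  =>  x = 2",
    "rank": 1,  # position assigned by the external LLM ranker (1 = best)
}
```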

### Training Procedure

1. **Thought Generation Phase**
   - Generated 62 thought variations per sample (a-z, A-Z, 0-9)
   - Sampled with temperature=0.0
   - Maximum thought length: 128 tokens

2. **Answer Generation Phase**
   - Generated completions following each thought chain
   - Maximum answer length: 2048 tokens
   - Sampled with temperature=0.0

3. **Ranking Phase** (see the sketch after this list)
   - External LLM evaluated answer quality
   - Ranking performed without access to thought chains
   - Selected highest-performing thought-answer pairs

4. **Final Training Phase**
   - Fine-tuned on best-performing thought-answer combinations
   - 4 complete rounds of training
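
The ranking and selection steps (phases 3 and 4) might then look like this minimal sketch; `rank_answers` is a hypothetical stand-in for the external LLM ranker, which, per the card, sees only the answers and never the thought chains.

```python
def rank_answers(question: str, answers: list[str]) -> list[int]:
    """Illustrative stand-in for the external LLM ranker.

    Returns indices of `answers` ordered best-first. The ranker sees only
    the question and the candidate answers, never the thought chains.
    """
    raise NotImplementedError  # e.g., prompt a judge model and parse its ordering

def select_best(question: str, thoughts: list[str], answers: list[str]):
    """Pick the thought-answer pair whose answer the ranker scored highest."""
    order = rank_answers(question, answers)
    best = order[0]
    # This (question, thought, answer) triple becomes an SFT example, so the
    # model learns to emit the winning thought pattern on its own.
    return question, thoughts[best], answers[best]
```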

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ericflo/Llama-3.2-3B-COT")
tokenizer = AutoTokenizer.from_pretrained("ericflo/Llama-3.2-3B-COT")

# Example usage
prompt = "Solve this math problem: 2x + 3 = 7"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
)

# Generate the thought chain and answer with greedy decoding,
# matching the temperature=0.0 setting used during training
output = model.generate(
    input_ids,
    do_sample=False,
    max_new_tokens=2048,
)

response = tokenizer.decode(output[0], skip_special_tokens=True)
```
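
The response contains the reasoning followed by the answer. Assuming the `<thought>...</thought>` format from training carries over to inference output, you can split the two like this:

```python
import re

# Separate the thought chain from the final answer.
match = re.search(r"<thought>(.*?)</thought>(.*)", response, flags=re.DOTALL)
if match:
    thought, answer = match.group(1).strip(), match.group(2).strip()
else:
    thought, answer = None, response  # no thought tags found
```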

## Limitations

- Limited to the capabilities of the base Llama 3.2 3B model
- May generate thought chains that are not always optimal
- Performance depends on the quality of the LLM ranking system used during training
- The training process may not capture all possible effective thought patterns
- Limited by the context window of the base model

## Ethical Considerations

- The model inherits biases from the base Llama 3.2 3B model
- Generated thought chains should be reviewed for accuracy and appropriateness
- The model's reasoning process should not be relied upon for critical decisions without human verification
- Users should implement appropriate content filtering and safety measures

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{thought-ranked-llama,
  title={Thought-Ranked Llama 3.2: Fine-tuning Language Models with Ranked Thought Chains},
  author={Eric Florenzano},
  year={2024},
  howpublished={\url{https://huggingface.co/ericflo/Llama-3.2-3B-COT}}
}
```