---
license: apache-2.0
base_model:
- meta-llama/Llama-3.2-3B
tags:
- llama-3.2
- thought-chain
- instruction-finetuning
- transformers
library_name: transformers
pipeline_tag: text-generation
---

# Thought-Ranked Llama 3.2 3B

## Model Description

This model is a fine-tuned version of Meta's Llama 3.2 3B (Base) trained to generate a high-quality thought process before producing an answer. The model underwent 4 rounds of fine-tuning using a thought-chain ranking approach.

(Weekend project: just a few hundred training steps.)

### Training Process

1. **Initial Generation**: For each training sample, the model generates multiple thought chains by prefixing different thought tokens: `<thought>{char}</thought>` for each character in `[a-zA-Z0-9]`. Each thought chain is allowed up to 128 tokens (see the sketch after this list).

2. **Answer Generation**: Following each thought chain, the model generates a complete answer of up to 2048 tokens.

3. **Ranking & Selection**: An external LLM ranks the answers without seeing the thought processes, yielding a ranking of the most effective thought patterns.

4. **Final Training**: The model is then trained on the highest-ranked thought-answer pairs, learning to generate the most effective thought patterns autonomously.
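
To make steps 1 and 2 concrete, here is a minimal sketch of the generation loop, assuming a standard `transformers` greedy-decoding setup; the prompt layout and the helper `generate_candidates` are illustrative assumptions, not the actual training code.

```python
import string
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B"  # the base model this card starts from
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

SEED_CHARS = string.ascii_letters + string.digits  # 62 seeds: a-z, A-Z, 0-9

@torch.no_grad()
def generate_candidates(question: str) -> list[tuple[str, str]]:
    """Generate one (thought, answer) candidate per seed character."""
    candidates = []
    for char in SEED_CHARS:
        # Step 1: seed a distinct thought chain with a one-character prefix.
        prompt = f"{question}\n<thought>{char}"
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        # Greedy decoding throughout (temperature 0). A real run would stop
        # the thought at </thought>; the 128-token cap stands in for that here.
        with_thought = model.generate(input_ids, do_sample=False, max_new_tokens=128)
        full = model.generate(with_thought, do_sample=False, max_new_tokens=2048)
        thought = tokenizer.decode(with_thought[0, input_ids.shape[1]:])
        answer = tokenizer.decode(full[0, with_thought.shape[1]:])
        candidates.append((thought, answer))
    return candidates
```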

### Key Features

- **Thought Chain Generation**: The model has learned to generate explicit thought processes before providing answers
- **Greedy Sampling**: Uses greedy sampling for both thought generation and final answers
- **Length Parameters** (expressed as generation configs below):
  - Thought chains: up to 128 tokens
  - Final answers: up to 2048 tokens
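
Expressed as `transformers` generation configs, the stated decoding parameters would be; splitting them into two configs is an assumption about how the two phases are wired up:

```python
from transformers import GenerationConfig

# Decoding settings matching the card: greedy sampling with a 128-token cap
# for thoughts and a 2048-token cap for answers.
thought_config = GenerationConfig(do_sample=False, max_new_tokens=128)
answer_config = GenerationConfig(do_sample=False, max_new_tokens=2048)
```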

### Model Architecture

- Base model: Llama 3.2 3B (Base)
- Architecture: Transformer-based language model
- Parameters: ~3.2 billion
- Training strategy: Supervised Fine-Tuning (SFT) with thought-chain ranking

## Intended Use

This model is designed for tasks that benefit from explicit reasoning chains, including but not limited to:

- Problem-solving
- Mathematical reasoning
- Logical deduction
- Step-by-step explanations
- Complex decision-making

### Out-of-Scope Uses

- Direct deployment without safety measures
- Applications requiring guaranteed accuracy
- Critical decision-making without human oversight
- Tasks requiring capabilities beyond the base Llama 3.2 3B model

## Training Details

### Training Data

The model was trained using:

- Sample questions paired with multiple thought variations
- Thought chains generated using systematic character prefixes
- Rankings derived from LLM evaluation of answer quality
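
For a concrete picture of the data, a single record in the resulting SFT set might look like the following; all field names are assumptions, since the card does not publish the dataset schema.

```python
# Hypothetical structure of one ranked training record (field names assumed,
# not taken from a released dataset).
record = {
    "question": "Solve this math problem: 2x + 3 = 7",
    "thought": "<thought>Isolate x: subtract 3 from both sides, then divide by 2.</thought>",
    "answer": "2x + 3 = 7  =>  2x = 4  =>  x = 2",
    "rank": 1,  # position assigned by the external LLM ranker (1 = best)
}
```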

### Training Procedure

1. **Thought Generation Phase**
   - Generated 62 thought variations per sample (a-z, A-Z, 0-9)
   - Sampled with temperature=0.0
   - Maximum thought length: 128 tokens

2. **Answer Generation Phase**
   - Generated completions following each thought chain
   - Maximum answer length: 2048 tokens
   - Sampled with temperature=0.0

3. **Ranking Phase** (see the sketch after this list)
   - External LLM evaluated answer quality
   - Ranking performed without access to thought chains
   - Selected highest-performing thought-answer pairs

4. **Final Training Phase**
   - Fine-tuned on best-performing thought-answer combinations
   - 4 complete rounds of training
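
The ranking and selection steps (phases 3 and 4) might then look like this minimal sketch; `rank_answers` is a hypothetical stand-in for the external LLM ranker, which, per the card, sees only the answers and never the thought chains.

```python
def rank_answers(question: str, answers: list[str]) -> list[int]:
    """Illustrative stand-in for the external LLM ranker.

    Returns indices of `answers` ordered best-first. The ranker sees only
    the question and the candidate answers, never the thought chains.
    """
    raise NotImplementedError  # e.g., prompt a judge model and parse its ordering

def select_best(question: str, thoughts: list[str], answers: list[str]):
    """Pick the thought-answer pair whose answer the ranker scored highest."""
    order = rank_answers(question, answers)
    best = order[0]
    # This (question, thought, answer) triple becomes an SFT example, so the
    # model learns to emit the winning thought pattern on its own.
    return question, thoughts[best], answers[best]
```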

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ericflo/Llama-3.2-3B-COT")
tokenizer = AutoTokenizer.from_pretrained("ericflo/Llama-3.2-3B-COT")

# Example usage
prompt = "Solve this math problem: 2x + 3 = 7"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
)

# Generate the thought chain and answer with greedy decoding,
# matching the temperature=0.0 setting used during training
output = model.generate(
    input_ids,
    do_sample=False,
    max_new_tokens=2048,
)

response = tokenizer.decode(output[0], skip_special_tokens=True)
```
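
The response contains the reasoning followed by the answer. Assuming the `<thought>...</thought>` format from training carries over to inference output, you can split the two like this:

```python
import re

# Separate the thought chain from the final answer.
match = re.search(r"<thought>(.*?)</thought>(.*)", response, flags=re.DOTALL)
if match:
    thought, answer = match.group(1).strip(), match.group(2).strip()
else:
    thought, answer = None, response  # no thought tags found
```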

## Limitations

- Limited to the capabilities of the base Llama 3.2 3B model
- May generate thought chains that are not always optimal
- Performance depends on the quality of the LLM ranking system used during training
- The training process may not capture all possible effective thought patterns
- Limited by the context window of the base model

## Ethical Considerations

- The model inherits biases from the base Llama 3.2 3B model
- Generated thought chains should be reviewed for accuracy and appropriateness
- The model's reasoning process should not be relied upon for critical decisions without human verification
- Users should implement appropriate content filtering and safety measures

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{thought-ranked-llama,
  title={Thought-Ranked Llama 3.2: Fine-tuning Language Models with Ranked Thought Chains},
  author={Eric Florenzano},
  year={2024},
  howpublished={\url{https://huggingface.co/ericflo/Llama-3.2-3B-COT}}
}
```