---
license: mit
datasets:
- ServiceNow-AI/R1-Distill-SFT
language:
- en
base_model:
- meta-llama/Llama-3.2-3B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- reasoning
- axolotl
- r1
---

# DeepSeek-R1-Distill-Llama-3B

This model is a distilled version of DeepSeek-R1 on Llama-3.2-3B, fine-tuned with the R1-Distill-SFT dataset.

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
See axolotl config:

```yaml
base_model: unsloth/Llama-3.2-3B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: llama3
datasets:
  - path: ./custom_dataset.json
    type: chat_template
    conversation: chatml
    ds_type: json

add_bos_token: true
add_eos_token: true
use_default_system_prompt: false

special_tokens:
  bos_token: "<|begin_of_text|>"
  eos_token: "<|eot_id|>"
  pad_token: "<|eot_id|>"
  additional_special_tokens:
    - "<|begin_of_text|>"
    - "<|eot_id|>"

adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
lora_target_linear: true

hub_model_id: suayptalha/DeepSeek-R1-Distill-Llama-3B

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true

micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 1
learning_rate: 2e-5
optimizer: paged_adamw_8bit
lr_scheduler: cosine

train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false

gradient_checkpointing: true
flash_attention: false

logging_steps: 50
warmup_steps: 100
saves_per_epoch: 1

output_dir: ./finetune-sft-results
save_safetensors: true
```
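The config above trains a LoRA adapter on top of an 8-bit base model. If you also want to run the released checkpoint with 8-bit weights at inference time (for example on a small GPU), a minimal sketch is shown below; it assumes the `bitsandbytes` package is installed, and the full-precision example in the usage section further down works without it:

```py
# Minimal sketch: load the released checkpoint with 8-bit weights,
# mirroring the `load_in_8bit: true` setting used during training.
# Assumes the `bitsandbytes` package is installed (not covered by this card).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "suayptalha/DeepSeek-R1-Distill-Llama-3B"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```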

# Prompt Template

You can use the Llama3 prompt template while using the model:

### Llama3

```
<|start_header_id|>system<|end_header_id|>

{system}<|eot_id|>
<|start_header_id|>user<|end_header_id|>

{user}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

{assistant}<|eot_id|>
```

A manual string-building sketch of this template is included at the end of this card.

## Example usage:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "suayptalha/DeepSeek-R1-Distill-Llama-3B",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("suayptalha/DeepSeek-R1-Distill-Llama-3B")

SYSTEM_PROMPT = """Respond in the following format:
<reasoning>
You should reason between these tags.
</reasoning>
<answer>
Answer goes here...
</answer>

Always use <reasoning> </reasoning> tags even if they are not necessary.
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Which one is larger? 9.11 or 9.9?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

output = model.generate(
    input_ids=inputs,
    max_new_tokens=256,
    use_cache=True,
    do_sample=True,  # enable sampling so temperature takes effect
    temperature=0.7,
)

decoded_output = tokenizer.decode(output[0], skip_special_tokens=False)
print(decoded_output)
```

## Output:

```
First, I need to compare the two numbers 9.11 and 9.9.

Next, I'll analyze each number. The first digit after the decimal point in 9.11 is 1, and in 9.9, it's 9.

Since 9 is greater than 1, 9.9 is larger than 9.11.

To determine which number is larger, let's compare the two numbers: **9.11** and **9.9**

1. **Identify the Decimal Places:**
   - Both numbers have two decimal places.

2. **Compare the Tens Place (Right of the Decimal Point):**
   - **9.11:** The tens place is 1.
   - **9.9:** The tens place is 9.

3. **Conclusion:**
   - Since 9 is greater than 1, the number with the larger tens place is 9.9.

**Answer:** **9.9** is larger than **9.11**.
```

## Suggested system prompt:

```
Respond in the following format:
<reasoning>
You should reason between these tags.
</reasoning>
<answer>
Answer goes here...
</answer>

Always use <reasoning> </reasoning> tags even if they are not necessary.
```

# Parameters

- lr: 2e-5
- epochs: 1
- batch_size: 16
- optimizer: paged_adamw_8bit

# Support

Buy Me A Coffee
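## Manual prompt construction

The chat template applied by `tokenizer.apply_chat_template` in the usage example already produces the Llama3 format shown in the Prompt Template section, so this is illustrative only. The sketch below builds the same prompt string by hand; the variable names are placeholders, and `<|begin_of_text|>` is the standard Llama 3 BOS token rather than anything specific to this model.

```py
# Illustrative sketch: assemble a Llama3-style prompt manually, following
# the template shown in the "Prompt Template" section above.
system_prompt = "You are a helpful assistant."       # placeholder
user_message = "Which one is larger? 9.11 or 9.9?"   # placeholder

prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n"
    f"{system_prompt}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_message}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

# The resulting string can be tokenized directly, e.g.:
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
```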