---
license: apache-2.0
datasets:
- HuggingFaceH4/Bespoke-Stratos-17k
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---
# Introducing Llama-3.2-1B-Instruct-Open-R1-Distill
Built on **Llama-3.2-1B-Instruct** and Hugging Face's [OpenR1](https://github.com/huggingface/open-r1), a fully open reproduction of **DeepSeek-R1**, this model brings strong reasoning capabilities to a compact, efficient architecture.
## Why This Matters
I have always been passionate about pushing the boundaries of **LLM** technology in smaller models that can run seamlessly on laptop CPUs and smartphones.
With the recent breakthrough of **DeepSeek-R1**, developing a high-quality reasoning model through distillation has become remarkably straightforward. It requires only **supervised fine-tuning (SFT)** on a dataset generated by a teacher model.
Thanks to **Hugging Face**, we now have a streamlined framework to make this process more accessible than ever.
### Model Description
- **Developed by:** keeeeenw
- **Funded by:** myself, for under $50 (a few hours of rented compute)
- **Model type:** Llama-3.2-1B-Instruct with reasoning capability
- **License:** Apache License 2.0
- **Finetuned from model:** Llama-3.2-1B-Instruct
## Uses
- **On-device AI assistants** for reasoning and general-purpose tasks
- **Mobile and edge AI applications** that require lightweight models
- **Chatbots and virtual assistants** optimized for efficiency
- **Domain-specific fine-tuning** with SFT
### How to run the code?
```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM, TextStreamer

# Use a CUDA GPU if one is available; otherwise fall back to the (much slower) CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)
model.to(device)

# Set up the prompt. The model was instruction-tuned with this exact system prompt, so use it verbatim.
# Change the "content" of the user message to your actual question.
messages = [
    {
        "role": "system",
        "content": "Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:",
    },
    {"role": "user", "content": "Please provide me instructions on how to steal an egg from my chicken?"},
]
formatted_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(formatted_chat)

# Tokenize the prompt. The chat template already inserts <|begin_of_text|>,
# so don't let the tokenizer add a second BOS token.
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False).to(device)

# Run inference and stream the output
streamer = TextStreamer(tokenizer, skip_prompt=True)
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    streamer=streamer,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,  # top_k and top_p only take effect when sampling
    top_k=5,
    top_p=0.9,
    max_new_tokens=131072,  # Llama 3.2 1B's 128K context window; the prompt also counts toward it
)

# Write the full output (prompt + generation) to a file
decoded_text = tokenizer.decode(outputs[0])
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(decoded_text)
print("Output written to output.txt")
```
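The system prompt asks for a Thought section delimited by `<|begin_of_thought|>`/`<|end_of_thought|>` and a Solution section delimited by `<|begin_of_solution|>`/`<|end_of_solution|>`. If you want to post-process generations, here is a minimal sketch with hypothetical helpers (not part of this repo) that splits the two sections and extracts the last `\boxed{...}` answer. It tolerates a missing `<|end_of_solution|>`, which the sample output below also omits.

```python
from typing import Optional, Tuple

# Markers taken from the system prompt the model was trained with.
BEGIN_THOUGHT = "<|begin_of_thought|>"
END_THOUGHT = "<|end_of_thought|>"
BEGIN_SOLUTION = "<|begin_of_solution|>"
END_SOLUTION = "<|end_of_solution|>"


def split_reasoning(generated: str) -> Tuple[Optional[str], Optional[str]]:
    """Split generated text (prompt excluded) into (thought, solution).

    Returns (None, None) if the Thought section never closed, e.g. because
    generation hit max_new_tokens. A missing <|end_of_solution|> is tolerated,
    since the model sometimes stops right after the answer.
    """
    text = generated.replace("<|eot_id|>", "")
    thought_part, found, rest = text.partition(END_THOUGHT)
    if not found:
        return None, None
    # Keep only what follows the last opening marker, if the model emitted one.
    thought = thought_part.split(BEGIN_THOUGHT)[-1].strip()
    _, found, solution_part = rest.partition(BEGIN_SOLUTION)
    if not found:
        return thought, None
    solution = solution_part.split(END_SOLUTION)[0].strip()
    return thought, solution


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces, e.g. the output was truncated
```

Pass only the newly generated text, for example `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])`; otherwise the marker examples inside the system prompt would be matched first.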
### Sample Output
Please see the full text: https://huggingface.co/keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill/blob/main/sample_output_2.txt
```
Okay, so I need to figure out how to steal an egg from my chicken. Let's start by understanding the situation. I have a chicken, and I want to take an egg from it without the chicken noticing. Since chickens can be protective of their eggs, I need to be careful not to get caught.
First, I should consider the chicken's behavior. Chickens are naturally protective of their nests and eggs. If the chicken is aware of my presence near the coop, it might sound a warning call to alert others, which could mean I'm caught. So, maybe I need a strategy that doesn't involve direct interaction with the chicken.
One approach could be to use distraction. If I create a distraction elsewhere, like making noise or knocking things around, maybe the chicken is distracted and doesn't pay attention to me. But I need to make sure the distraction isn't too intense or long enough to be detected.
Another idea is to wait until the chicken is in a hurry. If I wait until it's leaving the coop or going to a specific location, I might have a better chance to grab the egg without it noticing. But I need to be cautious not to be seen myself.
I should also think about the physical barriers. If I can get to the egg without the chicken seeing me, that might work. Maybe I can use a tool to gently take the egg from the nesting box without disturbing the rest of the chicken.
Wait, but how do I know if the chicken has an egg in the first place? If I can't see the egg, how do I know it's there? Maybe I need to check the nesting box, but if I do, and the chicken sees me, it might chase me away, making the situation more difficult.
Hmm, this complicates things. If I need to check the egg without disturbing the chicken, perhaps I can do it quickly and quietly. But how do I do that without being detected?
I should also consider the chicken's behavior around eggs. Some chickens are more protective than others. Maybe there's a way to exploit that difference. For example, if the chicken is particularly aggressive when defending its eggs, I could take advantage of that.
Another thought: maybe I can use the chicken's own behavior against it. If I can make the chicken work harder to guard the egg, perhaps it will exhaust itself and leave me alone. But I need to be careful not to overdo it.
// ... a few moments later (added by me)
**Analysis of Each Option**
\- **Option 1: Direct Approach**
- **Risk:** High (get caught)
- **Potential Reward:** Possible (take the egg)
- **Steps:**
a. Approach the chicken while it's foraging.
b. Try to take the egg from the nesting box.
c. If caught, escape or handle the egg.
d. Repeat.
- **Potential Downsides:**
- The chicken might get aggressive if it realizes you're trying to take its egg.
- If you're caught, you could get hurt or the chicken could be upset.
- **Conclusion:** This approach is risky and may not be successful.
\- **Option 2: Indirect Approach**
- **Risk:** Medium (chicken might see you and chase)
- **Potential Reward:** Possible (take the egg)
- **Steps:**
a. Create a distraction (e.g., make noise, knock over a bucket).
b. The chicken will be distracted, reducing the likelihood of being seen.
c. Sneak up to the chicken and try to take the egg.
d. If safe, escape without being seen.
- **Potential Downsides:**
- If the distraction is too strong or the chicken reacts poorly, you might get caught.
- If the chicken is distracted, it might not notice you taking the egg.
- **Conclusion:** This approach is more likely to succeed if the distraction is well-executed.
\- **Option 3: Combination Approach**
- **Risk:** Medium (chicken might be distracted)
- **Potential Reward:** Possible (take the egg)
- **Steps:**
a. Create a distraction (e.g., make noise, knock over a bucket).
b. Lead the chicken away from the nesting area.
c. If the chicken follows, it will be less likely to notice you taking the egg.
d. If safe, retrieve the egg and return with it.
e. Repeat if needed.
- **Potential Downsides:**
- The chicken might not follow you, or you might get lost.
- You need to coordinate the distraction and the retrieval.
- **Conclusion:** This approach requires careful planning and execution to be successful.
After considering all options, the best approach seems to be the combination approach, as it reduces the risk of direct confrontation and allows for a more controlled retrieval of the egg. However, it requires careful planning and coordination to execute successfully.
**Final Decision**
The best way to steal an egg from my chicken is to use the combination approach, where I create a distraction to lead the chicken away, then retrieve the egg safely. This minimizes the risk of direct confrontation and ensures that I can get the egg without getting caught. I should proceed with caution and careful planning to execute this strategy effectively.
<|end_of_thought|>
<|begin_of_solution|>
To steal an egg from a chicken, consider the following approach:
1. **Create a Distraction**: Use noise-making devices or a bucket to create a distraction, drawing the chicken's attention away from the nesting area.
2. **Lead the Chicken Away**: Once the distraction is created, lead the chicken away from the nesting area, minimizing the likelihood of direct interaction.
3. **Retrieval**: If the chicken follows, carefully retrieve the egg. If it does not follow, continue the distraction elsewhere.
4. **Return with the Egg**: Return to the original location with the egg, ensuring its safety and security.
This approach reduces the risk of direct confrontation and ensures a controlled retrieval of the egg. The combination of a distraction and a well-coordinated retrieval increases the chances of success.
\boxed{\text{Create a distraction, lead the chicken away, and retrieve the egg safely.}}<|eot_id|>
```
### A Few Observations
1. **Llama-3.2-1B-Instruct proved to be a strong base model for reasoning tasks.** Even with absurd prompts like *"How to steal an egg from a chicken?"*, the model generated coherent step-by-step reasoning and logical final answers.
2. **⚠️ Important:** The reasoning model sometimes runs excessively long or even enters an infinite loop, particularly when exploring alternative solutions. This can likely be mitigated by incorporating prompts that balance short and long reasoning paths. Refining the system prompt (the role instructions) through prompt engineering may also help.
3. **Model safety:** The model occasionally refuses to answer certain questions. My guess is that these refusals come from safeguards Meta built into the Llama-3.2-1B-Instruct base model, for example against topics like theft.
4. **Training process:** I did not complete all five planned epochs. Instead, I halted training between the fourth and fifth epochs because the evaluation loss had plateaued. Interestingly, the checkpoint with the lowest evaluation loss (checkpoint 900) showed a higher tendency to enter infinite loops, so I kept the final checkpoint, which stops more reliably.
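Besides capping `max_new_tokens`, one way to catch the infinite loops described above is to watch the generated text for a tail that repeats verbatim. The helper below is a hypothetical sketch, not something shipped with this model: it only checks whether the last characters consist of the same chunk repeated several times. To act on it during generation, you could wrap it in a custom `transformers.StoppingCriteria` that decodes the most recent tokens; that wiring is an assumption and has not been tested with this model.

```python
def has_repeating_tail(text: str, min_chunk: int = 50, repeats: int = 3, window: int = 4000) -> bool:
    """Return True if `text` ends with some chunk repeated `repeats` times back to back.

    Only the last `window` characters are inspected, which keeps each call
    cheap enough to run periodically while streaming.
    """
    tail = text[-window:]
    n = len(tail)
    # Try every candidate chunk length that could fit `repeats` times in the window.
    for size in range(min_chunk, n // repeats + 1):
        chunk = tail[n - size:]
        if tail.endswith(chunk * repeats):
            return True
    return False
```

The `min_chunk` threshold avoids flagging short, legitimately repeated phrases; tune it and `repeats` to trade missed loops against false stops.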
### Checkpoints
My checkpoints are available on Hugging Face:
https://huggingface.co/keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill-checkpoints/tree/main
Feel free to use them for continued training, or load any checkpoint for an in-depth study of how the model learns to reason.
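The Hugging Face `Trainer` saves intermediate checkpoints in folders named `checkpoint-<step>`. Assuming the checkpoint repo keeps those folders at its root (an assumption; check the repo tree first), this hypothetical helper picks them out of a repo file listing and sorts them by training step:

```python
import re
from typing import Iterable, List

# Trainer convention: top-level folders named "checkpoint-<global_step>".
CKPT_RE = re.compile(r"^checkpoint-(\d+)/")


def list_checkpoints(repo_files: Iterable[str]) -> List[str]:
    """Return checkpoint folder names found in a repo file listing, sorted by step."""
    steps = set()
    for path in repo_files:
        match = CKPT_RE.match(path)
        if match:
            steps.add(int(match.group(1)))
    # Sort numerically so that checkpoint-1000 comes after checkpoint-900.
    return [f"checkpoint-{step}" for step in sorted(steps)]
```

You could feed it the result of `huggingface_hub.list_repo_files("keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill-checkpoints")` and then load a given step with `LlamaForCausalLM.from_pretrained(repo_id, subfolder="checkpoint-900")`, provided that folder contains full model weights.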
## Training Details
To reproduce the results, clone Hugging Face's [OpenR1](https://github.com/huggingface/open-r1) repository and install the package.
Then run the following command:
```
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py --config recipes/config_llama3_instrcut_1b.yaml
```
You can create your own `recipes/config_llama3_instrcut_1b.yaml` by copying [config_full.yaml](https://github.com/huggingface/open-r1/blob/main/recipes/qwen/Qwen2.5-1.5B-Instruct/sft/config_full.yaml)
to the desired folder and changing the model path to `model_name_or_path: meta-llama/Llama-3.2-1B-Instruct` (or any other Hugging Face model repo ID you are interested in).
You may also train for more than one epoch by raising `num_train_epochs` (I configured 5 epochs).
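If you prefer to script the copy-and-edit step, here is a minimal standard-library sketch that overrides top-level `key: value` lines in the copied recipe and appends any keys that are missing. It assumes the settings you change are flat, unindented YAML keys; `override_top_level_keys` is a hypothetical helper, not part of OpenR1.

```python
import re
from typing import Dict

# Matches an unindented "key:" at the start of a line (a top-level YAML key).
KEY_RE = re.compile(r"^([A-Za-z_][\w.-]*):")


def override_top_level_keys(yaml_text: str, overrides: Dict[str, str]) -> str:
    """Replace top-level `key: value` lines, appending keys that are missing.

    Values are inserted verbatim, so pass them already formatted as YAML.
    Nested structures would need a real YAML parser instead.
    """
    remaining = dict(overrides)
    lines = yaml_text.splitlines()
    for i, line in enumerate(lines):
        match = KEY_RE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            lines[i] = f"{key}: {remaining.pop(key)}"
    # Anything not already present in the file is added at the end.
    lines.extend(f"{key}: {value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"
```

For example, with `path = Path("recipes/config_llama3_instrcut_1b.yaml")`, calling `path.write_text(override_top_level_keys(path.read_text(), {"model_name_or_path": "meta-llama/Llama-3.2-1B-Instruct", "num_train_epochs": "5"}))` rewrites the copied recipe in place. Comments and unrelated keys are preserved.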
Also, if you want to get intermediate checkpoints, set the save parameters accordingly:
```
save_strategy: "steps"
save_steps: 100
```
I tried a batch size of 1 for both training and evaluation on a single NVIDIA RTX 4090 but still ran out of memory (OOM), so I rented 4× NVIDIA L40S GPUs from [vast.ai](https://vast.ai) and used the batch sizes below. Training for 5 epochs took under 4 hours.
```
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
```
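The batch sizes above are per GPU. Under data parallelism (which DeepSpeed ZeRO-3 still uses), each optimizer step consumes `num_gpus × per_device_train_batch_size × gradient_accumulation_steps` sequences. The sketch below does that arithmetic; `gradient_accumulation_steps` is whatever your copied recipe sets, and the step estimate ignores sequence packing and per-device dataloader rounding, so treat it as approximate.

```python
import math


def global_batch_size(num_gpus: int, per_device_batch: int, grad_accum_steps: int) -> int:
    """Sequences consumed per optimizer step with data parallelism (ZeRO-3 included)."""
    return num_gpus * per_device_batch * grad_accum_steps


def approx_steps_per_epoch(num_examples: int, global_batch: int) -> int:
    """Rough optimizer steps per epoch, ignoring packing and dataloader rounding."""
    return math.ceil(num_examples / global_batch)
```

For example, with this card's 4 GPUs, a per-device batch of 4, and (hypothetically) no gradient accumulation, the global batch is 16, and a dataset of roughly 17,000 examples needs about 1,063 optimizer steps per epoch.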
### Weights & Biases (W&B) Figures


## Evaluation
The model was evaluated following the instructions in Hugging Face's [OpenR1](https://github.com/huggingface/open-r1) repository:
```
MODEL=keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilisation=0.8"
TASK=math_500
OUTPUT_DIR=data/evals/$MODEL
lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
--custom-tasks src/open_r1/evaluate.py \
--use-chat-template \
--system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
--output-dir $OUTPUT_DIR
```
| Task | Version | Metric | Value | Stderr |
|-------------------|--------:|------------------|------:|---------:|
| all | | extractive_match | 0.216 | ± 0.0184 |
| custom:math_500:0 | 1 | extractive_match | 0.216 | ± 0.0184 |
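As a quick consistency check, assume each of MATH-500's 500 problems is scored 0 or 1 and the stderr column is the standard error of the mean. The table then implies 108 correct answers and a standard error that matches the reported 0.0184:

$$
\begin{aligned}
\text{correct} &= 0.216 \times 500 = 108 \\
\mathrm{SE} &= \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.216 \times 0.784}{500}} = \sqrt{3.387 \times 10^{-4}} \approx 0.0184
\end{aligned}
$$

A rough 95% interval is therefore $0.216 \pm 1.96 \times 0.0184 \approx [0.180,\ 0.252]$.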
For comparison, Hugging Face reports that **DeepSeek-R1-Distill-Qwen-1.5B** scores 81.6 (that is, 0.816 extractive match) with the same evaluation script, close to the official 83.9 reported by **DeepSeek**. On that scale, this model scores 21.6.
There is still a long way to go to improve the score:
1. Distill with more math-focused data instead of **HuggingFaceH4/Bespoke-Stratos-17k**. Data is likely the main bottleneck here, and we could collect and filter more data ourselves.
2. Evaluate a few other checkpoints to see whether this one actually achieves the best results.
3. Make the model more concise. It tends to be wordy, and the evaluation caps the model length at 32,768 tokens (`max_model_length=32768`), so long reasoning traces risk being cut off before the final answer.
4. Try GRPO and/or a combination of GRPO and SFT.