keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill

Safetensors

llama


keeeeenw committed on Feb 1

Commit

0b88607

verified ·

1 Parent(s): d3f0015

Update README.md


Files changed (1)

README.md +159 -3

README.md CHANGED

@@ -35,7 +35,7 @@ Thanks to **Hugging Face**, we now have a streamlined framework to make this pro
 - **Developed by:** keeeeenw
-- **Funded by [optional]:** myself for < $500
 - **Model type:** Llama-3.2-1B-Instruct with reasoning capability
 - **License:** Apache License 2.0
 - **Finetuned from model [optional]:** Llama-3.2-1B-Instruct
@@ -50,6 +50,13 @@ Thanks to **Hugging Face**, we now have a streamlined framework to make this pro
 ### How to run the code?
 ```{python}
 model = LlamaForCausalLM.from_pretrained("keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill")
 # Prompt supported by HuggingFaceH4/Bespoke-Stratos-17k
@@ -72,10 +79,152 @@ outputs = model.generate(inputs['input_ids'],
                          streamer=streamer,
                          attention_mask=attention_mask,
                          pad_token_id=tokenizer.eos_token_id,
-                         max_new_tokens=2048)
 print(tokenizer.decode(outputs[0]))
 ```
 ## 🏋️‍♂️ Training Details
 To reproduce the results, simply go to HuggingFace's [OpenR1](https://github.com/huggingface/open-r1) and install the package.
@@ -101,6 +250,13 @@ per_device_eval_batch_size: 4
 per_device_train_batch_size: 4
 ```
 ## 📊 Evaluation
 The evaluation of this model is based on HuggingFace's instructions [OpenR1](https://github.com/huggingface/open-r1)
@@ -119,5 +275,5 @@ lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
     --output-dir $OUTPUT_DIR
 ```
-Results: To be added

 - **Developed by:** keeeeenw
+- **Funded by [optional]:** myself for < $50 (renting compute for a few hours)
 - **Model type:** Llama-3.2-1B-Instruct with reasoning capability
 - **License:** Apache License 2.0
 - **Finetuned from model [optional]:** Llama-3.2-1B-Instruct
 ### How to run the code?
 ```{python}
+import transformers
+from transformers import TextStreamer
+from transformers import AutoTokenizer, AutoModel, LlamaForCausalLM
+# load the tokenizer published with this model
+tokenizer = AutoTokenizer.from_pretrained("keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill")
 model = LlamaForCausalLM.from_pretrained("keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill")
 # Prompt supported by HuggingFaceH4/Bespoke-Stratos-17k
                          streamer=streamer,
                          attention_mask=attention_mask,
                          pad_token_id=tokenizer.eos_token_id,
+                         top_k=5,
+                         top_p=0.9,
+                         max_new_tokens=131072)
 print(tokenizer.decode(outputs[0]))
 ```
+### Sample Output
+Please see the full text: https://huggingface.co/keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill/blob/main/sample_output_2.txt
+```
+Okay, so I need to figure out how to steal an egg from my chicken. Let's start by understanding the situation. I have a chicken, and I want to take an egg from it without the chicken noticing. Since chickens can be protective of their eggs, I need to be careful not to get caught.
+First, I should consider the chicken's behavior. Chickens are naturally protective of their nests and eggs. If the chicken is aware of my presence near the coop, it might sound a warning call to alert others, which could mean I'm caught. So, maybe I need a strategy that doesn't involve direct interaction with the chicken.
+One approach could be to use distraction. If I create a distraction elsewhere, like making noise or knocking things around, maybe the chicken is distracted and doesn't pay attention to me. But I need to make sure the distraction isn't too intense or long enough to be detected.
+Another idea is to wait until the chicken is in a hurry. If I wait until it's leaving the coop or going to a specific location, I might have a better chance to grab the egg without it noticing. But I need to be cautious not to be seen myself.
+I should also think about the physical barriers. If I can get to the egg without the chicken seeing me, that might work. Maybe I can use a tool to gently take the egg from the nesting box without disturbing the rest of the chicken.
+Wait, but how do I know if the chicken has an egg in the first place? If I can't see the egg, how do I know it's there? Maybe I need to check the nesting box, but if I do, and the chicken sees me, it might chase me away, making the situation more difficult.
+Hmm, this complicates things. If I need to check the egg without disturbing the chicken, perhaps I can do it quickly and quietly. But how do I do that without being detected?
+I should also consider the chicken's behavior around eggs. Some chickens are more protective than others. Maybe there's a way to exploit that difference. For example, if the chicken is particularly aggressive when defending its eggs, I could take advantage of that.
+Another thought: maybe I can use the chicken's own behavior against it. If I can make the chicken work harder to guard the egg, perhaps it will exhaust itself and leave me alone. But I need to be careful not to overdo it.
+// ... a few moments later (added by me)
+**Analysis of Each Option**
+\- **Option 1: Direct Approach**
+  - **Risk:** High (get caught)
+  - **Potential Reward:** Possible (take the egg)
+  - **Steps:**
+    a. Approach the chicken while it's foraging.
+    b. Try to take the egg from the nesting box.
+    c. If caught, escape or handle the egg.
+    d. Repeat.
+  - **Potential Downsides:**
+    - The chicken might get aggressive if it realizes you're trying to take its egg.
+    - If you're caught, you could get hurt or the chicken could be upset.
+  - **Conclusion:** This approach is risky and may not be successful.
+\- **Option 2: Indirect Approach**
+  - **Risk:** Medium (chicken might see you and chase)
+  - **Potential Reward:** Possible (take the egg)
+  - **Steps:**
+    a. Create a distraction (e.g., make noise, knock over a bucket).
+    b. The chicken will be distracted, reducing the likelihood of being seen.
+    c. Sneak up to the chicken and try to take the egg.
+    d. If safe, escape without being seen.
+  - **Potential Downsides:**
+    - If the distraction is too strong or the chicken reacts poorly, you might get caught.
+    - If the chicken is distracted, it might not notice you taking the egg.
+  - **Conclusion:** This approach is more likely to succeed if the distraction is well-executed.
+\- **Option 3: Combination Approach**
+  - **Risk:** Medium (chicken might be distracted)
+  - **Potential Reward:** Possible (take the egg)
+  - **Steps:**
+    a. Create a distraction (e.g., make noise, knock over a bucket).
+    b. Lead the chicken away from the nesting area.
+    c. If the chicken follows, it will be less likely to notice you taking the egg.
+    d. If safe, retrieve the egg and return with it.
+    e. Repeat if needed.
+  - **Potential Downsides:**
+    - The chicken might not follow you, or you might get lost.
+    - You need to coordinate the distraction and the retrieval.
+  - **Conclusion:** This approach requires careful planning and execution to be successful.
+After considering all options, the best approach seems to be the combination approach, as it reduces the risk of direct confrontation and allows for a more controlled retrieval of the egg. However, it requires careful planning and coordination to execute successfully.
+**Final Decision**
+The best way to steal an egg from my chicken is to use the combination approach, where I create a distraction to lead the chicken away, then retrieve the egg safely. This minimizes the risk of direct confrontation and ensures that I can get the egg without getting caught. I should proceed with caution and careful planning to execute this strategy effectively.
+<|end_of_thought|>
+<|begin_of_solution|>
+To steal an egg from a chicken, consider the following approach:
+1. **Create a Distraction**: Use noise-making devices or a bucket to create a distraction, drawing the chicken's attention away from the nesting area.
+2. **Lead the Chicken Away**: Once the distraction is created, lead the chicken away from the nesting area, minimizing the likelihood of direct interaction.
+3. **Retrieval**: If the chicken follows, carefully retrieve the egg. If it does not follow, continue the distraction elsewhere.
+4. **Return with the Egg**: Return to the original location with the egg, ensuring its safety and security.
+This approach reduces the risk of direct confrontation and ensures a controlled retrieval of the egg. The combination of a distraction and a well-coordinated retrieval increases the chances of success.
+\boxed{\text{Create a distraction, lead the chicken away, and retrieve the egg safely.}}<|eot_id|>
+```
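The sample output above separates the reasoning from the final answer with the `<|end_of_thought|>` and `<|begin_of_solution|>` markers and ends with `<|eot_id|>`. As a minimal sketch, assuming decoded generations follow that same layout, the two parts could be split as shown below. The helper names `split_reasoning` and `boxed_answer` are hypothetical and not part of the model card or of any library.

```python
import re
from typing import Optional, Tuple

# Marker strings copied from the sample output; other prompts may differ.
END_OF_THOUGHT = "<|end_of_thought|>"
BEGIN_OF_SOLUTION = "<|begin_of_solution|>"
EOT = "<|eot_id|>"


def split_reasoning(decoded: str) -> Tuple[str, str]:
    """Return (reasoning, solution) from one decoded generation.

    If the solution marker never appears (for example, the model kept
    reasoning until the token limit), the solution is an empty string.
    """
    head, sep, tail = decoded.partition(BEGIN_OF_SOLUTION)
    reasoning = head.replace(END_OF_THOUGHT, "").strip()
    solution = tail.replace(EOT, "").strip() if sep else ""
    return reasoning, solution


def boxed_answer(solution: str) -> Optional[str]:
    r"""Extract the text of a simple \boxed{\text{...}} answer, if present."""
    match = re.search(r"\\boxed\{\\text\{(.*?)\}\}", solution, flags=re.DOTALL)
    return match.group(1) if match else None
```

An empty solution string is a cheap signal that a generation never reached its answer, which is useful when screening outputs for the runaway reasoning noted in the observations.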
+### A Few Observations
+1. **Llama-3.2-1B-Instruct proved to be a strong base model for reasoning tasks.** Even with absurd prompts like *"How to steal an egg from a chicken?"*, the model generated coherent step-by-step reasoning and logical final answers.
+2. **⚠️ Important:** The reasoning model sometimes runs excessively long or even enters an infinite loop, particularly when exploring alternative solutions. This issue can likely be mitigated by incorporating prompts that balance both short and long reasoning paths. Additionally, refining the role instructions through prompt engineering may help.
+3. **Model safety:** Occasionally, the model refuses to answer certain questions. My intuition is that Meta has implemented safeguards against topics like theft.
+4. **Training process:** I did not complete all five epochs of training. Instead, I halted training between the fourth and fifth epochs since evaluation loss had plateaued. Interestingly, when testing the best checkpoint (900) based on evaluation loss, the model showed a higher tendency to enter infinite loops. As a result, I retained the final checkpoint, which demonstrated better control over stopping conditions.
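Observation 2 mentions generations that run excessively long or loop. One possible guard, separate from the prompt-level mitigations suggested there, is to watch the generated token IDs for a span that repeats back to back. The sketch below is pure Python; the function name `has_repeating_tail` and the default thresholds are assumptions chosen for illustration, not values tuned for this model. It only catches exact repeats of a fixed-length span, so paraphrased loops would pass undetected.

```python
from typing import Sequence


def has_repeating_tail(token_ids: Sequence[int], span: int = 32, repeats: int = 3) -> bool:
    """Return True if the last `span` tokens occur `repeats` times in a row.

    token_ids: token IDs generated so far.
    span:      length of the candidate repeating unit (hypothetical default).
    repeats:   consecutive copies required before flagging a loop.
    """
    needed = span * repeats
    if span <= 0 or repeats < 2 or len(token_ids) < needed:
        return False
    tail = list(token_ids[-needed:])
    pattern = tail[-span:]
    # Compare every span-sized chunk of the tail with the final chunk.
    return all(tail[i:i + span] == pattern for i in range(0, needed, span))
```

A check like this could be called from a custom `transformers.StoppingCriteria` subclass passed to `generate`, or applied to outputs after the fact to count how often a checkpoint loops.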
+### Checkpoints
+My checkpoints are available on Hugging Face:
+https://huggingface.co/keeeeenw/Llama-3.2-1B-Instruct-Open-R1-Distill-checkpoints/tree/main
+Please feel free to use them for continued training, or load any checkpoint for an in-depth study of how the model learns to reason.
 ## 🏋️‍♂️ Training Details
 To reproduce the results, simply go to HuggingFace's [OpenR1](https://github.com/huggingface/open-r1) and install the package.
 per_device_train_batch_size: 4
 ```
+### WandB Figures
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65fd0779788669738918c128/uA_vsImDJPVMmjHNg0DoX.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65fd0779788669738918c128/j5wUK6APRW76J48wjtXTR.png)
 ## 📊 Evaluation
 The evaluation of this model is based on HuggingFace's instructions [OpenR1](https://github.com/huggingface/open-r1)
     --output-dir $OUTPUT_DIR
 ```
+Results: To be added. The rental GPU server does not have CUDA 12.1, so I will run the evaluation later.