Update README.md
README.md
CHANGED
@@ -22,7 +22,7 @@ GRPO is applied after a distilled R1 model is created to further refine its reas
 *Special thanks to Dongwei for fine-tuning this version of DeepSeek-R1-Distill-Qwen-7B. More information about it can be found here:*
 [https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math](https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math)
 
-- Converted to MLX format with a quantization of 4-bit for better performance on Apple Silicon Macs
+- Converted to MLX format with a quantization of 4-bit for better performance on Apple Silicon Macs.
 
 # Notes:
 - Seems to brush over the "thinking" process and immediately start answering, leading to extremely quick but correct answers.