zhs12 committed · Commit 314ad43 (verified) · 1 parent: d6ee6e3

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -16,7 +16,7 @@ base_model:
 <img width="80%" src="14b-rl.png">
 </p>
 
-[technical report](https://github.com/Qihoo360/Light-R1/blob/main/Light-R1.pdf)
+[technical report](https://arxiv.org/abs/2503.10460)
 
 [GitHub page](https://github.com/Qihoo360/Light-R1)
 
@@ -32,7 +32,7 @@ We have finally seen expected behavior during RL training: simultaneous increase
 
 Originated from DeepSeek-R1-Distill-Qwen-14B, Light-R1-14B-DS underwent our long-COT RL Post-Training and achieved a new State-Of-The-Art across 14B-Math models: 74.0 & 60.2 on AIME 24 & 25 respectively.
 Light-R1-14B-DS also performed well on GPQA *without* any specific training.
-We are excited to release this model along with the [technical report](https://github.com/Qihoo360/Light-R1/blob/main/Light-R1.pdf), and will continue to perfect our long-COT RL Post-Training.
+We are excited to release this model along with the [technical report](https://arxiv.org/abs/2503.10460), and will continue to perfect our long-COT RL Post-Training.
 
 ## Usage
 Same as DeepSeek-R1-Distill-Qwen-14B.
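
The README's Usage section says only "Same as DeepSeek-R1-Distill-Qwen-14B". As a minimal sketch of what that looks like with the Transformers library; the Hub repo ID `qihoo360/Light-R1-14B-DS`, the prompt, and the sampling settings are assumptions, not taken from this commit:

```python
# Minimal usage sketch: load and prompt the model the same way as
# DeepSeek-R1-Distill-Qwen-14B. The repo ID below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/Light-R1-14B-DS"  # assumed Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long-COT models need a generous token budget for the reasoning trace.
output_ids = model.generate(input_ids, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```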