zhs12 committed · Commit 314ad43 (verified) · 1 parent: d6ee6e3

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -16,7 +16,7 @@ base_model:
 <img width="80%" src="14b-rl.png">
 </p>
 
-[technical report](https://github.com/Qihoo360/Light-R1/blob/main/Light-R1.pdf)
+[technical report](https://arxiv.org/abs/2503.10460)
 
 [GitHub page](https://github.com/Qihoo360/Light-R1)
 
@@ -32,7 +32,7 @@ We have finally seen expected behavior during RL training: simultaneous increase
 
 Originated from DeepSeek-R1-Distill-Qwen-14B, Light-R1-14B-DS underwent our long-COT RL Post-Training and achieved a new State-Of-The-Art across 14B-Math models: 74.0 & 60.2 on AIME 24 & 25 respectively.
 Light-R1-14B-DS also performed well on GPQA *without* any specific training.
-We are excited to release this model along with the [technical report](https://github.com/Qihoo360/Light-R1/blob/main/Light-R1.pdf), and will continue to perfect our long-COT RL Post-Training.
+We are excited to release this model along with the [technical report](https://arxiv.org/abs/2503.10460), and will continue to perfect our long-COT RL Post-Training.
 
 ## Usage
 Same as DeepSeek-R1-Distill-Qwen-14B.
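
The README's Usage section says only "Same as DeepSeek-R1-Distill-Qwen-14B". As a minimal sketch of what that looks like with the Transformers library; the Hub repo ID `qihoo360/Light-R1-14B-DS`, the prompt, and the sampling settings are assumptions, not taken from this commit:

```python
# Minimal usage sketch: load and prompt the model the same way as
# DeepSeek-R1-Distill-Qwen-14B. The repo ID below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/Light-R1-14B-DS"  # assumed Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long-COT models need a generous token budget for the reasoning trace.
output_ids = model.generate(input_ids, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```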