chenwuml committed on
Commit 37b77ff · 1 Parent(s): 7846dca

initial commit

Files changed (1)
  1. README.md +0 -11
README.md CHANGED
@@ -253,14 +253,3 @@ CodeFu is developed by the **AWS WWSO Prototyping** Team. If you find CodeFu hel
  }
  ```
 
- ## References
- [1] - Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
-
- [2] - Yu, Q., Zhang, Z., Zhu, R., Yuan, Y., Zuo, X., Yue, Y., ... & Wang, M. (2025). DAPO: An open-source llm reinforcement learning system at scale. arXiv preprint arXiv:2503.14476.
-
- [3] - Hao, Y., Dong, L., Wu, X., Huang, S., Chi, Z., & Wei, F. (2025). On-Policy RL with Optimal Reward Baseline. arXiv preprint arXiv:2505.23585.
-
- [4] - Liu, Z., Chen, C., Li, W., Qi, P., Pang, T., Du, C., ... & Lin, M. Understanding r1-zero-like training: A critical perspective. arXiv preprint arXiv:2503.20783.
-
- [5] - Zheng, C., Liu, S., Li, M., Chen, X. H., Yu, B., Gao, C., ... & Lin, J. (2025). Group Sequence Policy Optimization. arXiv preprint arXiv:2507.18071.
-