Update README.md
README.md CHANGED
@@ -17,8 +17,8 @@ datasets:
 # Storm-7B
 - **Developed by**: [Jie Liu](https://jieliu.site/) \\(^{*1,2}\\), [Zhanhui Zhou](https://scholar.google.com/citations?user=SbACfYQAAAAJ&hl=zh-CN) \\(^{*2}\\), [Jiaheng Liu](https://liujiaheng.github.io/) \\(^{2}\\), [Xingyuan Bu](https://scholar.google.com.hk/citations?user=cqYaRhUAAAAJ&hl=zh-CN) \\(^{2}\\), [Chao Yang](https://scholar.google.com/citations?user=5KRbHPMAAAAJ&hl=zh-CN) \\(^{2}\\), [Han-Sen Zhong](https://scholar.google.com.hk/citations?user=X_ZfX8sAAAAJ&hl=zh-CN) \\(^{\dag 2}\\), [Wanli Ouyang](https://wlouyang.github.io/) \\(^{1,2}\\).
 - \\(^{1}\\)MMLab, The Chinese University of Hong Kong   \\(^{2}\\)Shanghai AI Laboratory
-- Paper: [Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level]()
-- Finetuned from model: [openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106)
+- Paper: [Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level](https://arxiv.org/pdf/2406.11817)
+- Finetuned from the model: [openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106)
 - Dataset: [berkeley-nest/Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar)
 - Reward Model: [Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B)
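For quick reference alongside the card metadata above, here is a minimal sketch of loading and prompting the released checkpoint with Hugging Face transformers. The repo id `jieliu/Storm-7B`, the bfloat16 dtype, and the prompt are assumptions rather than details taken from the diff; since the model is finetuned from openchat-3.5-0106, the tokenizer's bundled chat template is assumed to reproduce the OpenChat prompt format.

```python
# Minimal usage sketch (assumptions: repo id "jieliu/Storm-7B", bfloat16 weights,
# and that the checkpoint ships an OpenChat-style chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jieliu/Storm-7B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build the prompt via the tokenizer's chat template (inherited from openchat-3.5-0106).
messages = [{"role": "user", "content": "Summarize the idea of length-regularized DPO."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```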