Commit 86323c8 by Haoxiang-Wang
Parent: f6bdb40

Update README.md

README.md (changed)
@@ -11,7 +11,7 @@ license: llama3
 [Haoxiang Wang*](https://haoxiang-wang.github.io/), [Wei Xiong*](https://weixiongust.github.io/WeiXiongUST/index.html), [Tengyang Xie](https://tengyangxie.github.io/), [Han Zhao](https://hanzhaoml.github.io/), [Tong Zhang](https://tongzhang-ml.org/)
 
 + **Blog**: https://rlhflow.github.io/posts/2024-05-29-multi-objective-reward-modeling/
-+ **Tech Report**:
++ **Tech Report**: https://arxiv.org/abs/2406.12845
 + **Model**: [ArmoRM-Llama3-8B-v0.1](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1)
 + Finetuned from model: [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)
 - **Code Repository:** https://github.com/RLHFlow/RLHF-Reward-Modeling/
@@ -101,10 +101,10 @@ print(helpsteer_rewards_pred)
 
 If you find this work useful for your research, please consider citing:
 ```
-@
-
-
-
+@article{ArmoRM,
+title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
+author={Haoxiang Wang and Wei Xiong and Tengyang Xie and Han Zhao and Tong Zhang},
+journal={arXiv preprint arXiv:2406.12845},
 }
 
 @inproceedings{wang2024arithmetic,