Commit 9935fe0 by zhilinw
Parent(s): 7c50c23

Update README.md
Files changed (1): README.md (+10 −21)
README.md CHANGED
@@ -36,9 +36,7 @@ Nonetheless, if you are only interested in using it as a conventional reward mod
 Llama3-70B-SteerLM-RM is trained from [Llama 3 70B Base](https://huggingface.co/meta-llama/Meta-Llama-3-70B) with the [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2) dataset
 
-HelpSteer Paper: [HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM](http://arxiv.org/abs/2311.09528)
-
-SteerLM Paper: [SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF](https://arxiv.org/abs/2310.05344)
+HelpSteer2 Paper: [HelpSteer2: Open-source dataset for training top-performing reward models](http://arxiv.org/abs/2406.08673)
 
 Llama3-70B-SteerLM-RM is trained with NVIDIA [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner), a scalable toolkit for performant and efficient model alignment. NeMo-Aligner is built using the [NeMo Framework](https://github.com/NVIDIA/NeMo) which allows for scaling training with data and model parallelism for all components of alignment. All of our checkpoints are compatible with the NeMo ecosystem, allowing for inference deployment and further customization.
 
@@ -47,7 +45,7 @@ Llama3-70B-SteerLM-RM is trained with NVIDIA [NeMo-Aligner](https://github.com/N
 
 | Model | Type of Model | Overall | Chat | Chat-Hard | Safety | Reasoning |
 |:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
-| _**Nemotron-4-340B-RM**_ | Trained with Permissive Licensed Data | **92.0** | 95.8 | **87.1** | 91.5 | 93.7 |
+| _**Nemotron-4-340B-Reward**_ | Trained with Permissive Licensed Data | **92.0** | 95.8 | **87.1** | 91.5 | 93.7 |
 | ArmoRM-Llama3-8B-v0.1 | Trained with GPT4 Generated Data | 90.8 | 96.9 | 76.8 | 92.2 | 97.3 |
 | Cohere May 2024 | Proprietary LLM | 89.5 | 96.4 | 71.3 | 92.7 | 97.7 |
 | _**Llama3-70B-SteerLM-RM**_ | Trained with Permissive Licensed Data | 88.8 | 91.3 | 80.3 | **92.8** | 90.7 |
@@ -75,7 +73,9 @@ You can use the model with [NeMo Aligner](https://github.com/NVIDIA/NeMo-Aligner
 
 1. Spin up an inference server within the [NeMo Aligner container](https://github.com/NVIDIA/NeMo-Aligner/blob/main/Dockerfile)
 
+
 ```python
+HF_HOME=<YOUR_HF_HOME_CONTAINING_TOKEN_WITH_LLAMA3_70B_ACCESS>
 python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
 rm_model_file=Llama3-70B-SteerLM-RM \
 trainer.num_nodes=1 \
@@ -125,23 +125,12 @@ E-Mail: [Zhilin Wang](mailto:[email protected])
 If you find this dataset useful, please cite the following works
 
 ```bibtex
-@misc{wang2023helpsteer,
-  title={HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM},
-  author={Zhilin Wang and Yi Dong and Jiaqi Zeng and Virginia Adams and Makesh Narsimhan Sreedhar and Daniel Egert and Olivier Delalleau and Jane Polak Scowcroft and Neel Kant and Aidan Swope and Oleksii Kuchaiev},
-  year={2023},
-  eprint={2311.09528},
-  archivePrefix={arXiv},
-  primaryClass={cs.CL}
-}
-```
-
-```bibtex
-@misc{dong2023steerlm,
-  title={SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF},
-  author={Yi Dong and Zhilin Wang and Makesh Narsimhan Sreedhar and Xianchao Wu and Oleksii Kuchaiev},
-  year={2023},
-  eprint={2310.05344},
+@misc{wang2024helpsteer2,
+  title={HelpSteer2: Open-source dataset for training top-performing reward models},
+  author={Zhilin Wang and Yi Dong and Olivier Delalleau and Jiaqi Zeng and Gerald Shen and Daniel Egert and Jimmy J. Zhang and Makesh Narsimhan Sreedhar and Oleksii Kuchaiev},
+  year={2024},
+  eprint={2406.08673},
   archivePrefix={arXiv},
   primaryClass={cs.CL}
 }
 ```
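
For context on the model this commit documents: a SteerLM-style reward model emits one score per attribute rather than a single scalar, so benchmarks like RewardBench require collapsing those scores into one number per response. The sketch below illustrates that weighted-sum step. The attribute names follow the HelpSteer scheme, and the weights and scores are illustrative placeholders, not the values shipped with this model.

```python
# Hypothetical sketch: collapse multi-attribute reward-model outputs into one
# scalar, e.g. for chosen-vs-rejected comparisons on RewardBench.
# Weights below are placeholders for illustration only.
ATTRIBUTE_WEIGHTS = {
    "helpfulness": 1.0,
    "correctness": 1.0,
    "coherence": 0.5,
    "complexity": 0.25,
    "verbosity": -0.25,  # penalize overly long answers
}

def scalar_reward(attribute_scores: dict) -> float:
    """Weighted sum of per-attribute scores (each typically on a 0-4 scale)."""
    return sum(ATTRIBUTE_WEIGHTS[name] * attribute_scores.get(name, 0.0)
               for name in ATTRIBUTE_WEIGHTS)

# Example per-attribute scores for a preferred and a dispreferred response.
chosen = {"helpfulness": 3.8, "correctness": 3.9, "coherence": 3.5,
          "complexity": 2.0, "verbosity": 1.5}
rejected = {"helpfulness": 2.1, "correctness": 1.8, "coherence": 3.0,
            "complexity": 2.2, "verbosity": 3.0}

print(scalar_reward(chosen) > scalar_reward(rejected))  # True for these scores
```

The weighting is where steerability comes in: changing the weights (e.g. zeroing out `verbosity`) changes which response the collapsed scalar prefers, without retraining the model.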