SteerLM Paper: [SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF](https://arxiv.org/abs/2310.05344)

Llama3-70B-SteerLM-RM is trained with NVIDIA [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner), a scalable toolkit for performant and efficient model alignment. NeMo-Aligner is built on the [NeMo Framework](https://github.com/NVIDIA/NeMo), which scales training with data and model parallelism across all components of alignment. All of our checkpoints are compatible with the NeMo ecosystem, allowing for inference deployment and further customization.
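Since the checkpoint is NeMo-compatible, one common deployment pattern is to serve the reward model behind an HTTP endpoint and score prompt/response pairs against it. The sketch below only assembles a request payload on the client side; the endpoint URL and every field name are illustrative assumptions, not a documented NeMo-Aligner interface:

```python
import json


def build_reward_request(user_prompt: str, assistant_response: str) -> str:
    """Assemble a JSON payload for a reward-model scoring endpoint.

    NOTE: the payload schema here (a list of role/content turns under a
    "prompts" key) is an assumption for illustration, not the actual
    NeMo-Aligner serving schema.
    """
    payload = {
        "prompts": [
            [
                {"role": "user", "content": user_prompt},
                {"role": "assistant", "content": assistant_response},
            ]
        ]
    }
    return json.dumps(payload)


# The serialized body could then be POSTed to a locally served model, e.g.:
#   requests.post("http://localhost:1424/reward", data=body)  # hypothetical URL
body = build_reward_request("What is 2+2?", "2 + 2 equals 4.")
print(body)
```

The reward returned by such a server would then be a scalar (or, for SteerLM-style models, per-attribute scores) for each conversation in the batch.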

## RewardBench Primary Dataset Leaderboard

| Model | Type of Model | Overall | Chat | Chat-Hard | Safety | Reasoning |
|:----------------------------|:--------------------------------------|:--------|:-----|:----------|:---------|:----------|
| ArmoRM-Llama3-8B-v0.1 | Trained with GPT-4 Generated Data | 90.8 | 96.9 | 76.8 | 92.2 | 97.3 |
| Cohere May 2024 | Proprietary LLM | 89.5 | 96.4 | 71.3 | **92.7** | 97.7 |
| _**Llama3-70B-SteerLM-RM**_ | Trained with Permissive Licensed Data | 88.2 | 91.9 | 79.8 | 92.2 | 89.0 |