NeMo
English
nvidia
steerlm
llama3
reward model
zhilinw committed (verified)
Commit e686a25 · 1 Parent(s): f736c15

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -42,7 +42,7 @@ SteerLM Paper: [SteerLM: Attribute Conditioned SFT as an (User-Steerable) Altern
 
  Llama3-70B-SteerLM-RM is trained with NVIDIA [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner), a scalable toolkit for performant and efficient model alignment. NeMo-Aligner is built using the [NeMo Framework](https://github.com/NVIDIA/NeMo) which allows for scaling training up to 1000s of GPUs using tensor, data and pipeline parallelism for all components of alignment. All of our checkpoints are cross compatible with the NeMo ecosystem, allowing for inference deployment and further customization.
 
- ## RewardBench LeaderBoard
+ ## RewardBench Primary Dataset LeaderBoard
 
 
  | Model | Type of Model| Overall | Chat | Chat-Hard | Safety | Reasoning |