zhilinw committed on
Commit b86ede9
1 Parent(s): 07fd6af

Update README.md

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -30,7 +30,7 @@ Given a conversation with multiple turns between user and assistant, it rates th
  4. **Complexity**: Intellectual depth required to write response (i.e. whether the response can be written by anyone with basic language competency or requires deep domain expertise).
  5. **Verbosity**: Amount of detail included in the response, relative to what is asked for in the prompt.
 
- Nonetheless, if you are only interested in using it as a conventional reward model that outputs a singular scalar, we recommend using the weights ```[0, 0, 0, 0, 0.65, 0.8, 0.45, 0, 0]``` to do elementwise multiplication with the predicted attributes (which outputs 9 float values in line with [Llama2-13B-SteerLM-RM](https://huggingface.co/nvidia/Llama2-13B-SteerLM-RM) but the first four are not trained or used)
+ Nonetheless, if you are only interested in using it as a conventional reward model that outputs a singular scalar, we recommend using the weights ```[0, 0, 0, 0, 0.65, 0.8, 0.45, 0.55, -0.4]``` to do elementwise multiplication with the predicted attributes (which outputs 9 float values in line with [Llama2-13B-SteerLM-RM](https://huggingface.co/nvidia/Llama2-13B-SteerLM-RM) but the first four are not trained or used)
 
 
  Llama3-70B-SteerLM-RM is trained from [Llama 3 70B Base](https://huggingface.co/meta-llama/Meta-Llama-3-70B) with the [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2) dataset
@@ -49,12 +49,14 @@ Llama3-70B-SteerLM-RM is trained with NVIDIA [NeMo-Aligner](https://github.com/N
  |:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
  | ArmoRM-Llama3-8B-v0.1 | Trained with GPT4 Generated Data| 90.8 | 96.9 | 76.8 | 92.2 | 97.3 |
  | Cohere May 2024 | Proprietary LLM | 89.5 | 96.4 | 71.3 | **92.7** | 97.7 |
- | _**Llama3-70B-SteerLM-RM**_ | Trained with Permissive Licensed Data | 88.2 | 91.9 | 79.8 | 92.2 | 89.0 |
+ | _**Llama3-70B-SteerLM-RM**_ | Trained with Permissive Licensed Data | 88.8 | 91.3 | 80.3 | 92.8 | 90.7 |
  | Google Gemini Pro 1.5 | Proprietary LLM | 88.1 | 92.3 | 80.6 | 87.5 | 92.0 |
  | RLHFlow-Llama3-8B | Trained with GPT4 Generated Data | 87.1 | **98.3** | 65.8 | 89.7 | 94.7 |
  | Cohere March 2024 | Proprietary LLM | 87.1| 94.7 | 65.1 | 90.3 | **98.7** |
- | GPT-4-0125-Preview|Proprietary LLM | 85.9 | 95.3 | 74.3 | 87.2 | 86.9 |
+ | GPT-4-0125-Preview |Proprietary LLM | 85.9 | 95.3 | 74.3 | 87.2 | 86.9 |
  | Claude 3 Opus 0229 | Proprietary LLM | 80.7 | 94.7 | 60.3 | 89.1 | 78.7 |
+ | Llama3 70B Instruct | Trained with Permissive Licensed Data | 76.0 | 97.6 | 58.9 | 69.2 | 78.5 |
+
 
  Last updated: 1 Jun 2024
 
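
The first hunk changes the recommended weights so that the last two attributes now contribute to the scalar reward. A minimal sketch of how the updated line might be applied is given here in plain Python. The README only says to do elementwise multiplication with the nine predicted attributes; summing the products to get one scalar is an assumption, as are the function name `scalar_reward` and the example attribute values, which are made up for illustration and are not model output.

```python
# Weights copied from the added (+) README line in this commit.
# The first four positions are zero because those outputs are not trained or used.
WEIGHTS = [0, 0, 0, 0, 0.65, 0.8, 0.45, 0.55, -0.4]


def scalar_reward(attributes):
    """Combine the 9 predicted attribute values into one scalar.

    Assumption: the elementwise products are summed (a dot product).
    The README states only the elementwise multiplication step.
    """
    if len(attributes) != len(WEIGHTS):
        raise ValueError(
            f"expected {len(WEIGHTS)} attribute values, got {len(attributes)}"
        )
    return sum(w * a for w, a in zip(WEIGHTS, attributes))


if __name__ == "__main__":
    # Hypothetical attribute vector, for illustration only.
    example = [0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 2.0, 2.0]
    print(scalar_reward(example))
```

With the previous weights the last two positions were zero, so they had no effect on the scalar. With the new weights the eighth attribute raises the score and the ninth lowers it.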