Update README.md
```diff
@@ -17,12 +17,12 @@ The Skywork preference dataset demonstrates that a small high-quality dataset ca
 
 
 ## Evaluation
-We evaluate Gemma-2B-rewardmodel-ft on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench), where it achieved SOTA performance among models smaller than 6B.
+We evaluate GRM-Gemma-2B-rewardmodel-ft on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench), where it achieved SOTA performance among models smaller than 6B.
 
 
 | Model | Average | Chat | Chat Hard | Safety | Reasoning |
 |:-------------------------:|:-------------:|:---------:|:---------:|:--------:|:-----------:|
-| **Ray2333/Gemma-2B-rewardmodel-ft (Ours, 2B)** | **84.7** | 89.4 | 75.2 | 85.5 | 88.8 |
+| **Ray2333/GRM-Gemma-2B-rewardmodel-ft (Ours, 2B)** | **84.7** | 89.4 | 75.2 | 85.5 | 88.8 |
 | openai/gpt-4o-2024-05-13 | 84.6 | 96.6 | 70.4 | 86.5 | 84.9 |
 | sfairXC/FsfairX-LLaMA3-RM-v0.1 (8B) | 84.4 | 99.4 | 65.1 | 86.8 | 86.4 |
 | Nexusflow/Starling-RM-34B | 82.6 | 96.9 | 57.2 | 87.7 | 88.5 |
@@ -43,9 +43,9 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification
 
 device = 'cuda:0'
 # load model and tokenizer
-tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-ft')
+tokenizer = AutoTokenizer.from_pretrained('Ray2333/GRM-Gemma-2B-rewardmodel-ft')
 reward_model = AutoModelForSequenceClassification.from_pretrained(
-    'Ray2333/Gemma-2B-rewardmodel-ft', torch_dtype=torch.float16,
+    'Ray2333/GRM-Gemma-2B-rewardmodel-ft', torch_dtype=torch.float16,
     device_map=device,
 )
 message = [
```
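As a quick sanity check on the evaluation table: each reported Average agrees with the unweighted mean of the four category scores (that this is how reward-bench aggregates is an assumption; the snippet below only confirms the table is internally consistent):

```python
# Rows copied from the evaluation table: (average, chat, chat_hard, safety, reasoning)
rows = {
    "Ray2333/GRM-Gemma-2B-rewardmodel-ft": (84.7, 89.4, 75.2, 85.5, 88.8),
    "openai/gpt-4o-2024-05-13": (84.6, 96.6, 70.4, 86.5, 84.9),
    "sfairXC/FsfairX-LLaMA3-RM-v0.1": (84.4, 99.4, 65.1, 86.8, 86.4),
    "Nexusflow/Starling-RM-34B": (82.6, 96.9, 57.2, 87.7, 88.5),
}
for name, (avg, *cats) in rows.items():
    mean = sum(cats) / len(cats)
    # Reported Average matches the category mean up to one-decimal rounding.
    assert abs(mean - avg) < 0.05, (name, mean)
```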
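The code snippet in the diff ends mid-statement at `message = [`, so the scoring step itself is not shown. As a rough, framework-agnostic sketch of how a sequence-classification reward model like this is typically queried — the helper name `reward_score` and the exact call pattern are assumptions, not taken from the README:

```python
def reward_score(tokenizer, reward_model, message):
    """Return the scalar reward for a chat `message` (list of role/content dicts).

    Sketch only: assumes the common Hugging Face pattern where the tokenizer
    flattens the conversation via its chat template and the classification
    head emits a single logit, which is read out as the reward. In real use
    the forward pass would be wrapped in `torch.no_grad()` and the inputs
    moved to the model's device.
    """
    text = tokenizer.apply_chat_template(message, tokenize=False)
    inputs = tokenizer(text, return_tensors="pt")
    logits = reward_model(**inputs).logits
    # One logit per sequence: batch index 0, class index 0.
    return float(logits[0][0])
```

A higher reward for the chosen response than for the rejected one is the property these benchmark categories measure.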