Safetensors
English
gemma
Ray2333 committed on
Commit
2d1c1bd
1 Parent(s): 03f73bd

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -17,12 +17,12 @@ The Skywork preference dataset demonstrates that a small high-quality dataset ca


 ## Evaluation
-We evaluate Gemma-2B-rewardmodel-ft on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench), where it achieved SOTA performance among models smaller than 6B.
+We evaluate GRM-Gemma-2B-rewardmodel-ft on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench), where it achieved SOTA performance among models smaller than 6B.


 | Model | Average | Chat | Chat Hard | Safety | Reasoning |
 |:-------------------------:|:-------------:|:---------:|:---------:|:--------:|:-----------:|
-|**Ray2333/Gemma-2B-rewardmodel-ft (Ours, 2B)**| **84.7** | 89.4 | 75.2 | 85.5 | 88.8 |
+|**Ray2333/GRM-Gemma-2B-rewardmodel-ft (Ours, 2B)**| **84.7** | 89.4 | 75.2 | 85.5 | 88.8 |
 | openai/gpt-4o-2024-05-13 | 84.6 | 96.6 | 70.4 | 86.5 | 84.9 |
 | sfairXC/FsfairX-LLaMA3-RM-v0.1 (8B) | 84.4 | 99.4 | 65.1 | 86.8 | 86.4 |
 | Nexusflow/Starling-RM-34B | 82.6 | 96.9 | 57.2 | 87.7 | 88.5 |
@@ -43,9 +43,9 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification

 device = 'cuda:0'
 # load model and tokenizer
-tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-ft')
+tokenizer = AutoTokenizer.from_pretrained('Ray2333/GRM-Gemma-2B-rewardmodel-ft')
 reward_model = AutoModelForSequenceClassification.from_pretrained(
-    'Ray2333/Gemma-2B-rewardmodel-ft', torch_dtype=torch.float16,
+    'Ray2333/GRM-Gemma-2B-rewardmodel-ft', torch_dtype=torch.float16,
     device_map=device,
 )
 message = [
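For context, the snippet touched by the second hunk can be exercised end to end roughly as follows. This is a minimal sketch assuming the renamed `Ray2333/GRM-Gemma-2B-rewardmodel-ft` checkpoint provides a chat template and a single-logit reward head; the example conversation and the scoring lines are illustrative assumptions, since the diff cuts the README off at `message = [`.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = 'cuda:0'
# Load the renamed checkpoint referenced by this commit
tokenizer = AutoTokenizer.from_pretrained('Ray2333/GRM-Gemma-2B-rewardmodel-ft')
reward_model = AutoModelForSequenceClassification.from_pretrained(
    'Ray2333/GRM-Gemma-2B-rewardmodel-ft', torch_dtype=torch.float16,
    device_map=device,
)

# Hypothetical conversation; the original `message` list is cut off by the hunk boundary
message = [
    {'role': 'user', 'content': 'Explain quantum computing in simple terms.'},
    {'role': 'assistant', 'content': 'Quantum computers use qubits, which can hold 0 and 1 at once, to explore many possibilities in parallel.'},
]

# Render the conversation with the tokenizer's chat template and score it with the reward head
message_template = tokenizer.apply_chat_template(message, tokenize=False)
inputs = tokenizer(message_template, return_tensors='pt').to(device)
with torch.no_grad():
    reward = reward_model(**inputs).logits[0][0].item()
print(f'reward: {reward:.3f}')
```

A higher scalar output indicates a response the reward model prefers, so the same call can be repeated on candidate replies to rank them.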