Update README.md
```diff
@@ -17,12 +17,12 @@ The Skywork preference dataset demonstrates that a small high-quality dataset ca
 
 
 ## Evaluation
-We evaluate Gemma-2B-rewardmodel-ft on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench), where it achieved SOTA performance among models smaller than 6B.
+We evaluate GRM-Gemma-2B-rewardmodel-ft on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench), where it achieved SOTA performance among models smaller than 6B.
 
 
 | Model | Average | Chat | Chat Hard | Safety | Reasoning |
 |:-------------------------:|:-------------:|:---------:|:---------:|:--------:|:-----------:|
-| **Ray2333/Gemma-2B-rewardmodel-ft (Ours, 2B)** | **84.7** | 89.4 | 75.2 | 85.5 | 88.8 |
+| **Ray2333/GRM-Gemma-2B-rewardmodel-ft (Ours, 2B)** | **84.7** | 89.4 | 75.2 | 85.5 | 88.8 |
 | openai/gpt-4o-2024-05-13 | 84.6 | 96.6 | 70.4 | 86.5 | 84.9 |
 | sfairXC/FsfairX-LLaMA3-RM-v0.1 (8B) | 84.4 | 99.4 | 65.1 | 86.8 | 86.4 |
 | Nexusflow/Starling-RM-34B | 82.6 | 96.9 | 57.2 | 87.7 | 88.5 |
@@ -43,9 +43,9 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification
 
 device = 'cuda:0'
 # load model and tokenizer
-tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-ft')
+tokenizer = AutoTokenizer.from_pretrained('Ray2333/GRM-Gemma-2B-rewardmodel-ft')
 reward_model = AutoModelForSequenceClassification.from_pretrained(
-    'Ray2333/Gemma-2B-rewardmodel-ft', torch_dtype=torch.float16,
+    'Ray2333/GRM-Gemma-2B-rewardmodel-ft', torch_dtype=torch.float16,
     device_map=device,
 )
 message = [
```
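As a quick sanity check on the evaluation table: each reported Average agrees with the unweighted mean of the four category scores (that this is how reward-bench aggregates is an assumption; the snippet below only confirms the table is internally consistent):

```python
# Rows copied from the evaluation table: (average, chat, chat_hard, safety, reasoning)
rows = {
    "Ray2333/GRM-Gemma-2B-rewardmodel-ft": (84.7, 89.4, 75.2, 85.5, 88.8),
    "openai/gpt-4o-2024-05-13": (84.6, 96.6, 70.4, 86.5, 84.9),
    "sfairXC/FsfairX-LLaMA3-RM-v0.1": (84.4, 99.4, 65.1, 86.8, 86.4),
    "Nexusflow/Starling-RM-34B": (82.6, 96.9, 57.2, 87.7, 88.5),
}
for name, (avg, *cats) in rows.items():
    mean = sum(cats) / len(cats)
    # Reported Average matches the category mean up to one-decimal rounding.
    assert abs(mean - avg) < 0.05, (name, mean)
```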
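The code snippet in the diff ends mid-statement at `message = [`, so the scoring step itself is not shown. As a rough, framework-agnostic sketch of how a sequence-classification reward model like this is typically queried — the helper name `reward_score` and the exact call pattern are assumptions, not taken from the README:

```python
def reward_score(tokenizer, reward_model, message):
    """Return the scalar reward for a chat `message` (list of role/content dicts).

    Sketch only: assumes the common Hugging Face pattern where the tokenizer
    flattens the conversation via its chat template and the classification
    head emits a single logit, which is read out as the reward. In real use
    the forward pass would be wrapped in `torch.no_grad()` and the inputs
    moved to the model's device.
    """
    text = tokenizer.apply_chat_template(message, tokenize=False)
    inputs = tokenizer(text, return_tensors="pt")
    logits = reward_model(**inputs).logits
    # One logit per sequence: batch index 0, class index 0.
    return float(logits[0][0])
```

A higher reward for the chosen response than for the rejected one is the property these benchmark categories measure.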