# gemma-2-9b-evaluator-v1
This model is a fine-tuned version of google/gemma-2-9b-it on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these metrics can be computed follows the list):
- Loss: 1.6849
- Helpfulness Accuracy: 0.4181
- Helpfulness Spearmanr: 0.4129
- Helpfulness Kendalltau: 0.3194
- Helpfulness Pearsonr: 0.5143
- Helpfulness Rmse: 1.0735
- Helpfulness Mae: 0.8559
- Correctness Accuracy: 0.4769
- Correctness Spearmanr: 0.4135
- Correctness Kendalltau: 0.3214
- Correctness Pearsonr: 0.5143
- Correctness Rmse: 1.0616
- Correctness Mae: 0.8366
- Coherence Accuracy: 0.7148
- Coherence Spearmanr: 0.2880
- Coherence Kendalltau: 0.2322
- Coherence Pearsonr: 0.3705
- Coherence Rmse: 0.6259
- Coherence Mae: 0.4798
- Complexity Accuracy: 0.6012
- Complexity Spearmanr: 0.5174
- Complexity Kendalltau: 0.4180
- Complexity Pearsonr: 0.5310
- Complexity Rmse: 0.6137
- Complexity Mae: 0.4678
- Verbosity Accuracy: 0.6763
- Verbosity Spearmanr: 0.6069
- Verbosity Kendalltau: 0.4930
- Verbosity Pearsonr: 0.6736
- Verbosity Rmse: 0.5983
- Verbosity Mae: 0.4485
- Avg Accuracy: 0.5775
- Avg Spearmanr: 0.4477
- Avg Kendalltau: 0.3568
- Avg Pearsonr: 0.5207
- Avg Rmse: 0.7946
- Avg Mae: 0.6177
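The six statistics are reported per attribute (helpfulness, correctness, coherence, complexity, verbosity) and then averaged. As a rough illustration of what each number measures, here is a minimal sketch that computes the same statistics from predicted vs. reference scores with NumPy and SciPy. The actual evaluation code is not part of this card; the accuracy definition (exact match after rounding) and the helper name `attribute_metrics` are assumptions.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

def attribute_metrics(preds, refs):
    """Compute the six per-attribute statistics reported above (hedged reconstruction)."""
    preds = np.asarray(preds, dtype=float)
    refs = np.asarray(refs, dtype=float)
    return {
        # Assumption: "accuracy" means exact match after rounding predictions to the nearest label.
        "accuracy": float(np.mean(np.round(preds) == np.round(refs))),
        "spearmanr": float(spearmanr(preds, refs)[0]),
        "kendalltau": float(kendalltau(preds, refs)[0]),
        "pearsonr": float(pearsonr(preds, refs)[0]),
        "rmse": float(np.sqrt(np.mean((preds - refs) ** 2))),
        "mae": float(np.mean(np.abs(preds - refs))),
    }
```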
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
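For reference, a minimal `TrainingArguments` sketch that mirrors the hyperparameters above. This is a reconstruction, not the original training script; values not listed in this card (the output directory and the minimum learning rate for the `cosine_with_min_lr` schedule) are placeholders.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="gemma-2-9b-evaluator-v1",      # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,             # effective train batch size: 2 * 4 = 8
    seed=42,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 1e-6},      # placeholder: the actual min_lr is not reported
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```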
### Training results
Training Loss | Epoch | Step | Validation Loss | Helpfulness Accuracy | Helpfulness Spearmanr | Helpfulness Kendalltau | Helpfulness Pearsonr | Helpfulness Rmse | Helpfulness Mae | Correctness Accuracy | Correctness Spearmanr | Correctness Kendalltau | Correctness Pearsonr | Correctness Rmse | Correctness Mae | Coherence Accuracy | Coherence Spearmanr | Coherence Kendalltau | Coherence Pearsonr | Coherence Rmse | Coherence Mae | Complexity Accuracy | Complexity Spearmanr | Complexity Kendalltau | Complexity Pearsonr | Complexity Rmse | Complexity Mae | Verbosity Accuracy | Verbosity Spearmanr | Verbosity Kendalltau | Verbosity Pearsonr | Verbosity Rmse | Verbosity Mae | Avg Accuracy | Avg Spearmanr | Avg Kendalltau | Avg Pearsonr | Avg Rmse | Avg Mae |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No log | 0 | 0 | 7.3447 | 0.0934 | -0.1708 | -0.1290 | -0.1865 | 2.6016 | 2.3248 | 0.2630 | -0.0067 | -0.0052 | -0.0083 | 1.3643 | 1.1593 | 0.0530 | -0.0417 | -0.0336 | -0.0564 | 3.0362 | 2.8994 | 0.0983 | -0.1112 | -0.0871 | -0.1231 | 1.0970 | 0.8836 | 0.1378 | -0.0086 | -0.0068 | -0.0139 | 1.2755 | 1.0756 | 0.1291 | -0.0678 | -0.0523 | -0.0776 | 1.8749 | 1.6686 |
2.1437 | 0.0719 | 500 | 2.3464 | 0.3092 | 0.0421 | 0.0320 | 0.0373 | 1.2641 | 1.0162 | 0.3198 | 0.1021 | 0.0767 | 0.1458 | 1.2286 | 1.0016 | 0.5491 | 0.0236 | 0.0188 | 0.0441 | 0.7506 | 0.6140 | 0.4711 | 0.1512 | 0.1182 | 0.1430 | 0.7263 | 0.5881 | 0.5588 | 0.1844 | 0.1419 | 0.2079 | 0.7855 | 0.5510 | 0.4416 | 0.1007 | 0.0775 | 0.1156 | 0.9510 | 0.7542 |
1.843 | 0.1437 | 1000 | 1.9347 | 0.3998 | 0.3252 | 0.2481 | 0.3956 | 1.1578 | 0.9084 | 0.4403 | 0.3335 | 0.2557 | 0.4149 | 1.1291 | 0.8900 | 0.6628 | 0.1423 | 0.1139 | 0.1817 | 0.6701 | 0.5221 | 0.5607 | 0.3808 | 0.3022 | 0.4127 | 0.6576 | 0.5173 | 0.6243 | 0.4424 | 0.3485 | 0.4902 | 0.7009 | 0.4784 | 0.5376 | 0.3248 | 0.2537 | 0.3790 | 0.8631 | 0.6632 |
1.8118 | 0.2156 | 1500 | 1.8294 | 0.3738 | 0.3565 | 0.2739 | 0.4501 | 1.1206 | 0.8954 | 0.4316 | 0.3606 | 0.2784 | 0.4580 | 1.0958 | 0.8865 | 0.6830 | 0.1907 | 0.1529 | 0.2532 | 0.6581 | 0.5247 | 0.5848 | 0.4564 | 0.3656 | 0.4838 | 0.6362 | 0.4909 | 0.6368 | 0.5178 | 0.4127 | 0.5913 | 0.6608 | 0.4474 | 0.5420 | 0.3764 | 0.2967 | 0.4473 | 0.8343 | 0.6490 |
1.6318 | 0.2875 | 2000 | 1.7885 | 0.3882 | 0.3734 | 0.2886 | 0.4616 | 1.1076 | 0.8827 | 0.4701 | 0.3800 | 0.2947 | 0.4699 | 1.0894 | 0.8631 | 0.6965 | 0.2344 | 0.1883 | 0.2992 | 0.6412 | 0.4804 | 0.5867 | 0.4633 | 0.3721 | 0.4818 | 0.6373 | 0.4734 | 0.6320 | 0.5412 | 0.4338 | 0.6172 | 0.6365 | 0.4765 | 0.5547 | 0.3985 | 0.3155 | 0.4659 | 0.8224 | 0.6352 |
1.7944 | 0.3594 | 2500 | 1.7645 | 0.4046 | 0.3903 | 0.3016 | 0.4828 | 1.0943 | 0.8715 | 0.4576 | 0.3930 | 0.3053 | 0.4839 | 1.0783 | 0.8590 | 0.6840 | 0.2530 | 0.2035 | 0.3156 | 0.6434 | 0.5073 | 0.5838 | 0.4831 | 0.3894 | 0.4970 | 0.6277 | 0.4767 | 0.6474 | 0.5542 | 0.4453 | 0.6269 | 0.6427 | 0.4815 | 0.5555 | 0.4147 | 0.3290 | 0.4812 | 0.8173 | 0.6392 |
1.6247 | 0.4312 | 3000 | 1.7417 | 0.3545 | 0.3994 | 0.3085 | 0.4981 | 1.0894 | 0.8818 | 0.4672 | 0.4033 | 0.3128 | 0.5024 | 1.0650 | 0.8560 | 0.7110 | 0.2636 | 0.2123 | 0.3346 | 0.6330 | 0.4812 | 0.5915 | 0.4970 | 0.4012 | 0.5094 | 0.6237 | 0.4733 | 0.6532 | 0.5731 | 0.4623 | 0.6450 | 0.6289 | 0.4780 | 0.5555 | 0.4273 | 0.3394 | 0.4979 | 0.8080 | 0.6341 |
1.5258 | 0.5031 | 3500 | 1.7147 | 0.3950 | 0.4007 | 0.3098 | 0.4954 | 1.0861 | 0.8657 | 0.4730 | 0.4060 | 0.3154 | 0.4983 | 1.0706 | 0.8456 | 0.7148 | 0.2700 | 0.2174 | 0.3492 | 0.6280 | 0.4644 | 0.6040 | 0.5005 | 0.4037 | 0.5149 | 0.6216 | 0.4902 | 0.6590 | 0.5813 | 0.4698 | 0.6510 | 0.6043 | 0.4418 | 0.5692 | 0.4317 | 0.3432 | 0.5018 | 0.8021 | 0.6215 |
1.6146 | 0.5750 | 4000 | 1.7121 | 0.4114 | 0.4005 | 0.3095 | 0.5034 | 1.0830 | 0.8597 | 0.4769 | 0.4057 | 0.3152 | 0.5055 | 1.0649 | 0.8371 | 0.7225 | 0.2730 | 0.2200 | 0.3536 | 0.6249 | 0.4447 | 0.5954 | 0.5037 | 0.4064 | 0.5112 | 0.6232 | 0.4644 | 0.6638 | 0.5981 | 0.4850 | 0.6605 | 0.6168 | 0.4499 | 0.5740 | 0.4362 | 0.3472 | 0.5068 | 0.8026 | 0.6112 |
1.517 | 0.6468 | 4500 | 1.7076 | 0.4171 | 0.4057 | 0.3134 | 0.5066 | 1.0792 | 0.8598 | 0.4807 | 0.4093 | 0.3179 | 0.5092 | 1.0628 | 0.8370 | 0.7216 | 0.2781 | 0.2240 | 0.3628 | 0.6219 | 0.4552 | 0.5915 | 0.5043 | 0.4070 | 0.5095 | 0.6316 | 0.4612 | 0.6667 | 0.6012 | 0.4878 | 0.6676 | 0.6101 | 0.4619 | 0.5755 | 0.4397 | 0.3500 | 0.5111 | 0.8011 | 0.6150 |
1.6201 | 0.7187 | 5000 | 1.6972 | 0.4143 | 0.4073 | 0.3146 | 0.5110 | 1.0756 | 0.8582 | 0.4836 | 0.4090 | 0.3176 | 0.5121 | 1.0610 | 0.8329 | 0.7197 | 0.2816 | 0.2271 | 0.3668 | 0.6214 | 0.4611 | 0.5944 | 0.5126 | 0.4142 | 0.5222 | 0.6223 | 0.4624 | 0.6647 | 0.6072 | 0.4931 | 0.6724 | 0.6066 | 0.4627 | 0.5753 | 0.4435 | 0.3533 | 0.5169 | 0.7974 | 0.6155 |
1.6377 | 0.7906 | 5500 | 1.6846 | 0.4220 | 0.4099 | 0.3168 | 0.5128 | 1.0754 | 0.8550 | 0.4798 | 0.4111 | 0.3196 | 0.5127 | 1.0591 | 0.8381 | 0.7245 | 0.2843 | 0.2292 | 0.3684 | 0.6206 | 0.4495 | 0.6021 | 0.5131 | 0.4145 | 0.5276 | 0.6144 | 0.4601 | 0.6696 | 0.6092 | 0.4951 | 0.6752 | 0.5966 | 0.4445 | 0.5796 | 0.4455 | 0.3550 | 0.5193 | 0.7932 | 0.6095 |
1.619 | 0.8624 | 6000 | 1.6815 | 0.4027 | 0.4127 | 0.3192 | 0.5166 | 1.0698 | 0.8549 | 0.4817 | 0.4141 | 0.3220 | 0.5166 | 1.0560 | 0.8366 | 0.7197 | 0.2834 | 0.2284 | 0.3670 | 0.6230 | 0.4638 | 0.6002 | 0.5170 | 0.4178 | 0.5325 | 0.6129 | 0.4619 | 0.6686 | 0.6071 | 0.4931 | 0.6729 | 0.5976 | 0.4475 | 0.5746 | 0.4468 | 0.3561 | 0.5211 | 0.7918 | 0.6129 |
1.7326 | 0.9343 | 6500 | 1.6849 | 0.4181 | 0.4129 | 0.3194 | 0.5143 | 1.0735 | 0.8559 | 0.4769 | 0.4135 | 0.3214 | 0.5143 | 1.0616 | 0.8366 | 0.7148 | 0.2880 | 0.2322 | 0.3705 | 0.6259 | 0.4798 | 0.6012 | 0.5174 | 0.4180 | 0.5310 | 0.6137 | 0.4678 | 0.6763 | 0.6069 | 0.4930 | 0.6736 | 0.5983 | 0.4485 | 0.5775 | 0.4477 | 0.3568 | 0.5207 | 0.7946 | 0.6177 |
### Framework versions
- PEFT 0.15.0
- Transformers 4.50.0.dev0
- Pytorch 2.4.1+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
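Since the adapter was trained with PEFT, a loading sketch along these lines should apply. The adapter repository ID is a placeholder, and the card does not document whether the evaluator uses the base causal-LM head or a separate regression head, so the loading path below is an assumption.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-2-9b-it"
adapter_id = "your-namespace/gemma-2-9b-evaluator-v1"  # placeholder repository ID

tokenizer = AutoTokenizer.from_pretrained(base_id)
# Assumption: the adapter attaches to the causal-LM head; the card does not state the head type.
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```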