# gemma-2-2b-evaluator-v2
This model is a fine-tuned version of google/gemma-2-2b-jpn-it on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.8410
- Helpfulness Accuracy: 0.5109
- Helpfulness Spearmanr: 0.5327
- Helpfulness Kendalltau: 0.3736
- Helpfulness Pearsonr: 0.5339
- Helpfulness Rmse: 0.5903
- Helpfulness Mae: 0.4713
- Correctness Accuracy: 0.6014
- Correctness Spearmanr: 0.5531
- Correctness Kendalltau: 0.3943
- Correctness Pearsonr: 0.5859
- Correctness Rmse: 0.5292
- Correctness Mae: 0.4221
- Coherence Accuracy: 0.6998
- Coherence Spearmanr: 0.4640
- Coherence Kendalltau: 0.3232
- Coherence Pearsonr: 0.4909
- Coherence Rmse: 0.5011
- Coherence Mae: 0.4405
- Complexity Accuracy: 0.6064
- Complexity Spearmanr: -0.0021
- Complexity Kendalltau: -0.0026
- Complexity Pearsonr: -0.0076
- Complexity Rmse: 0.3605
- Complexity Mae: 0.3205
- Verbosity Accuracy: 0.6362
- Verbosity Spearmanr: 0.4071
- Verbosity Kendalltau: 0.2757
- Verbosity Pearsonr: 0.3365
- Verbosity Rmse: 0.3724
- Verbosity Mae: 0.3068
- Avg Accuracy: 0.6109
- Avg Spearmanr: 0.3910
- Avg Kendalltau: 0.2728
- Avg Pearsonr: 0.3879
- Avg Rmse: 0.4707
- Avg Mae: 0.3922
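The exact metric implementations are not documented in this card. Below is a hedged sketch of how the per-attribute metric suite above could be computed with NumPy and SciPy; in particular, the "accuracy" definition (agreement after rounding predictions to the nearest label) is an assumption, not something stated in the card.

```python
# Hypothetical sketch of the per-attribute metric suite reported above.
# "accuracy" here is assumed to mean agreement after rounding; the actual
# definition used during evaluation is not documented in this card.
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr


def attribute_metrics(preds: np.ndarray, labels: np.ndarray) -> dict:
    """Compute the metrics reported above for one attribute (e.g. helpfulness)."""
    return {
        "accuracy": float((np.round(preds) == np.round(labels)).mean()),  # assumption
        "spearmanr": float(spearmanr(preds, labels)[0]),
        "kendalltau": float(kendalltau(preds, labels)[0]),
        "pearsonr": float(pearsonr(preds, labels)[0]),
        "rmse": float(np.sqrt(np.mean((preds - labels) ** 2))),
        "mae": float(np.abs(preds - labels).mean()),
    }
```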
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
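As a rough illustration, the hyperparameters above could be expressed as `TrainingArguments` roughly as follows. This is a minimal sketch, not the actual training script: the output directory and the minimum-LR setting required by the `cosine_with_min_lr` scheduler are not documented in this card, so the values shown for them are placeholders.

```python
# Minimal sketch of TrainingArguments matching the hyperparameters listed above.
# output_dir and min_lr_rate are undocumented assumptions, not the real setup.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b-evaluator-v2",      # assumed output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=4,             # gives the effective train batch size of 8
    optim="adamw_torch",                       # AdamW with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # min LR not documented; placeholder value
    warmup_ratio=0.1,
    num_train_epochs=1.0,
)
```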
### Training results
Training Loss | Epoch | Step | Validation Loss | Helpfulness Accuracy | Helpfulness Spearmanr | Helpfulness Kendalltau | Helpfulness Pearsonr | Helpfulness Rmse | Helpfulness Mae | Correctness Accuracy | Correctness Spearmanr | Correctness Kendalltau | Correctness Pearsonr | Correctness Rmse | Correctness Mae | Coherence Accuracy | Coherence Spearmanr | Coherence Kendalltau | Coherence Pearsonr | Coherence Rmse | Coherence Mae | Complexity Accuracy | Complexity Spearmanr | Complexity Kendalltau | Complexity Pearsonr | Complexity Rmse | Complexity Mae | Verbosity Accuracy | Verbosity Spearmanr | Verbosity Kendalltau | Verbosity Pearsonr | Verbosity Rmse | Verbosity Mae | Avg Accuracy | Avg Spearmanr | Avg Kendalltau | Avg Pearsonr | Avg Rmse | Avg Mae |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No log | 0 | 0 | 6.0772 | 0.3807 | 0.0234 | 0.0151 | 0.0297 | 0.9409 | 0.7463 | 0.1471 | 0.0071 | 0.0048 | 0.0298 | 1.3459 | 1.1590 | 0.3698 | 0.0093 | 0.0059 | 0.0060 | 1.2779 | 1.0608 | 0.0258 | -0.0438 | -0.0291 | -0.0464 | 2.1264 | 1.9972 | 0.3698 | -0.0471 | -0.0310 | -0.0500 | 1.0197 | 0.7998 | 0.2586 | -0.0102 | -0.0069 | -0.0062 | 1.3422 | 1.1526 |
1.2731 | 0.2094 | 500 | 1.2058 | 0.4404 | 0.1512 | 0.0986 | 0.1462 | 0.7513 | 0.6121 | 0.5338 | 0.1290 | 0.0836 | 0.1806 | 0.6719 | 0.5388 | 0.6700 | 0.1137 | 0.0764 | 0.1443 | 0.5356 | 0.4510 | 0.6054 | -0.2206 | -0.1464 | -0.0710 | 0.4624 | 0.4287 | 0.6243 | 0.0942 | 0.0622 | -0.0191 | 0.4919 | 0.4087 | 0.5748 | 0.0535 | 0.0349 | 0.0762 | 0.5826 | 0.4878 |
0.9212 | 0.4188 | 1000 | 0.9210 | 0.4980 | 0.4385 | 0.3020 | 0.4357 | 0.5932 | 0.4747 | 0.5944 | 0.4142 | 0.2897 | 0.4702 | 0.5473 | 0.4357 | 0.7068 | 0.3103 | 0.2104 | 0.3688 | 0.4894 | 0.4244 | 0.6054 | -0.2491 | -0.1664 | -0.2532 | 0.4269 | 0.3880 | 0.6362 | 0.2771 | 0.1860 | 0.1669 | 0.3959 | 0.3246 | 0.6082 | 0.2382 | 0.1643 | 0.2377 | 0.4905 | 0.4095 |
0.8859 | 0.6283 | 1500 | 0.8554 | 0.4911 | 0.5129 | 0.3572 | 0.5111 | 0.5972 | 0.4769 | 0.5755 | 0.5366 | 0.3813 | 0.5659 | 0.5430 | 0.4334 | 0.6958 | 0.4329 | 0.3006 | 0.4621 | 0.5038 | 0.4418 | 0.6054 | -0.1199 | -0.0813 | -0.1286 | 0.3882 | 0.3489 | 0.6362 | 0.3749 | 0.2544 | 0.2937 | 0.3931 | 0.3275 | 0.6008 | 0.3475 | 0.2424 | 0.3409 | 0.4851 | 0.4057 |
0.7737 | 0.8377 | 2000 | 0.8410 | 0.5109 | 0.5327 | 0.3736 | 0.5339 | 0.5903 | 0.4713 | 0.6014 | 0.5531 | 0.3943 | 0.5859 | 0.5292 | 0.4221 | 0.6998 | 0.4640 | 0.3232 | 0.4909 | 0.5011 | 0.4405 | 0.6064 | -0.0021 | -0.0026 | -0.0076 | 0.3605 | 0.3205 | 0.6362 | 0.4071 | 0.2757 | 0.3365 | 0.3724 | 0.3068 | 0.6109 | 0.3910 | 0.2728 | 0.3879 | 0.4707 | 0.3922 |
### Framework versions
- PEFT 0.15.0
- Transformers 4.50.1
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
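Since this repository is a PEFT adapter on top of google/gemma-2-2b-jpn-it, loading might look like the sketch below under the framework versions listed above. The model's head and scoring interface are not documented here, so the use of a sequence-classification head with one output per attribute, and the adapter repo id, are assumptions.

```python
# Hedged loading sketch. The head class (sequence classification with 5 outputs,
# one per attribute) and the adapter id are assumptions, not documented facts.
import torch
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base_id = "google/gemma-2-2b-jpn-it"
adapter_id = "gemma-2-2b-evaluator-v2"  # assumed local path or Hub repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForSequenceClassification.from_pretrained(
    base_id,
    num_labels=5,                # assumed: one score per evaluated attribute
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

inputs = tokenizer("Prompt and response to be scored...", return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits  # assumed: per-attribute scores
```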