# gemma-2-2b-evaluator-v2
This model is a fine-tuned version of google/gemma-2-2b-jpn-it on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.8410
- Helpfulness Accuracy: 0.5109
- Helpfulness Spearmanr: 0.5327
- Helpfulness Kendalltau: 0.3736
- Helpfulness Pearsonr: 0.5339
- Helpfulness Rmse: 0.5903
- Helpfulness Mae: 0.4713
- Correctness Accuracy: 0.6014
- Correctness Spearmanr: 0.5531
- Correctness Kendalltau: 0.3943
- Correctness Pearsonr: 0.5859
- Correctness Rmse: 0.5292
- Correctness Mae: 0.4221
- Coherence Accuracy: 0.6998
- Coherence Spearmanr: 0.4640
- Coherence Kendalltau: 0.3232
- Coherence Pearsonr: 0.4909
- Coherence Rmse: 0.5011
- Coherence Mae: 0.4405
- Complexity Accuracy: 0.6064
- Complexity Spearmanr: -0.0021
- Complexity Kendalltau: -0.0026
- Complexity Pearsonr: -0.0076
- Complexity Rmse: 0.3605
- Complexity Mae: 0.3205
- Verbosity Accuracy: 0.6362
- Verbosity Spearmanr: 0.4071
- Verbosity Kendalltau: 0.2757
- Verbosity Pearsonr: 0.3365
- Verbosity Rmse: 0.3724
- Verbosity Mae: 0.3068
- Avg Accuracy: 0.6109
- Avg Spearmanr: 0.3910
- Avg Kendalltau: 0.2728
- Avg Pearsonr: 0.3879
- Avg Rmse: 0.4707
- Avg Mae: 0.3922
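The exact metric implementations are not documented in this card. Below is a hedged sketch of how the per-attribute metric suite above could be computed with NumPy and SciPy; in particular, the "accuracy" definition (agreement after rounding predictions to the nearest label) is an assumption, not something stated in the card.

```python
# Hypothetical sketch of the per-attribute metric suite reported above.
# "accuracy" here is assumed to mean agreement after rounding; the actual
# definition used during evaluation is not documented in this card.
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr


def attribute_metrics(preds: np.ndarray, labels: np.ndarray) -> dict:
    """Compute the metrics reported above for one attribute (e.g. helpfulness)."""
    return {
        "accuracy": float((np.round(preds) == np.round(labels)).mean()),  # assumption
        "spearmanr": float(spearmanr(preds, labels)[0]),
        "kendalltau": float(kendalltau(preds, labels)[0]),
        "pearsonr": float(pearsonr(preds, labels)[0]),
        "rmse": float(np.sqrt(np.mean((preds - labels) ** 2))),
        "mae": float(np.abs(preds - labels).mean()),
    }
```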
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
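As a rough illustration, the hyperparameters above could be expressed as `TrainingArguments` roughly as follows. This is a minimal sketch, not the actual training script: the output directory and the minimum-LR setting required by the `cosine_with_min_lr` scheduler are not documented in this card, so the values shown for them are placeholders.

```python
# Minimal sketch of TrainingArguments matching the hyperparameters listed above.
# output_dir and min_lr_rate are undocumented assumptions, not the real setup.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b-evaluator-v2",      # assumed output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=4,             # gives the effective train batch size of 8
    optim="adamw_torch",                       # AdamW with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # min LR not documented; placeholder value
    warmup_ratio=0.1,
    num_train_epochs=1.0,
)
```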
### Training results
Training Loss | Epoch | Step | Validation Loss | Helpfulness Accuracy | Helpfulness Spearmanr | Helpfulness Kendalltau | Helpfulness Pearsonr | Helpfulness Rmse | Helpfulness Mae | Correctness Accuracy | Correctness Spearmanr | Correctness Kendalltau | Correctness Pearsonr | Correctness Rmse | Correctness Mae | Coherence Accuracy | Coherence Spearmanr | Coherence Kendalltau | Coherence Pearsonr | Coherence Rmse | Coherence Mae | Complexity Accuracy | Complexity Spearmanr | Complexity Kendalltau | Complexity Pearsonr | Complexity Rmse | Complexity Mae | Verbosity Accuracy | Verbosity Spearmanr | Verbosity Kendalltau | Verbosity Pearsonr | Verbosity Rmse | Verbosity Mae | Avg Accuracy | Avg Spearmanr | Avg Kendalltau | Avg Pearsonr | Avg Rmse | Avg Mae |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No log | 0 | 0 | 6.0772 | 0.3807 | 0.0234 | 0.0151 | 0.0297 | 0.9409 | 0.7463 | 0.1471 | 0.0071 | 0.0048 | 0.0298 | 1.3459 | 1.1590 | 0.3698 | 0.0093 | 0.0059 | 0.0060 | 1.2779 | 1.0608 | 0.0258 | -0.0438 | -0.0291 | -0.0464 | 2.1264 | 1.9972 | 0.3698 | -0.0471 | -0.0310 | -0.0500 | 1.0197 | 0.7998 | 0.2586 | -0.0102 | -0.0069 | -0.0062 | 1.3422 | 1.1526 |
1.2731 | 0.2094 | 500 | 1.2058 | 0.4404 | 0.1512 | 0.0986 | 0.1462 | 0.7513 | 0.6121 | 0.5338 | 0.1290 | 0.0836 | 0.1806 | 0.6719 | 0.5388 | 0.6700 | 0.1137 | 0.0764 | 0.1443 | 0.5356 | 0.4510 | 0.6054 | -0.2206 | -0.1464 | -0.0710 | 0.4624 | 0.4287 | 0.6243 | 0.0942 | 0.0622 | -0.0191 | 0.4919 | 0.4087 | 0.5748 | 0.0535 | 0.0349 | 0.0762 | 0.5826 | 0.4878 |
0.9212 | 0.4188 | 1000 | 0.9210 | 0.4980 | 0.4385 | 0.3020 | 0.4357 | 0.5932 | 0.4747 | 0.5944 | 0.4142 | 0.2897 | 0.4702 | 0.5473 | 0.4357 | 0.7068 | 0.3103 | 0.2104 | 0.3688 | 0.4894 | 0.4244 | 0.6054 | -0.2491 | -0.1664 | -0.2532 | 0.4269 | 0.3880 | 0.6362 | 0.2771 | 0.1860 | 0.1669 | 0.3959 | 0.3246 | 0.6082 | 0.2382 | 0.1643 | 0.2377 | 0.4905 | 0.4095 |
0.8859 | 0.6283 | 1500 | 0.8554 | 0.4911 | 0.5129 | 0.3572 | 0.5111 | 0.5972 | 0.4769 | 0.5755 | 0.5366 | 0.3813 | 0.5659 | 0.5430 | 0.4334 | 0.6958 | 0.4329 | 0.3006 | 0.4621 | 0.5038 | 0.4418 | 0.6054 | -0.1199 | -0.0813 | -0.1286 | 0.3882 | 0.3489 | 0.6362 | 0.3749 | 0.2544 | 0.2937 | 0.3931 | 0.3275 | 0.6008 | 0.3475 | 0.2424 | 0.3409 | 0.4851 | 0.4057 |
0.7737 | 0.8377 | 2000 | 0.8410 | 0.5109 | 0.5327 | 0.3736 | 0.5339 | 0.5903 | 0.4713 | 0.6014 | 0.5531 | 0.3943 | 0.5859 | 0.5292 | 0.4221 | 0.6998 | 0.4640 | 0.3232 | 0.4909 | 0.5011 | 0.4405 | 0.6064 | -0.0021 | -0.0026 | -0.0076 | 0.3605 | 0.3205 | 0.6362 | 0.4071 | 0.2757 | 0.3365 | 0.3724 | 0.3068 | 0.6109 | 0.3910 | 0.2728 | 0.3879 | 0.4707 | 0.3922 |
### Framework versions
- PEFT 0.15.0
- Transformers 4.50.1
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
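Since this repository is a PEFT adapter on top of google/gemma-2-2b-jpn-it, loading might look like the sketch below under the framework versions listed above. The model's head and scoring interface are not documented here, so the use of a sequence-classification head with one output per attribute, and the adapter repo id, are assumptions.

```python
# Hedged loading sketch. The head class (sequence classification with 5 outputs,
# one per attribute) and the adapter id are assumptions, not documented facts.
import torch
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base_id = "google/gemma-2-2b-jpn-it"
adapter_id = "gemma-2-2b-evaluator-v2"  # assumed local path or Hub repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForSequenceClassification.from_pretrained(
    base_id,
    num_labels=5,                # assumed: one score per evaluated attribute
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

inputs = tokenizer("Prompt and response to be scored...", return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits  # assumed: per-attribute scores
```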