# gemma-2-9b-evaluator-v1
This model is a fine-tuned version of google/gemma-2-9b-it on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these metrics can be computed follows the list):
- Loss: 1.6849
- Helpfulness Accuracy: 0.4181
- Helpfulness Spearmanr: 0.4129
- Helpfulness Kendalltau: 0.3194
- Helpfulness Pearsonr: 0.5143
- Helpfulness Rmse: 1.0735
- Helpfulness Mae: 0.8559
- Correctness Accuracy: 0.4769
- Correctness Spearmanr: 0.4135
- Correctness Kendalltau: 0.3214
- Correctness Pearsonr: 0.5143
- Correctness Rmse: 1.0616
- Correctness Mae: 0.8366
- Coherence Accuracy: 0.7148
- Coherence Spearmanr: 0.2880
- Coherence Kendalltau: 0.2322
- Coherence Pearsonr: 0.3705
- Coherence Rmse: 0.6259
- Coherence Mae: 0.4798
- Complexity Accuracy: 0.6012
- Complexity Spearmanr: 0.5174
- Complexity Kendalltau: 0.4180
- Complexity Pearsonr: 0.5310
- Complexity Rmse: 0.6137
- Complexity Mae: 0.4678
- Verbosity Accuracy: 0.6763
- Verbosity Spearmanr: 0.6069
- Verbosity Kendalltau: 0.4930
- Verbosity Pearsonr: 0.6736
- Verbosity Rmse: 0.5983
- Verbosity Mae: 0.4485
- Avg Accuracy: 0.5775
- Avg Spearmanr: 0.4477
- Avg Kendalltau: 0.3568
- Avg Pearsonr: 0.5207
- Avg Rmse: 0.7946
- Avg Mae: 0.6177
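The six statistics are reported per attribute (helpfulness, correctness, coherence, complexity, verbosity) and then averaged. As a rough illustration of what each number measures, here is a minimal sketch that computes the same statistics from predicted vs. reference scores with NumPy and SciPy. The actual evaluation code is not part of this card; the accuracy definition (exact match after rounding) and the helper name `attribute_metrics` are assumptions.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

def attribute_metrics(preds, refs):
    """Compute the six per-attribute statistics reported above (hedged reconstruction)."""
    preds = np.asarray(preds, dtype=float)
    refs = np.asarray(refs, dtype=float)
    return {
        # Assumption: "accuracy" means exact match after rounding predictions to the nearest label.
        "accuracy": float(np.mean(np.round(preds) == np.round(refs))),
        "spearmanr": float(spearmanr(preds, refs)[0]),
        "kendalltau": float(kendalltau(preds, refs)[0]),
        "pearsonr": float(pearsonr(preds, refs)[0]),
        "rmse": float(np.sqrt(np.mean((preds - refs) ** 2))),
        "mae": float(np.mean(np.abs(preds - refs))),
    }
```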
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
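For reference, a minimal `TrainingArguments` sketch that mirrors the hyperparameters above. This is a reconstruction, not the original training script; values not listed in this card (the output directory and the minimum learning rate for the `cosine_with_min_lr` schedule) are placeholders.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="gemma-2-9b-evaluator-v1",      # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,             # effective train batch size: 2 * 4 = 8
    seed=42,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 1e-6},      # placeholder: the actual min_lr is not reported
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```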
### Training results
Training Loss | Epoch | Step | Validation Loss | Helpfulness Accuracy | Helpfulness Spearmanr | Helpfulness Kendalltau | Helpfulness Pearsonr | Helpfulness Rmse | Helpfulness Mae | Correctness Accuracy | Correctness Spearmanr | Correctness Kendalltau | Correctness Pearsonr | Correctness Rmse | Correctness Mae | Coherence Accuracy | Coherence Spearmanr | Coherence Kendalltau | Coherence Pearsonr | Coherence Rmse | Coherence Mae | Complexity Accuracy | Complexity Spearmanr | Complexity Kendalltau | Complexity Pearsonr | Complexity Rmse | Complexity Mae | Verbosity Accuracy | Verbosity Spearmanr | Verbosity Kendalltau | Verbosity Pearsonr | Verbosity Rmse | Verbosity Mae | Avg Accuracy | Avg Spearmanr | Avg Kendalltau | Avg Pearsonr | Avg Rmse | Avg Mae |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No log | 0 | 0 | 7.3447 | 0.0934 | -0.1708 | -0.1290 | -0.1865 | 2.6016 | 2.3248 | 0.2630 | -0.0067 | -0.0052 | -0.0083 | 1.3643 | 1.1593 | 0.0530 | -0.0417 | -0.0336 | -0.0564 | 3.0362 | 2.8994 | 0.0983 | -0.1112 | -0.0871 | -0.1231 | 1.0970 | 0.8836 | 0.1378 | -0.0086 | -0.0068 | -0.0139 | 1.2755 | 1.0756 | 0.1291 | -0.0678 | -0.0523 | -0.0776 | 1.8749 | 1.6686 |
2.1437 | 0.0719 | 500 | 2.3464 | 0.3092 | 0.0421 | 0.0320 | 0.0373 | 1.2641 | 1.0162 | 0.3198 | 0.1021 | 0.0767 | 0.1458 | 1.2286 | 1.0016 | 0.5491 | 0.0236 | 0.0188 | 0.0441 | 0.7506 | 0.6140 | 0.4711 | 0.1512 | 0.1182 | 0.1430 | 0.7263 | 0.5881 | 0.5588 | 0.1844 | 0.1419 | 0.2079 | 0.7855 | 0.5510 | 0.4416 | 0.1007 | 0.0775 | 0.1156 | 0.9510 | 0.7542 |
1.843 | 0.1437 | 1000 | 1.9347 | 0.3998 | 0.3252 | 0.2481 | 0.3956 | 1.1578 | 0.9084 | 0.4403 | 0.3335 | 0.2557 | 0.4149 | 1.1291 | 0.8900 | 0.6628 | 0.1423 | 0.1139 | 0.1817 | 0.6701 | 0.5221 | 0.5607 | 0.3808 | 0.3022 | 0.4127 | 0.6576 | 0.5173 | 0.6243 | 0.4424 | 0.3485 | 0.4902 | 0.7009 | 0.4784 | 0.5376 | 0.3248 | 0.2537 | 0.3790 | 0.8631 | 0.6632 |
1.8118 | 0.2156 | 1500 | 1.8294 | 0.3738 | 0.3565 | 0.2739 | 0.4501 | 1.1206 | 0.8954 | 0.4316 | 0.3606 | 0.2784 | 0.4580 | 1.0958 | 0.8865 | 0.6830 | 0.1907 | 0.1529 | 0.2532 | 0.6581 | 0.5247 | 0.5848 | 0.4564 | 0.3656 | 0.4838 | 0.6362 | 0.4909 | 0.6368 | 0.5178 | 0.4127 | 0.5913 | 0.6608 | 0.4474 | 0.5420 | 0.3764 | 0.2967 | 0.4473 | 0.8343 | 0.6490 |
1.6318 | 0.2875 | 2000 | 1.7885 | 0.3882 | 0.3734 | 0.2886 | 0.4616 | 1.1076 | 0.8827 | 0.4701 | 0.3800 | 0.2947 | 0.4699 | 1.0894 | 0.8631 | 0.6965 | 0.2344 | 0.1883 | 0.2992 | 0.6412 | 0.4804 | 0.5867 | 0.4633 | 0.3721 | 0.4818 | 0.6373 | 0.4734 | 0.6320 | 0.5412 | 0.4338 | 0.6172 | 0.6365 | 0.4765 | 0.5547 | 0.3985 | 0.3155 | 0.4659 | 0.8224 | 0.6352 |
1.7944 | 0.3594 | 2500 | 1.7645 | 0.4046 | 0.3903 | 0.3016 | 0.4828 | 1.0943 | 0.8715 | 0.4576 | 0.3930 | 0.3053 | 0.4839 | 1.0783 | 0.8590 | 0.6840 | 0.2530 | 0.2035 | 0.3156 | 0.6434 | 0.5073 | 0.5838 | 0.4831 | 0.3894 | 0.4970 | 0.6277 | 0.4767 | 0.6474 | 0.5542 | 0.4453 | 0.6269 | 0.6427 | 0.4815 | 0.5555 | 0.4147 | 0.3290 | 0.4812 | 0.8173 | 0.6392 |
1.6247 | 0.4312 | 3000 | 1.7417 | 0.3545 | 0.3994 | 0.3085 | 0.4981 | 1.0894 | 0.8818 | 0.4672 | 0.4033 | 0.3128 | 0.5024 | 1.0650 | 0.8560 | 0.7110 | 0.2636 | 0.2123 | 0.3346 | 0.6330 | 0.4812 | 0.5915 | 0.4970 | 0.4012 | 0.5094 | 0.6237 | 0.4733 | 0.6532 | 0.5731 | 0.4623 | 0.6450 | 0.6289 | 0.4780 | 0.5555 | 0.4273 | 0.3394 | 0.4979 | 0.8080 | 0.6341 |
1.5258 | 0.5031 | 3500 | 1.7147 | 0.3950 | 0.4007 | 0.3098 | 0.4954 | 1.0861 | 0.8657 | 0.4730 | 0.4060 | 0.3154 | 0.4983 | 1.0706 | 0.8456 | 0.7148 | 0.2700 | 0.2174 | 0.3492 | 0.6280 | 0.4644 | 0.6040 | 0.5005 | 0.4037 | 0.5149 | 0.6216 | 0.4902 | 0.6590 | 0.5813 | 0.4698 | 0.6510 | 0.6043 | 0.4418 | 0.5692 | 0.4317 | 0.3432 | 0.5018 | 0.8021 | 0.6215 |
1.6146 | 0.5750 | 4000 | 1.7121 | 0.4114 | 0.4005 | 0.3095 | 0.5034 | 1.0830 | 0.8597 | 0.4769 | 0.4057 | 0.3152 | 0.5055 | 1.0649 | 0.8371 | 0.7225 | 0.2730 | 0.2200 | 0.3536 | 0.6249 | 0.4447 | 0.5954 | 0.5037 | 0.4064 | 0.5112 | 0.6232 | 0.4644 | 0.6638 | 0.5981 | 0.4850 | 0.6605 | 0.6168 | 0.4499 | 0.5740 | 0.4362 | 0.3472 | 0.5068 | 0.8026 | 0.6112 |
1.517 | 0.6468 | 4500 | 1.7076 | 0.4171 | 0.4057 | 0.3134 | 0.5066 | 1.0792 | 0.8598 | 0.4807 | 0.4093 | 0.3179 | 0.5092 | 1.0628 | 0.8370 | 0.7216 | 0.2781 | 0.2240 | 0.3628 | 0.6219 | 0.4552 | 0.5915 | 0.5043 | 0.4070 | 0.5095 | 0.6316 | 0.4612 | 0.6667 | 0.6012 | 0.4878 | 0.6676 | 0.6101 | 0.4619 | 0.5755 | 0.4397 | 0.3500 | 0.5111 | 0.8011 | 0.6150 |
1.6201 | 0.7187 | 5000 | 1.6972 | 0.4143 | 0.4073 | 0.3146 | 0.5110 | 1.0756 | 0.8582 | 0.4836 | 0.4090 | 0.3176 | 0.5121 | 1.0610 | 0.8329 | 0.7197 | 0.2816 | 0.2271 | 0.3668 | 0.6214 | 0.4611 | 0.5944 | 0.5126 | 0.4142 | 0.5222 | 0.6223 | 0.4624 | 0.6647 | 0.6072 | 0.4931 | 0.6724 | 0.6066 | 0.4627 | 0.5753 | 0.4435 | 0.3533 | 0.5169 | 0.7974 | 0.6155 |
1.6377 | 0.7906 | 5500 | 1.6846 | 0.4220 | 0.4099 | 0.3168 | 0.5128 | 1.0754 | 0.8550 | 0.4798 | 0.4111 | 0.3196 | 0.5127 | 1.0591 | 0.8381 | 0.7245 | 0.2843 | 0.2292 | 0.3684 | 0.6206 | 0.4495 | 0.6021 | 0.5131 | 0.4145 | 0.5276 | 0.6144 | 0.4601 | 0.6696 | 0.6092 | 0.4951 | 0.6752 | 0.5966 | 0.4445 | 0.5796 | 0.4455 | 0.3550 | 0.5193 | 0.7932 | 0.6095 |
1.619 | 0.8624 | 6000 | 1.6815 | 0.4027 | 0.4127 | 0.3192 | 0.5166 | 1.0698 | 0.8549 | 0.4817 | 0.4141 | 0.3220 | 0.5166 | 1.0560 | 0.8366 | 0.7197 | 0.2834 | 0.2284 | 0.3670 | 0.6230 | 0.4638 | 0.6002 | 0.5170 | 0.4178 | 0.5325 | 0.6129 | 0.4619 | 0.6686 | 0.6071 | 0.4931 | 0.6729 | 0.5976 | 0.4475 | 0.5746 | 0.4468 | 0.3561 | 0.5211 | 0.7918 | 0.6129 |
1.7326 | 0.9343 | 6500 | 1.6849 | 0.4181 | 0.4129 | 0.3194 | 0.5143 | 1.0735 | 0.8559 | 0.4769 | 0.4135 | 0.3214 | 0.5143 | 1.0616 | 0.8366 | 0.7148 | 0.2880 | 0.2322 | 0.3705 | 0.6259 | 0.4798 | 0.6012 | 0.5174 | 0.4180 | 0.5310 | 0.6137 | 0.4678 | 0.6763 | 0.6069 | 0.4930 | 0.6736 | 0.5983 | 0.4485 | 0.5775 | 0.4477 | 0.3568 | 0.5207 | 0.7946 | 0.6177 |
### Framework versions
- PEFT 0.15.0
- Transformers 4.50.0.dev0
- Pytorch 2.4.1+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
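Since the adapter was trained with PEFT, a loading sketch along these lines should apply. The adapter repository ID is a placeholder, and the card does not document whether the evaluator uses the base causal-LM head or a separate regression head, so the loading path below is an assumption.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-2-9b-it"
adapter_id = "your-namespace/gemma-2-9b-evaluator-v1"  # placeholder repository ID

tokenizer = AutoTokenizer.from_pretrained(base_id)
# Assumption: the adapter attaches to the causal-LM head; the card does not state the head type.
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```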