v1_5_mistral_full_1122

This model is a fine-tuned version of peiyi9979/math-shepherd-mistral-7b-prm; the fine-tuning dataset is not specified. On the evaluation set it achieves the following results, which match the final logged evaluation (step 340):

Loss: 0.2520
Accuracy: 0.9035
Precision: 0.8317
Recall: 0.7925
F1: 0.8116
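
The reported F1 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is a minimal sketch of that check; the function name is illustrative and not part of any training code for this model.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above for the evaluation set.
f1 = f1_from_precision_recall(0.8317, 0.7925)
print(round(f1, 4))  # 0.8116
```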

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 2
seed: 765837
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 8
total_train_batch_size: 64
total_eval_batch_size: 8
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
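
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates a linear-warmup, cosine-decay learning-rate curve of the kind these settings describe. It is an approximation for illustration, not the trainer's exact implementation. `TOTAL_STEPS` is an assumption, estimated from the logged epoch fraction (0.0575 at step 20) in the training results, and `lr_at` is a hypothetical helper.

```python
import math

# Values taken from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 8
PEAK_LR = 1e-5
WARMUP_RATIO = 0.1

# Effective batch sizes: accumulation applies to training only.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS  # 64
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES  # 8

# Assumption: roughly 20 / 0.0575 optimizer steps in the single epoch.
TOTAL_STEPS = 348


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Approximate learning rate at a step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch, total_eval_batch)  # 64 8
```

Under these assumptions the rate rises from zero to 1e-05 over the first 35 steps and then decays towards zero by the end of the epoch.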

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
0.3603	0.0575	20	0.4322	0.7550	0.5189	0.9057	0.6598
0.3252	0.1151	40	0.3813	0.8663	0.8250	0.6226	0.7097
0.3574	0.1726	60	0.3824	0.8045	0.5789	0.9340	0.7148
0.3100	0.2301	80	0.3275	0.8614	0.7155	0.7830	0.7477
0.2803	0.2877	100	0.3611	0.8738	0.7723	0.7358	0.7536
0.3893	0.3452	120	0.3245	0.8416	0.6694	0.7830	0.7217
0.3624	0.4027	140	0.3172	0.8812	0.8372	0.6792	0.7500
0.3081	0.4603	160	0.3283	0.8639	0.7742	0.6792	0.7236
0.2420	0.5178	180	0.2907	0.8837	0.7658	0.8019	0.7834
0.2692	0.5753	200	0.2787	0.8911	0.7925	0.7925	0.7925
0.2866	0.6329	220	0.2675	0.8787	0.7478	0.8113	0.7783
0.2700	0.6904	240	0.2702	0.9035	0.8317	0.7925	0.8116
0.3112	0.7479	260	0.2605	0.9059	0.8864	0.7358	0.8041
0.2032	0.8055	280	0.2700	0.9010	0.8587	0.7453	0.7980
0.2326	0.8630	300	0.2549	0.9059	0.8333	0.8019	0.8173
0.2714	0.9205	320	0.2511	0.9035	0.8317	0.7925	0.8116
0.2562	0.9781	340	0.2520	0.9035	0.8317	0.7925	0.8116
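
The logged evaluations can be scanned programmatically to see which step scored best on each metric. The sketch below copies the step, validation loss, accuracy, and F1 columns from the table above; the `EvalRow` type is illustrative. It does not indicate which checkpoint was saved or published.

```python
from collections import namedtuple

EvalRow = namedtuple("EvalRow", ["step", "val_loss", "accuracy", "f1"])

# Columns copied from the training results table.
EVAL_LOG = [
    EvalRow(20, 0.4322, 0.7550, 0.6598),
    EvalRow(40, 0.3813, 0.8663, 0.7097),
    EvalRow(60, 0.3824, 0.8045, 0.7148),
    EvalRow(80, 0.3275, 0.8614, 0.7477),
    EvalRow(100, 0.3611, 0.8738, 0.7536),
    EvalRow(120, 0.3245, 0.8416, 0.7217),
    EvalRow(140, 0.3172, 0.8812, 0.7500),
    EvalRow(160, 0.3283, 0.8639, 0.7236),
    EvalRow(180, 0.2907, 0.8837, 0.7834),
    EvalRow(200, 0.2787, 0.8911, 0.7925),
    EvalRow(220, 0.2675, 0.8787, 0.7783),
    EvalRow(240, 0.2702, 0.9035, 0.8116),
    EvalRow(260, 0.2605, 0.9059, 0.8041),
    EvalRow(280, 0.2700, 0.9010, 0.7980),
    EvalRow(300, 0.2549, 0.9059, 0.8173),
    EvalRow(320, 0.2511, 0.9035, 0.8116),
    EvalRow(340, 0.2520, 0.9035, 0.8116),
]

best_f1 = max(EVAL_LOG, key=lambda row: row.f1)
lowest_loss = min(EVAL_LOG, key=lambda row: row.val_loss)

print(best_f1.step, best_f1.f1)  # 300 0.8173
print(lowest_loss.step, lowest_loss.val_loss)  # 320 0.2511
```

The final evaluation at step 340 is therefore close to, but not exactly, the best logged value on either metric.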

Framework versions

Transformers 4.46.0
Pytorch 2.4.0+cu118
Datasets 3.0.0
Tokenizers 0.20.1
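
To reproduce results it may help to compare a local environment against the versions listed above. The sketch below uses the standard-library `importlib.metadata`; the helper `version_mismatches` is hypothetical, and the sketch assumes the packages are installed under the distribution names `transformers`, `torch`, `datasets`, and `tokenizers`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above; "torch" is assumed to be the installed name for Pytorch.
EXPECTED = {
    "transformers": "4.46.0",
    "torch": "2.4.0+cu118",
    "datasets": "3.0.0",
    "tokenizers": "0.20.1",
}


def version_mismatches(expected=EXPECTED):
    """Map each package whose installed version differs to (wanted, installed)."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches().items():
    print(f"{name}: expected {wanted}, found {installed}")
```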

Model: mtzig/v1_5_mistral_full_1122
