qwen_ce_entropy_0_0

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 1.2625
Rewards/chosen: -1.2622
Rewards/rejected: -1.3865
Rewards/accuracies: 0.5467
Rewards/margins: 0.1243
Logps/rejected: -1.3865
Logps/chosen: -1.2622
Logits/rejected: 0.0855
Logits/chosen: 0.0231
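
The reported margin is consistent with the difference between the chosen and rejected rewards. A minimal Python check, using only the values listed above (the variable names are illustrative and are not taken from the training code):

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -1.2622
rewards_rejected = -1.3865
reported_margin = 0.1243

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"computed margin: {margin:.4f}")

# Allow for rounding, since the card reports four decimal places.
assert abs(margin - reported_margin) < 5e-4
```

Note that in this card each Rewards/* value is identical to the corresponding Logps/* value.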

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3.0
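
Two of the values above can be illustrated with plain arithmetic. The sketch below is not the training code. `TOTAL_STEPS = 5600` is an assumption taken from the last logged step in the training results, the warmup length is assumed to be the warmup ratio times the total steps, rounded up, and `lr_at` is a hand-written linear-warmup, half-cosine-decay schedule meant only to show the general shape that `lr_scheduler_type: cosine` with a warmup ratio suggests.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-06
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 16
TOTAL_TRAIN_BATCH_SIZE = 32
WARMUP_RATIO = 0.1

# Assumption: total optimizer steps, taken from the last logged step.
TOTAL_STEPS = 5600
# Assumption: warmup steps = ceil(warmup_ratio * total_steps).
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)

# Examples consumed per optimizer step by a single process.
examples_per_step_per_process = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS
# Number of processes implied by the reported total batch size.
# The card itself does not state a device count.
implied_processes = TOTAL_TRAIN_BATCH_SIZE // examples_per_step_per_process


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("examples per step per process:", examples_per_step_per_process)
print("implied processes:", implied_processes)
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the learning rate rises linearly to 1e-06 over the warmup steps and then decays toward zero by the final step.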

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
1.2909	0.2141	400	1.3232	-1.3229	-1.4423	0.5549	0.1194	-1.4423	-1.3229	0.3582	0.2751
1.2587	0.4282	800	1.2927	-1.2925	-1.4166	0.5504	0.1242	-1.4166	-1.2925	0.3302	0.2542
1.2174	0.6422	1200	1.2838	-1.2835	-1.4045	0.5490	0.1210	-1.4045	-1.2835	0.2712	0.2000
1.2991	0.8563	1600	1.2773	-1.2770	-1.3983	0.5467	0.1213	-1.3983	-1.2770	0.2519	0.1821
1.2615	1.0704	2000	1.2727	-1.2724	-1.3955	0.5490	0.1231	-1.3955	-1.2724	0.1950	0.1280
1.1889	1.2845	2400	1.2689	-1.2686	-1.3926	0.5475	0.1239	-1.3926	-1.2686	0.1649	0.0990
1.2782	1.4986	2800	1.2663	-1.2660	-1.3882	0.5482	0.1222	-1.3882	-1.2660	0.1472	0.0825
1.2250	1.7127	3200	1.2649	-1.2646	-1.3872	0.5460	0.1226	-1.3872	-1.2646	0.1561	0.0901
1.1621	1.9267	3600	1.2636	-1.2633	-1.3851	0.5475	0.1218	-1.3851	-1.2633	0.1670	0.0991
1.1574	2.1408	4000	1.2633	-1.2630	-1.3882	0.5467	0.1252	-1.3882	-1.2630	0.1189	0.0543
1.1513	2.3549	4400	1.2630	-1.2627	-1.3868	0.5453	0.1241	-1.3868	-1.2627	0.1222	0.0567
1.1366	2.5690	4800	1.2624	-1.2622	-1.3866	0.5475	0.1244	-1.3866	-1.2622	0.1424	0.0753
1.1253	2.7831	5200	1.2627	-1.2624	-1.3865	0.5475	0.1241	-1.3865	-1.2624	0.1178	0.0528
1.1657	2.9972	5600	1.2625	-1.2622	-1.3865	0.5467	0.1243	-1.3865	-1.2622	0.0855	0.0231
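
To read the trend in the table more easily, the short Python sketch below copies the Step and Validation Loss columns and reports the step with the lowest validation loss and the overall change between the first and last evaluations. The variable names are illustrative and nothing here comes from the training code.

```python
# (step, validation loss) pairs copied from the table above.
eval_log = [
    (400, 1.3232), (800, 1.2927), (1200, 1.2838), (1600, 1.2773),
    (2000, 1.2727), (2400, 1.2689), (2800, 1.2663), (3200, 1.2649),
    (3600, 1.2636), (4000, 1.2633), (4400, 1.2630), (4800, 1.2624),
    (5200, 1.2627), (5600, 1.2625),
]

# Evaluation with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])

# Change in validation loss between the first and last evaluation.
first_loss = eval_log[0][1]
last_loss = eval_log[-1][1]
improvement = first_loss - last_loss

print(f"lowest validation loss: {best_loss:.4f} at step {best_step}")
print(f"first to last improvement: {improvement:.4f}")
```

The validation loss flattens after roughly step 4000, with the last four evaluations all within 0.0006 of each other.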

Framework versions

Transformers 4.44.2
Pytorch 2.2.2+cu121
Datasets 2.18.0
Tokenizers 0.19.1
