qwen_unl_entropy

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 1.6475
Rewards/chosen: -1.3030
Rewards/rejected: -1.4992
Rewards/accuracies: 0.5712
Rewards/margins: 0.1962
Logps/rejected: -1.4992
Logps/chosen: -1.3030
Logits/rejected: 0.0833
Logits/chosen: 0.0165
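
The reported margin is the difference between the chosen and rejected rewards, which can be checked directly from the figures above. The short sketch below only redoes that arithmetic. The variable names are illustrative, and it makes no assumption about how the training code computes the rewards.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -1.3030
rewards_rejected = -1.4992
reported_margin = 0.1962

# Margin recomputed as chosen minus rejected.
margin = round(rewards_chosen - rewards_rejected, 4)
print(margin)

# The reported log-probabilities carry the same values as the rewards.
logps_chosen, logps_rejected = -1.3030, -1.4992
same_as_logps = (rewards_chosen, rewards_rejected) == (logps_chosen, logps_rejected)
print(same_as_logps)
```

An accuracy of 0.5712 means the chosen response received the higher reward in about 57% of the evaluation pairs.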

Model description

More information needed

Intended uses & limitations

More information needed
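
As a starting point, the checkpoint can presumably be loaded like any causal language model in Transformers. The sketch below assumes the Hub repository id yakazimir/qwen_unl_entropy and a plain-text prompt. The helper name `generate`, the prompt format, and the generation settings are illustrative assumptions, not documented usage for this model.

```python
MODEL_ID = "yakazimir/qwen_unl_entropy"  # assumed Hub repository id


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: load the checkpoint and complete `prompt`."""
    # Imported lazily so that defining the helper does not require the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the raw prompt; no chat template is applied in this sketch.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Explain gradient accumulation in one sentence.")` downloads the weights on first use and returns the prompt followed by the model's continuation.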

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3.0
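
The batch-size and scheduler settings above interact in ways that are easy to check by hand. The sketch below recomputes the effective batch size and traces a linear-warmup, cosine-decay learning-rate curve in plain Python. Two values are assumptions not stated in the list: `num_processes = 1` (chosen because 2 × 16 already equals the listed total of 32) and `total_steps = 5604` (inferred from the results table, where step 5600 corresponds to epoch 2.9972). The exact scheduler implementation used in training is not stated, so treat the curve as an approximation of its shape.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 1e-6
train_batch_size = 2
gradient_accumulation_steps = 16
warmup_ratio = 0.1

# Assumption: one process, since 2 * 16 already equals the listed total of 32.
num_processes = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_processes

# Assumption: about 5,604 optimizer steps in total (not stated in the list).
total_steps = 5604
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup, then a half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", total_train_batch_size)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the learning rate climbs to its peak of 1e-06 over roughly the first tenth of training and then decays smoothly to zero by the final step.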

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
1.6549	0.2141	400	1.6939	-1.3375	-1.4631	0.5564	0.1256	-1.4631	-1.3375	0.3664	0.2799
1.6692	0.4282	800	1.6718	-1.3151	-1.4532	0.5579	0.1381	-1.4532	-1.3151	0.3708	0.2889
1.6206	0.6422	1200	1.6640	-1.3083	-1.4522	0.5564	0.1438	-1.4522	-1.3083	0.3523	0.2714
1.6566	0.8563	1600	1.6600	-1.3096	-1.4585	0.5593	0.1488	-1.4585	-1.3096	0.3578	0.2764
1.7104	1.0704	2000	1.6553	-1.3006	-1.4569	0.5660	0.1563	-1.4569	-1.3006	0.2528	0.1781
1.6123	1.2845	2400	1.6521	-1.3029	-1.4743	0.5668	0.1713	-1.4743	-1.3029	0.1650	0.0956
1.6688	1.4986	2800	1.6486	-1.3000	-1.4729	0.5690	0.1729	-1.4729	-1.3000	0.1751	0.1050
1.6012	1.7127	3200	1.6495	-1.3009	-1.4722	0.5668	0.1713	-1.4722	-1.3009	0.2139	0.1401
1.5646	1.9267	3600	1.6478	-1.2987	-1.4778	0.5705	0.1791	-1.4778	-1.2987	0.1771	0.1052
1.5351	2.1408	4000	1.6470	-1.3020	-1.4952	0.5712	0.1932	-1.4952	-1.3020	0.1238	0.0547
1.5307	2.3549	4400	1.6469	-1.3054	-1.5043	0.5712	0.1988	-1.5043	-1.3054	0.0587	-0.0064
1.5433	2.5690	4800	1.6472	-1.3037	-1.5017	0.5727	0.1980	-1.5017	-1.3037	0.1609	0.0880
1.5671	2.7831	5200	1.6473	-1.3030	-1.4994	0.5720	0.1964	-1.4994	-1.3030	0.0927	0.0252
1.5482	2.9972	5600	1.6475	-1.3030	-1.4992	0.5712	0.1962	-1.4992	-1.3030	0.0833	0.0165
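
The table can be summarized with a few lines of Python. The sketch below copies a subset of columns from the rows above, picks the checkpoints with the lowest validation loss and the highest reward accuracy, and estimates the number of optimizer steps per epoch. The pairs-per-epoch figure assumes the listed total_train_batch_size of 32 and no dropped batches, so it is an estimate rather than a reported dataset size.

```python
# (step, epoch, validation_loss, rewards_accuracy, rewards_margin) from the table above.
ROWS = [
    (400, 0.2141, 1.6939, 0.5564, 0.1256),
    (800, 0.4282, 1.6718, 0.5579, 0.1381),
    (1200, 0.6422, 1.6640, 0.5564, 0.1438),
    (1600, 0.8563, 1.6600, 0.5593, 0.1488),
    (2000, 1.0704, 1.6553, 0.5660, 0.1563),
    (2400, 1.2845, 1.6521, 0.5668, 0.1713),
    (2800, 1.4986, 1.6486, 0.5690, 0.1729),
    (3200, 1.7127, 1.6495, 0.5668, 0.1713),
    (3600, 1.9267, 1.6478, 0.5705, 0.1791),
    (4000, 2.1408, 1.6470, 0.5712, 0.1932),
    (4400, 2.3549, 1.6469, 0.5712, 0.1988),
    (4800, 2.5690, 1.6472, 0.5727, 0.1980),
    (5200, 2.7831, 1.6473, 0.5720, 0.1964),
    (5600, 2.9972, 1.6475, 0.5712, 0.1962),
]
TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list

# Checkpoints with the lowest validation loss and the highest reward accuracy.
best_loss = min(ROWS, key=lambda row: row[2])
best_acc = max(ROWS, key=lambda row: row[3])

# Steps per epoch estimated from the last logged row.
last_step, last_epoch = ROWS[-1][0], ROWS[-1][1]
steps_per_epoch = round(last_step / last_epoch)
pairs_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print("lowest validation loss:", best_loss[2], "at step", best_loss[0])
print("highest accuracy:", best_acc[3], "at step", best_acc[0])
print("steps per epoch:", steps_per_epoch, "pairs per epoch:", pairs_per_epoch)
```

Validation loss is essentially flat from step 4000 onward (1.6470 to 1.6475), and the reward margin peaks at 0.1988 at step 4400, so the final checkpoint is close to, but not exactly, the best one on either measure.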

Framework versions

Transformers 4.44.2
Pytorch 2.2.2+cu121
Datasets 2.18.0
Tokenizers 0.19.1
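
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch: the helper name `check_versions` is hypothetical, "torch" is used as the pip distribution name for PyTorch, and the local build tag (such as "+cu121") is ignored in the comparison because it varies by platform.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; "torch" is the pip distribution name for PyTorch.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.2.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}


def check_versions() -> dict:
    """Map each package to (expected, installed, matches); installed is None if missing."""
    report = {}
    for name, expected in EXPECTED.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Compare without the local build tag (e.g. "+cu121").
        matches = (
            installed is not None
            and installed.split("+")[0] == expected.split("+")[0]
        )
        report[name] = (expected, installed, matches)
    return report


for name, (expected, installed, matches) in check_versions().items():
    print(f"{name}: expected {expected}, installed {installed}, match={matches}")
```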
