li-muyang/zephyr-8b-dpo-full

zephyr-8b-dpo-full

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (the values match the final logged evaluation, step 900, in the training results table):

Loss: 0.5060
Rewards/chosen: -0.9456
Rewards/rejected: -1.8257
Rewards/accuracies: 0.7579
Rewards/margins: 0.8801
Logps/rejected: -444.3302
Logps/chosen: -382.1980
Logits/rejected: 0.8653
Logits/chosen: 0.4899

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
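
The totals in the list above follow from the per-device values, and the scheduler settings imply a specific learning-rate curve. The sketch below reproduces that arithmetic. It is illustrative, not the training code: the function names are hypothetical, the schedule is assumed to be a linear warmup followed by a single half-cosine decay to zero, and `ASSUMED_TOTAL_STEPS` is an estimate inferred from the step and epoch columns of the training results (step 900 at epoch 0.9419), not a value stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 4
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-07
WARMUP_RATIO = 0.1

# Assumption: roughly 900 / 0.9419 optimizer steps in the single epoch.
ASSUMED_TOTAL_STEPS = 955


def effective_batch_sizes():
    """Return (total_train_batch_size, total_eval_batch_size)."""
    # Training: per-device batch x devices x gradient accumulation steps.
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    # Evaluation: no gradient accumulation, so per-device batch x devices.
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, evaluation


def cosine_lr(step, total_steps=ASSUMED_TOTAL_STEPS,
              peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_sizes())
    for s in (0, 96, 500, 955):
        print(s, cosine_lr(s))
```

Under these assumptions the learning rate reaches its peak of 5e-07 after about 96 steps and decays to zero by the end of the epoch.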

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6703	0.1047	100	0.6695	0.0173	-0.0408	0.6825	0.0581	-265.8378	-285.9061	-0.4757	-0.5813
0.5922	0.2093	200	0.5902	-0.4616	-0.8504	0.7063	0.3888	-346.7961	-333.7917	-0.6443	-0.7411
0.5592	0.3140	300	0.5462	-0.6144	-1.2154	0.7421	0.6010	-383.3018	-349.0777	-0.2679	-0.4330
0.5461	0.4186	400	0.5323	-0.7030	-1.3568	0.7381	0.6539	-397.4421	-357.9295	-0.0100	-0.2412
0.5211	0.5233	500	0.5215	-1.0874	-1.8737	0.7341	0.7863	-449.1320	-396.3762	0.5346	0.2433
0.4932	0.6279	600	0.5180	-0.7257	-1.4962	0.7540	0.7705	-411.3827	-360.2088	0.4235	0.1246
0.4891	0.7326	700	0.5097	-0.9618	-1.8012	0.7579	0.8394	-441.8806	-383.8190	0.7266	0.3793
0.5052	0.8373	800	0.5067	-0.9279	-1.7930	0.7540	0.8651	-441.0578	-380.4258	0.8224	0.4548
0.4946	0.9419	900	0.5060	-0.9456	-1.8257	0.7579	0.8801	-444.3302	-382.1980	0.8653	0.4899
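
As a quick consistency check on the table above, the Rewards/margins column should equal Rewards/chosen minus Rewards/rejected, up to rounding of the logged values. The sketch below assumes that relationship (the margin being the mean difference between chosen and rejected rewards). The helper name and the tolerance are hypothetical choices; the numbers are copied from the table.

```python
# (step, rewards_chosen, rewards_rejected, rewards_margin) copied from the table.
ROWS = [
    (100, 0.0173, -0.0408, 0.0581),
    (200, -0.4616, -0.8504, 0.3888),
    (300, -0.6144, -1.2154, 0.6010),
    (400, -0.7030, -1.3568, 0.6539),
    (500, -1.0874, -1.8737, 0.7863),
    (600, -0.7257, -1.4962, 0.7705),
    (700, -0.9618, -1.8012, 0.8394),
    (800, -0.9279, -1.7930, 0.8651),
    (900, -0.9456, -1.8257, 0.8801),
]


def margin_mismatches(rows, tol=1e-3):
    """Return the steps where chosen - rejected differs from the logged margin."""
    return [
        step
        for step, chosen, rejected, margin in rows
        if abs((chosen - rejected) - margin) > tol
    ]


if __name__ == "__main__":
    # An empty list means every row is consistent within the tolerance.
    print(margin_mismatches(ROWS))
```

The tolerance absorbs four-decimal rounding in the logged values, such as the 0.0001 difference at step 400.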

Framework versions

Transformers 4.45.0
PyTorch 2.2.2+rocm5.7
Datasets 3.2.0
Tokenizers 0.20.3

Safetensors

Model size: 8.03B params
Tensor type: BF16