Edens-Gate/tinymagnum-r2-KTO-r1-ood



tinymagnum-r2-KTO-r1

This model is a fine-tuned version of NewEden/trashdwag on the combined_kto.json dataset. It achieves the following results on the evaluation set:

Loss: 0.5003
Rewards/chosen: 0.0061
Logps/chosen: -12.0862
Rewards/rejected: 0.0023
Logps/rejected: -16.1405
Rewards/margins: 0.0039
KL: 0.0447
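The metrics above are the usual KTO (Kahneman-Tversky Optimization) diagnostics. The card does not name the trainer or report β, so the following is only a minimal sketch, assuming a TRL-style KTO objective. In that formulation, each desirable completion is pushed above a reference point z0 (a clamped KL estimate) and each undesirable one below it, and a "reward" is β times the policy/reference log-ratio. β = 0.1 and the unit desirable/undesirable weights are assumptions for illustration, not values taken from this model's training run.

```python
import math
from statistics import mean


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(chosen_logratios, rejected_logratios, kl_estimate,
             beta=0.1, desirable_weight=1.0, undesirable_weight=1.0):
    """Sketch of a KTO loss (assumed TRL-style form, beta is an assumption).

    chosen_logratios / rejected_logratios hold log pi_theta(y|x) - log pi_ref(y|x)
    for desirable and undesirable completions; both lists must be non-empty.
    Returns (mean loss, mean chosen reward, mean rejected reward).
    """
    z0 = max(kl_estimate, 0.0)  # reference point, clamped at zero
    chosen_losses = [desirable_weight * (1.0 - sigmoid(beta * (r - z0)))
                     for r in chosen_logratios]
    rejected_losses = [undesirable_weight * (1.0 - sigmoid(beta * (z0 - r)))
                       for r in rejected_logratios]
    # Rewards are beta-scaled log-ratios; margins = chosen - rejected.
    chosen_rewards = [beta * r for r in chosen_logratios]
    rejected_rewards = [beta * r for r in rejected_logratios]
    return (mean(chosen_losses + rejected_losses),
            mean(chosen_rewards), mean(rejected_rewards))


# When the policy still matches the reference, every term is 1 - sigmoid(0) = 0.5.
loss, chosen_reward, rejected_reward = kto_loss([0.0, 0.0], [0.0], kl_estimate=0.0)
print(loss, chosen_reward - rejected_reward)
```

Under these assumptions, a loss of exactly 0.5 corresponds to "no preference learned yet", so the evaluation loss of 0.5003 together with rewards on the order of 0.005 suggests the adapter moved only slightly away from its reference model during this single epoch.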

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 16
total_train_batch_size: 64
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.25
num_epochs: 1.0
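The batch-size and learning-rate settings above can be checked with a short sketch. The effective batch size is per-device batch × devices × accumulation steps (2 × 2 × 16 = 64). The schedule function is modeled on the formula we understand `transformers`' `get_cosine_schedule_with_warmup` to use: linear warmup over `ceil(total_steps * warmup_ratio)` steps, then a half-cosine decay to zero. The total of 148 optimizer steps is an assumption inferred from the results table (step 144 is logged at epoch 0.9705), not a number stated in the card.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 16
PEAK_LR = 1e-5
WARMUP_RATIO = 0.25

# Assumption: roughly 148 optimizer steps in the single epoch (144 / 0.9705 ~ 148.4).
TOTAL_STEPS = 148


def effective_batch_size(per_device: int, devices: int, accum: int) -> int:
    """Number of sequences contributing to one optimizer update."""
    return per_device * devices * accum


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = PEAK_LR,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero over the remaining steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))
print(effective_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES, 1))
for step in (0, 18, 37, 92, 148):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS))
```

With a warmup ratio of 0.25, about a quarter of this short run is spent ramping up, so the learning rate only reaches its 1e-05 peak around step 37 before decaying.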

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Logps/chosen	Rewards/rejected	Logps/rejected	Rewards/margins	KL
0.5025	0.1078	16	0.5038	0.0004	-12.1438	0.0007	-16.1563	-0.0003	0.0099
0.502	0.2157	32	0.5019	0.0033	-12.1150	0.0018	-16.1450	0.0014	0.0200
0.5026	0.3235	48	0.5013	0.0051	-12.0964	0.0027	-16.1358	0.0024	0.0335
0.5021	0.4313	64	0.5015	0.0058	-12.0893	0.0036	-16.1270	0.0022	0.0406
0.5017	0.5392	80	0.5012	0.0064	-12.0833	0.0037	-16.1265	0.0027	0.0434
0.5003	0.6470	96	0.5007	0.0066	-12.0812	0.0032	-16.1311	0.0034	0.0431
0.4996	0.7548	112	0.5012	0.0063	-12.0846	0.0028	-16.1353	0.0035	0.0437
0.5077	0.8627	128	0.5005	0.0063	-12.0844	0.0026	-16.1374	0.0037	0.0433
0.5012	0.9705	144	0.5004	0.0064	-12.0837	0.0023	-16.1401	0.0041	0.0431
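As a sanity check on the table, the logged Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to rounding, and Step divided by Epoch should stay roughly constant. The sketch below copies those columns verbatim from the table; the rounding tolerance follows from each value being printed to four decimals. Reading the step/epoch ratio as "optimizer steps per epoch" is our interpretation, not something the card states.

```python
# (step, epoch, rewards_chosen, rewards_rejected, rewards_margins) from the table above.
ROWS = [
    (16, 0.1078, 0.0004, 0.0007, -0.0003),
    (32, 0.2157, 0.0033, 0.0018, 0.0014),
    (48, 0.3235, 0.0051, 0.0027, 0.0024),
    (64, 0.4313, 0.0058, 0.0036, 0.0022),
    (80, 0.5392, 0.0064, 0.0037, 0.0027),
    (96, 0.6470, 0.0066, 0.0032, 0.0034),
    (112, 0.7548, 0.0063, 0.0028, 0.0035),
    (128, 0.8627, 0.0063, 0.0026, 0.0037),
    (144, 0.9705, 0.0064, 0.0023, 0.0041),
]

# Each value is rounded to 4 decimals (error <= 0.00005), so chosen - rejected
# may differ from the logged margin by at most 3 * 0.00005 = 0.00015.
TOLERANCE = 0.00015 + 1e-12


def margin_residuals(rows):
    """Difference between recomputed and logged margins for each step."""
    return [(step, (chosen - rejected) - margin)
            for step, _, chosen, rejected, margin in rows]


def all_consistent(rows, tol=TOLERANCE):
    return all(abs(res) <= tol for _, res in margin_residuals(rows))


def steps_per_epoch_estimates(rows):
    """Step / Epoch for each row; roughly constant if logging is regular."""
    return [step / epoch for step, epoch, *_ in rows]


print(all_consistent(ROWS))
print([round(x, 2) for x in steps_per_epoch_estimates(ROWS)])
```

Every row passes the margin check, and the step/epoch ratio stays near 148.4, which is where the estimate of roughly 148 optimizer steps for this one-epoch run comes from.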

Framework versions

PEFT 0.12.0
Transformers 4.45.0.dev0
Pytorch 2.3.0a0+ebedce2
Datasets 2.20.0
Tokenizers 0.19.1


Model tree for Edens-Gate/tinymagnum-r2-KTO-r1-ood

Base model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
Fine-tuned from the base model: Edens-Gate/Holland-attempt-X
This model: an adapter on Edens-Gate/Holland-attempt-X
