oh_scale_x.125_compute_equal

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.125x dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0839
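
For scale, assuming the reported loss is the Trainer's mean per-token cross-entropy (in nats), it implies a perplexity of roughly exp(2.0839) ≈ 8.0 on the evaluation set:

```python
import math

# Perplexity implied by a mean cross-entropy loss of 2.0839 nats per token.
print(math.exp(2.0839))  # ~8.04
```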

Model description

More information needed

Intended uses & limitations

More information needed
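
Pending a fuller description, here is a minimal inference sketch using the transformers library; the prompt and generation settings are illustrative assumptions, not from this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh_scale_x.125_compute_equal"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the published weights are BF16
    device_map="auto",           # requires the accelerate package
)

# Illustrative prompt; the card does not specify a prompt format.
inputs = tokenizer(
    "Explain gradient accumulation in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```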

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 89.0
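
A minimal sketch of how these values map onto transformers' TrainingArguments; the output_dir is an assumption, and bf16 is inferred from the published BF16 weights rather than stated in the card:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="oh_scale_x.125_compute_equal",  # assumption
    learning_rate=5e-6,
    per_device_train_batch_size=8,   # x 8 GPUs x 8 accumulation steps = 512 total
    per_device_eval_batch_size=8,    # x 8 GPUs = 64 total
    gradient_accumulation_steps=8,
    seed=42,
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), eps=1e-08 (the defaults)
    lr_scheduler_type="constant",
    num_train_epochs=89.0,
    bf16=True,                       # inferred from the BF16 tensor type
)
```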

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.8588        | 0.9973  | 47   | 0.8431          |
| 0.7685        | 1.9947  | 94   | 0.8078          |
| 0.7039        | 2.9920  | 141  | 0.8061          |
| 0.6431        | 3.9894  | 188  | 0.8146          |
| 0.6047        | 4.9867  | 235  | 0.8365          |
| 0.5574        | 5.9841  | 282  | 0.8701          |
| 0.5092        | 6.9814  | 329  | 0.8984          |
| 0.4572        | 8.0     | 377  | 0.9556          |
| 0.4085        | 8.9973  | 424  | 1.0193          |
| 0.349         | 9.9947  | 471  | 1.1014          |
| 0.2917        | 10.9920 | 518  | 1.1841          |
| 0.2371        | 11.9894 | 565  | 1.2766          |
| 0.1947        | 12.9867 | 612  | 1.4154          |
| 0.1574        | 13.9841 | 659  | 1.5165          |
| 0.1248        | 14.9814 | 706  | 1.6125          |
| 0.0949        | 16.0    | 754  | 1.7871          |
| 0.072         | 16.9973 | 801  | 1.8431          |
| 0.0557        | 17.9947 | 848  | 1.8931          |
| 0.0476        | 18.9920 | 895  | 1.8831          |
| 0.0389        | 19.9894 | 942  | 2.0265          |
| 0.0326        | 20.9867 | 989  | 2.0191          |
| 0.0289        | 21.9841 | 1036 | 2.0776          |
| 0.0241        | 22.9814 | 1083 | 2.1365          |
| 0.0224        | 24.0    | 1131 | 2.1633          |
| 0.0186        | 24.9973 | 1178 | 2.1493          |
| 0.0168        | 25.9947 | 1225 | 2.1881          |
| 0.0165        | 26.9920 | 1272 | 2.2118          |
| 0.0149        | 27.9894 | 1319 | 2.1890          |
| 0.0138        | 28.9867 | 1366 | 2.2228          |
| 0.0124        | 29.9841 | 1413 | 2.2381          |
| 0.0099        | 30.9814 | 1460 | 2.2632          |
| 0.0082        | 32.0    | 1508 | 2.3145          |
| 0.0074        | 32.9973 | 1555 | 2.3310          |
| 0.0063        | 33.9947 | 1602 | 2.2894          |
| 0.0058        | 34.9920 | 1649 | 2.3082          |
| 0.0051        | 35.9894 | 1696 | 2.3288          |
| 0.0048        | 36.9867 | 1743 | 2.3887          |
| 0.0047        | 37.9841 | 1790 | 2.3353          |
| 0.0046        | 38.9814 | 1837 | 2.3314          |
| 0.0046        | 40.0    | 1885 | 2.3529          |
| 0.0046        | 40.9973 | 1932 | 2.2960          |
| 0.0044        | 41.9947 | 1979 | 2.2470          |
| 0.0046        | 42.9920 | 2026 | 2.2445          |
| 0.0047        | 43.9894 | 2073 | 2.1857          |
| 0.0046        | 44.9867 | 2120 | 2.2821          |
| 0.0044        | 45.9841 | 2167 | 2.1947          |
| 0.0046        | 46.9814 | 2214 | 2.2448          |
| 0.0046        | 48.0    | 2262 | 2.2752          |
| 0.0045        | 48.9973 | 2309 | 2.1920          |
| 0.0043        | 49.9947 | 2356 | 2.2769          |
| 0.0046        | 50.9920 | 2403 | 2.1450          |
| 0.0047        | 51.9894 | 2450 | 2.1438          |
| 0.0045        | 52.9867 | 2497 | 2.2089          |
| 0.0046        | 53.9841 | 2544 | 2.1234          |
| 0.0043        | 54.9814 | 2591 | 2.0988          |
| 0.0042        | 56.0    | 2639 | 2.2262          |
| 0.0041        | 56.9973 | 2686 | 2.1830          |
| 0.0043        | 57.9947 | 2733 | 2.0565          |
| 0.0044        | 58.9920 | 2780 | 2.1350          |
| 0.0042        | 59.9894 | 2827 | 2.1475          |
| 0.004         | 60.9867 | 2874 | 2.1590          |
| 0.0039        | 61.9841 | 2921 | 2.1752          |
| 0.0043        | 62.9814 | 2968 | 2.0756          |
| 0.0038        | 64.0    | 3016 | 2.1629          |
| 0.0038        | 64.9973 | 3063 | 2.1522          |
| 0.0036        | 65.9947 | 3110 | 2.1449          |
| 0.0035        | 66.9920 | 3157 | 2.1889          |
| 0.0035        | 67.9894 | 3204 | 2.0248          |
| 0.0034        | 68.9867 | 3251 | 2.1538          |
| 0.0034        | 69.9841 | 3298 | 2.1202          |
| 0.0035        | 70.9814 | 3345 | 2.0326          |
| 0.0035        | 72.0    | 3393 | 2.1360          |
| 0.0036        | 72.9973 | 3440 | 2.1404          |
| 0.0036        | 73.9947 | 3487 | 2.0651          |
| 0.0035        | 74.9920 | 3534 | 2.0982          |
| 0.0033        | 75.9894 | 3581 | 2.1032          |
| 0.0034        | 76.9867 | 3628 | 2.1028          |
| 0.0032        | 77.9841 | 3675 | 2.1282          |
| 0.0031        | 78.9814 | 3722 | 2.0912          |
| 0.0035        | 80.0    | 3770 | 2.0766          |
| 0.0033        | 80.9973 | 3817 | 2.0286          |
| 0.0033        | 81.9947 | 3864 | 2.0421          |
| 0.0034        | 82.9920 | 3911 | 2.1121          |
| 0.0033        | 83.9894 | 3958 | 2.0832          |
| 0.0033        | 84.9867 | 4005 | 2.0629          |
| 0.0034        | 85.9841 | 4052 | 2.1398          |
| 0.0032        | 86.9814 | 4099 | 2.1203          |
| 0.0032        | 88.0    | 4147 | 2.1025          |
| 0.0035        | 88.7639 | 4183 | 2.0839          |

Framework versions

  • Transformers 4.46.1
  • Pytorch 2.3.0
  • Datasets 3.1.0
  • Tokenizers 0.20.3