oh_scale_x.125_compute_equal

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.125x dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0839
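
For scale, assuming the reported loss is the Trainer's mean per-token cross-entropy (in nats), it implies a perplexity of roughly exp(2.0839) ≈ 8.0 on the evaluation set:

```python
import math

# Perplexity implied by a mean cross-entropy loss of 2.0839 nats per token.
print(math.exp(2.0839))  # ~8.04
```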

Model description

More information needed

Intended uses & limitations

More information needed
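
Pending a fuller description, here is a minimal inference sketch using the transformers library; the prompt and generation settings are illustrative assumptions, not from this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh_scale_x.125_compute_equal"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the published weights are BF16
    device_map="auto",           # requires the accelerate package
)

# Illustrative prompt; the card does not specify a prompt format.
inputs = tokenizer(
    "Explain gradient accumulation in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```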

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 89.0
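
A minimal sketch of how these values map onto transformers' TrainingArguments; the output_dir is an assumption, and bf16 is inferred from the published BF16 weights rather than stated in the card:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="oh_scale_x.125_compute_equal",  # assumption
    learning_rate=5e-6,
    per_device_train_batch_size=8,   # x 8 GPUs x 8 accumulation steps = 512 total
    per_device_eval_batch_size=8,    # x 8 GPUs = 64 total
    gradient_accumulation_steps=8,
    seed=42,
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), eps=1e-08 (the defaults)
    lr_scheduler_type="constant",
    num_train_epochs=89.0,
    bf16=True,                       # inferred from the BF16 tensor type
)
```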

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 0.8588        | 0.9973  | 47   | 0.8431          |
| 0.7685        | 1.9947  | 94   | 0.8078          |
| 0.7039        | 2.9920  | 141  | 0.8061          |
| 0.6431        | 3.9894  | 188  | 0.8146          |
| 0.6047        | 4.9867  | 235  | 0.8365          |
| 0.5574        | 5.9841  | 282  | 0.8701          |
| 0.5092        | 6.9814  | 329  | 0.8984          |
| 0.4572        | 8.0     | 377  | 0.9556          |
| 0.4085        | 8.9973  | 424  | 1.0193          |
| 0.349         | 9.9947  | 471  | 1.1014          |
| 0.2917        | 10.9920 | 518  | 1.1841          |
| 0.2371        | 11.9894 | 565  | 1.2766          |
| 0.1947        | 12.9867 | 612  | 1.4154          |
| 0.1574        | 13.9841 | 659  | 1.5165          |
| 0.1248        | 14.9814 | 706  | 1.6125          |
| 0.0949        | 16.0    | 754  | 1.7871          |
| 0.072         | 16.9973 | 801  | 1.8431          |
| 0.0557        | 17.9947 | 848  | 1.8931          |
| 0.0476        | 18.9920 | 895  | 1.8831          |
| 0.0389        | 19.9894 | 942  | 2.0265          |
| 0.0326        | 20.9867 | 989  | 2.0191          |
| 0.0289        | 21.9841 | 1036 | 2.0776          |
| 0.0241        | 22.9814 | 1083 | 2.1365          |
| 0.0224        | 24.0    | 1131 | 2.1633          |
| 0.0186        | 24.9973 | 1178 | 2.1493          |
| 0.0168        | 25.9947 | 1225 | 2.1881          |
| 0.0165        | 26.9920 | 1272 | 2.2118          |
| 0.0149        | 27.9894 | 1319 | 2.1890          |
| 0.0138        | 28.9867 | 1366 | 2.2228          |
| 0.0124        | 29.9841 | 1413 | 2.2381          |
| 0.0099        | 30.9814 | 1460 | 2.2632          |
| 0.0082        | 32.0    | 1508 | 2.3145          |
| 0.0074        | 32.9973 | 1555 | 2.3310          |
| 0.0063        | 33.9947 | 1602 | 2.2894          |
| 0.0058        | 34.9920 | 1649 | 2.3082          |
| 0.0051        | 35.9894 | 1696 | 2.3288          |
| 0.0048        | 36.9867 | 1743 | 2.3887          |
| 0.0047        | 37.9841 | 1790 | 2.3353          |
| 0.0046        | 38.9814 | 1837 | 2.3314          |
| 0.0046        | 40.0    | 1885 | 2.3529          |
| 0.0046        | 40.9973 | 1932 | 2.2960          |
| 0.0044        | 41.9947 | 1979 | 2.2470          |
| 0.0046        | 42.9920 | 2026 | 2.2445          |
| 0.0047        | 43.9894 | 2073 | 2.1857          |
| 0.0046        | 44.9867 | 2120 | 2.2821          |
| 0.0044        | 45.9841 | 2167 | 2.1947          |
| 0.0046        | 46.9814 | 2214 | 2.2448          |
| 0.0046        | 48.0    | 2262 | 2.2752          |
| 0.0045        | 48.9973 | 2309 | 2.1920          |
| 0.0043        | 49.9947 | 2356 | 2.2769          |
| 0.0046        | 50.9920 | 2403 | 2.1450          |
| 0.0047        | 51.9894 | 2450 | 2.1438          |
| 0.0045        | 52.9867 | 2497 | 2.2089          |
| 0.0046        | 53.9841 | 2544 | 2.1234          |
| 0.0043        | 54.9814 | 2591 | 2.0988          |
| 0.0042        | 56.0    | 2639 | 2.2262          |
| 0.0041        | 56.9973 | 2686 | 2.1830          |
| 0.0043        | 57.9947 | 2733 | 2.0565          |
| 0.0044        | 58.9920 | 2780 | 2.1350          |
| 0.0042        | 59.9894 | 2827 | 2.1475          |
| 0.004         | 60.9867 | 2874 | 2.1590          |
| 0.0039        | 61.9841 | 2921 | 2.1752          |
| 0.0043        | 62.9814 | 2968 | 2.0756          |
| 0.0038        | 64.0    | 3016 | 2.1629          |
| 0.0038        | 64.9973 | 3063 | 2.1522          |
| 0.0036        | 65.9947 | 3110 | 2.1449          |
| 0.0035        | 66.9920 | 3157 | 2.1889          |
| 0.0035        | 67.9894 | 3204 | 2.0248          |
| 0.0034        | 68.9867 | 3251 | 2.1538          |
| 0.0034        | 69.9841 | 3298 | 2.1202          |
| 0.0035        | 70.9814 | 3345 | 2.0326          |
| 0.0035        | 72.0    | 3393 | 2.1360          |
| 0.0036        | 72.9973 | 3440 | 2.1404          |
| 0.0036        | 73.9947 | 3487 | 2.0651          |
| 0.0035        | 74.9920 | 3534 | 2.0982          |
| 0.0033        | 75.9894 | 3581 | 2.1032          |
| 0.0034        | 76.9867 | 3628 | 2.1028          |
| 0.0032        | 77.9841 | 3675 | 2.1282          |
| 0.0031        | 78.9814 | 3722 | 2.0912          |
| 0.0035        | 80.0    | 3770 | 2.0766          |
| 0.0033        | 80.9973 | 3817 | 2.0286          |
| 0.0033        | 81.9947 | 3864 | 2.0421          |
| 0.0034        | 82.9920 | 3911 | 2.1121          |
| 0.0033        | 83.9894 | 3958 | 2.0832          |
| 0.0033        | 84.9867 | 4005 | 2.0629          |
| 0.0034        | 85.9841 | 4052 | 2.1398          |
| 0.0032        | 86.9814 | 4099 | 2.1203          |
| 0.0032        | 88.0    | 4147 | 2.1025          |
| 0.0035        | 88.7639 | 4183 | 2.0839          |

Framework versions

  • Transformers 4.46.1
  • Pytorch 2.3.0
  • Datasets 3.1.0
  • Tokenizers 0.20.3