# collapse_gemma-2-9b_hs2_replace_iter3_sftsd2
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.3331
- Num Input Tokens Seen: 4767740
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
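The hyperparameters above imply a couple of derived quantities that are worth sanity-checking. This is a minimal sketch (plain Python, no library calls) that reproduces the effective batch size and the approximate warmup length; the total step count of 96 is an assumption inferred from the results table below, where the last logged step is 95 at roughly one epoch.

```python
# Assumed values, copied from the hyperparameter list above.
train_batch_size = 4
gradient_accumulation_steps = 32
warmup_ratio = 0.05
total_steps = 96  # assumption: inferred from the results table (~1 epoch)

# Effective train batch size per optimizer step:
# gradients from 32 micro-batches of 4 are accumulated before each update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching total_train_batch_size above

# constant_with_warmup: the LR ramps linearly over the first warmup_ratio
# fraction of steps, then holds at the configured 8e-06.
warmup_steps = int(total_steps * warmup_ratio)
print(warmup_steps)  # 4
```

Note that 4 × 32 = 128 with no extra device factor, which suggests single-device (or single data-parallel replica) training under these settings.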
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.1673        | 0.0519 | 5    | 1.0681          | 247836            |
| 0.6647        | 0.1037 | 10   | 1.1160          | 500564            |
| 0.3357        | 0.1556 | 15   | 1.2251          | 752148            |
| 0.1216        | 0.2075 | 20   | 1.2090          | 998756            |
| 0.0601        | 0.2593 | 25   | 1.2567          | 1249040           |
| 0.0575        | 0.3112 | 30   | 1.2783          | 1498900           |
| 0.0399        | 0.3630 | 35   | 1.2369          | 1750044           |
| 0.0626        | 0.4149 | 40   | 1.2392          | 2000384           |
| 0.0329        | 0.4668 | 45   | 1.1987          | 2257268           |
| 0.0283        | 0.5186 | 50   | 1.2369          | 2506816           |
| 0.0306        | 0.5705 | 55   | 1.2321          | 2749044           |
| 0.0274        | 0.6224 | 60   | 1.2139          | 3001308           |
| 0.039         | 0.6742 | 65   | 1.2486          | 3246720           |
| 0.0376        | 0.7261 | 70   | 1.2664          | 3490852           |
| 0.0312        | 0.7780 | 75   | 1.2641          | 3736420           |
| 0.04          | 0.8298 | 80   | 1.2473          | 3980484           |
| 0.0344        | 0.8817 | 85   | 1.3026          | 4227696           |
| 0.0329        | 0.9335 | 90   | 1.3221          | 4471696           |
| 0.0317        | 0.9854 | 95   | 1.3285          | 4718104           |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1