# collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0882
- Num Input Tokens Seen: 10720872
## Model description
More information needed
## Intended uses & limitations
More information needed
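Pending fuller documentation, below is a minimal loading sketch. It assumes the checkpoint works with the standard transformers causal-LM auto classes; the prompt is purely illustrative.

```python
# Minimal sketch; assumes the standard transformers AutoClasses
# load this checkpoint. The prompt below is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```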
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
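The total train batch size of 128 follows from the per-device batch size times the gradient accumulation steps (8 × 16), assuming a single device. As a rough sketch, the values above map onto transformers `TrainingArguments` as follows; `output_dir` is a placeholder.

```python
# Sketch of the hyperparameters above as TrainingArguments;
# output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```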
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.593 | 0.0266 | 5 | 1.3326 | 282968 |
| 1.468 | 0.0531 | 10 | 1.2105 | 570240 |
| 1.2681 | 0.0797 | 15 | 1.1563 | 856832 |
| 1.2072 | 0.1063 | 20 | 1.1314 | 1140048 |
| 1.1683 | 0.1328 | 25 | 1.1132 | 1423424 |
| 1.0607 | 0.1594 | 30 | 1.1129 | 1709744 |
| 1.1077 | 0.1860 | 35 | 1.1118 | 2001304 |
| 1.0482 | 0.2126 | 40 | 1.1135 | 2294992 |
| 1.074 | 0.2391 | 45 | 1.1164 | 2584408 |
| 0.9352 | 0.2657 | 50 | 1.1174 | 2875416 |
| 0.8739 | 0.2923 | 55 | 1.1168 | 3154376 |
| 0.8673 | 0.3188 | 60 | 1.1216 | 3443824 |
| 0.8946 | 0.3454 | 65 | 1.1154 | 3729728 |
| 0.7916 | 0.3720 | 70 | 1.1306 | 4012624 |
| 0.9486 | 0.3985 | 75 | 1.1155 | 4302744 |
| 0.721 | 0.4251 | 80 | 1.1205 | 4583480 |
| 0.8319 | 0.4517 | 85 | 1.1200 | 4859656 |
| 0.6664 | 0.4782 | 90 | 1.1144 | 5141344 |
| 0.7822 | 0.5048 | 95 | 1.1131 | 5420960 |
| 0.736 | 0.5314 | 100 | 1.1124 | 5708552 |
| 0.8007 | 0.5580 | 105 | 1.1126 | 5987736 |
| 0.6431 | 0.5845 | 110 | 1.1073 | 6271504 |
| 0.6754 | 0.6111 | 115 | 1.1048 | 6559824 |
| 0.8061 | 0.6377 | 120 | 1.1066 | 6848192 |
| 0.7043 | 0.6642 | 125 | 1.1044 | 7131224 |
| 0.6619 | 0.6908 | 130 | 1.1028 | 7410960 |
| 0.6988 | 0.7174 | 135 | 1.0991 | 7699432 |
| 0.7132 | 0.7439 | 140 | 1.0989 | 7986208 |
| 0.6748 | 0.7705 | 145 | 1.0963 | 8274264 |
| 0.7033 | 0.7971 | 150 | 1.0959 | 8560800 |
| 0.7145 | 0.8236 | 155 | 1.0943 | 8847792 |
| 0.6951 | 0.8502 | 160 | 1.0928 | 9134464 |
| 0.6958 | 0.8768 | 165 | 1.0908 | 9420928 |
| 0.6325 | 0.9034 | 170 | 1.0915 | 9697944 |
| 0.6244 | 0.9299 | 175 | 1.0886 | 9987896 |
| 0.6517 | 0.9565 | 180 | 1.0902 | 10269648 |
| 0.7256 | 0.9831 | 185 | 1.0874 | 10553360 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1