---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_replace_iter4_sftsd2
  results: []
---

# collapse_gemma-2-9b_hs2_replace_iter4_sftsd2

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4790
- Num Input Tokens Seen: 4639036

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2061        | 0.0511 | 5    | 1.0694          | 240324            |
| 0.5247        | 0.1022 | 10   | 1.1930          | 482000            |
| 0.2089        | 0.1533 | 15   | 1.3107          | 712380            |
| 0.0939        | 0.2044 | 20   | 1.4465          | 957020            |
| 0.0504        | 0.2555 | 25   | 1.5378          | 1188912           |
| 0.0456        | 0.3066 | 30   | 1.4778          | 1428132           |
| 0.0331        | 0.3577 | 35   | 1.4145          | 1677336           |
| 0.0238        | 0.4088 | 40   | 1.4888          | 1914204           |
| 0.0255        | 0.4599 | 45   | 1.5425          | 2146180           |
| 0.0243        | 0.5110 | 50   | 1.5185          | 2379516           |
| 0.0381        | 0.5621 | 55   | 1.4742          | 2619096           |
| 0.0305        | 0.6132 | 60   | 1.4191          | 2862804           |
| 0.0227        | 0.6643 | 65   | 1.4256          | 3103004           |
| 0.021         | 0.7154 | 70   | 1.4350          | 3346964           |
| 0.0279        | 0.7665 | 75   | 1.4590          | 3587168           |
| 0.0242        | 0.8176 | 80   | 1.5009          | 3830384           |
| 0.0262        | 0.8687 | 85   | 1.4784          | 4068408           |
| 0.0244        | 0.9198 | 90   | 1.4782          | 4308452           |
| 0.0228        | 0.9709 | 95   | 1.4777          | 4542732           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
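As a rough illustration (not the authors' training code), the hyperparameters above combine as follows: the effective batch size is the per-device batch size times the gradient accumulation steps, and `constant_with_warmup` ramps the learning rate linearly during warmup before holding it flat. The `warmup_steps=5` value below is an assumption for illustration; Transformers actually derives it from `warmup_ratio * total_training_steps`.

```python
# Sketch of how the listed hyperparameters interact; not the authors' code.

# Effective batch size: per-device batch * gradient accumulation steps.
train_batch_size = 4
gradient_accumulation_steps = 32
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the card

# constant_with_warmup: linear ramp over the warmup steps, then constant.
def lr_at_step(step: int, base_lr: float = 8e-06, warmup_steps: int = 5) -> float:
    # warmup_steps is illustrative here; Transformers computes it as
    # warmup_ratio (0.05) * total optimizer steps for the run.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

print(lr_at_step(0), lr_at_step(5))  # 0.0 during step 0, full 8e-06 after warmup
```

With one epoch and roughly 98 optimizer steps at this effective batch size, a 0.05 warmup ratio corresponds to about 5 warmup steps, which is why the first logged training losses are still high before the rate plateaus.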