---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2
  results: []
---

# collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9546
- Num Input Tokens Seen: 24032428

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
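For orientation, the list above maps onto a `transformers.TrainingArguments` configuration roughly as sketched below. This is a minimal, hedged reconstruction, not the exact training script: it assumes a single training device (so 4 × 32 accumulation steps yields the effective batch size of 128), the `output_dir` is a placeholder, and the dataset used is not recorded on this card.

```python
from transformers import TrainingArguments

# Minimal sketch mirroring the hyperparameters listed above.
# output_dir is hypothetical; the actual training data is unknown.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 * 32 = effective batch size 128 on one device
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```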
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.3474 | 0.0106 | 5 | 1.1927 | 258792 |
| 1.2078 | 0.0212 | 10 | 1.0768 | 508764 |
| 0.9674 | 0.0317 | 15 | 1.0312 | 767600 |
| 0.9091 | 0.0423 | 20 | 1.0227 | 1023848 |
| 0.6911 | 0.0529 | 25 | 1.0203 | 1274344 |
| 0.4705 | 0.0635 | 30 | 1.0319 | 1532132 |
| 0.4228 | 0.0741 | 35 | 1.0292 | 1786368 |
| 0.3975 | 0.0846 | 40 | 1.0309 | 2042540 |
| 0.3206 | 0.0952 | 45 | 1.0153 | 2291720 |
| 0.2994 | 0.1058 | 50 | 1.0148 | 2549012 |
| 0.3297 | 0.1164 | 55 | 1.0050 | 2798620 |
| 0.3171 | 0.1270 | 60 | 1.0040 | 3056108 |
| 0.3139 | 0.1375 | 65 | 1.0020 | 3317448 |
| 0.3386 | 0.1481 | 70 | 0.9962 | 3574540 |
| 0.2501 | 0.1587 | 75 | 0.9942 | 3832272 |
| 0.2482 | 0.1693 | 80 | 0.9906 | 4083760 |
| 0.3098 | 0.1799 | 85 | 0.9875 | 4336164 |
| 0.2415 | 0.1904 | 90 | 0.9910 | 4592732 |
| 0.2895 | 0.2010 | 95 | 0.9856 | 4844652 |
| 0.3474 | 0.2116 | 100 | 0.9849 | 5098604 |
| 0.2472 | 0.2222 | 105 | 0.9823 | 5355028 |
| 0.2587 | 0.2328 | 110 | 0.9795 | 5604792 |
| 0.2691 | 0.2433 | 115 | 0.9779 | 5860556 |
| 0.2396 | 0.2539 | 120 | 0.9761 | 6117824 |
| 0.2505 | 0.2645 | 125 | 0.9776 | 6368016 |
| 0.2609 | 0.2751 | 130 | 0.9765 | 6626036 |
| 0.3553 | 0.2857 | 135 | 0.9746 | 6883312 |
| 0.2906 | 0.2962 | 140 | 0.9750 | 7139620 |
| 0.2989 | 0.3068 | 145 | 0.9738 | 7392496 |
| 0.3201 | 0.3174 | 150 | 0.9707 | 7646420 |
| 0.2327 | 0.3280 | 155 | 0.9708 | 7896552 |
| 0.281 | 0.3386 | 160 | 0.9712 | 8147848 |
| 0.291 | 0.3491 | 165 | 0.9701 | 8401936 |
| 0.3371 | 0.3597 | 170 | 0.9699 | 8654404 |
| 0.1926 | 0.3703 | 175 | 0.9703 | 8904204 |
| 0.286 | 0.3809 | 180 | 0.9703 | 9158204 |
| 0.2423 | 0.3915 | 185 | 0.9669 | 9411824 |
| 0.245 | 0.4020 | 190 | 0.9668 | 9665800 |
| 0.2902 | 0.4126 | 195 | 0.9697 | 9920256 |
| 0.2895 | 0.4232 | 200 | 0.9675 | 10172112 |
| 0.2431 | 0.4338 | 205 | 0.9671 | 10423820 |
| 0.286 | 0.4444 | 210 | 0.9665 | 10685328 |
| 0.3157 | 0.4549 | 215 | 0.9656 | 10942652 |
| 0.225 | 0.4655 | 220 | 0.9658 | 11199576 |
| 0.2655 | 0.4761 | 225 | 0.9654 | 11458748 |
| 0.2338 | 0.4867 | 230 | 0.9646 | 11713636 |
| 0.2768 | 0.4973 | 235 | 0.9647 | 11970092 |
| 0.2008 | 0.5078 | 240 | 0.9649 | 12215860 |
| 0.2491 | 0.5184 | 245 | 0.9634 | 12470476 |
| 0.2654 | 0.5290 | 250 | 0.9622 | 12726180 |
| 0.233 | 0.5396 | 255 | 0.9652 | 12977864 |
| 0.2297 | 0.5502 | 260 | 0.9652 | 13228988 |
| 0.2123 | 0.5607 | 265 | 0.9641 | 13487916 |
| 0.3055 | 0.5713 | 270 | 0.9622 | 13742944 |
| 0.252 | 0.5819 | 275 | 0.9627 | 14001064 |
| 0.2156 | 0.5925 | 280 | 0.9633 | 14257372 |
| 0.2373 | 0.6031 | 285 | 0.9630 | 14515772 |
| 0.2533 | 0.6136 | 290 | 0.9633 | 14773828 |
| 0.3101 | 0.6242 | 295 | 0.9634 | 15032732 |
| 0.2549 | 0.6348 | 300 | 0.9640 | 15287912 |
| 0.2208 | 0.6454 | 305 | 0.9621 | 15542376 |
| 0.2164 | 0.6560 | 310 | 0.9607 | 15798084 |
| 0.1831 | 0.6665 | 315 | 0.9612 | 16052724 |
| 0.2364 | 0.6771 | 320 | 0.9615 | 16306560 |
| 0.2993 | 0.6877 | 325 | 0.9622 | 16558416 |
| 0.2002 | 0.6983 | 330 | 0.9607 | 16807372 |
| 0.1973 | 0.7089 | 335 | 0.9597 | 17064900 |
| 0.3301 | 0.7194 | 340 | 0.9594 | 17318932 |
| 0.319 | 0.7300 | 345 | 0.9598 | 17577108 |
| 0.2533 | 0.7406 | 350 | 0.9579 | 17826528 |
| 0.1998 | 0.7512 | 355 | 0.9565 | 18081976 |
| 0.2274 | 0.7618 | 360 | 0.9560 | 18343304 |
| 0.2253 | 0.7723 | 365 | 0.9567 | 18596660 |
| 0.2473 | 0.7829 | 370 | 0.9566 | 18850916 |
| 0.2654 | 0.7935 | 375 | 0.9565 | 19111356 |
| 0.2053 | 0.8041 | 380 | 0.9557 | 19366576 |
| 0.2462 | 0.8147 | 385 | 0.9549 | 19620052 |
| 0.2217 | 0.8252 | 390 | 0.9573 | 19876608 |
| 0.23 | 0.8358 | 395 | 0.9586 | 20129892 |
| 0.2487 | 0.8464 | 400 | 0.9563 | 20384164 |
| 0.1914 | 0.8570 | 405 | 0.9562 | 20634768 |
| 0.2452 | 0.8676 | 410 | 0.9581 | 20891764 |
| 0.1935 | 0.8781 | 415 | 0.9572 | 21139492 |
| 0.3047 | 0.8887 | 420 | 0.9544 | 21396976 |
| 0.2257 | 0.8993 | 425 | 0.9555 | 21644644 |
| 0.2405 | 0.9099 | 430 | 0.9558 | 21891880 |
| 0.2522 | 0.9205 | 435 | 0.9547 | 22153024 |
| 0.2481 | 0.9310 | 440 | 0.9527 | 22404200 |
| 0.2242 | 0.9416 | 445 | 0.9527 | 22657000 |
| 0.3352 | 0.9522 | 450 | 0.9527 | 22911872 |
| 0.1884 | 0.9628 | 455 | 0.9540 | 23163792 |
| 0.2011 | 0.9734 | 460 | 0.9537 | 23423212 |
| 0.1947 | 0.9839 | 465 | 0.9525 | 23677144 |
| 0.29 | 0.9945 | 470 | 0.9534 | 23929996 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
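For reference, a minimal loading sketch compatible with the framework versions above. The repository id is assumed from the model name on this card (it may live under a namespace on the Hub), and `device_map="auto"` additionally requires the `accelerate` package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical hub id, taken from the model name on this card.
model_id = "collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Simple generation example.
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```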