chansung committed on
Commit 085be5b
1 Parent(s): 1170f25

Model save
README.md CHANGED
@@ -2,13 +2,12 @@
 license: gemma
 library_name: peft
 tags:
-- alignment-handbook
 - trl
 - sft
 - generated_from_trainer
 base_model: google/gemma-7b
 datasets:
-- llama-duo/synth_summarize_dataset
+- generator
 model-index:
 - name: gemma7b-summarize-gpt4o-80k
   results: []
@@ -17,12 +16,11 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/chansung18/huggingface/runs/oor99p6r)
 # gemma7b-summarize-gpt4o-80k
 
-This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on the llama-duo/synth_summarize_dataset dataset.
+This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.9801
+- Loss: 5.1111
 
 ## Model description
 
@@ -53,28 +51,33 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 10
+- num_epochs: 15
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 0.9152 | 0.9982 | 275 | 2.1950 |
-| 0.8104 | 2.0 | 551 | 2.1405 |
-| 0.7914 | 2.9982 | 826 | 2.1592 |
-| 0.6978 | 4.0 | 1102 | 2.2176 |
-| 0.6386 | 4.9982 | 1377 | 2.3272 |
-| 0.5725 | 6.0 | 1653 | 2.4713 |
-| 0.5089 | 6.9982 | 1928 | 2.6491 |
-| 0.4678 | 8.0 | 2204 | 2.8434 |
-| 0.433 | 8.9982 | 2479 | 2.9604 |
-| 0.4229 | 9.9819 | 2750 | 2.9801 |
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 1.2272 | 1.0 | 111 | 2.3900 |
+| 0.9374 | 2.0 | 222 | 2.1928 |
+| 0.8471 | 3.0 | 333 | 2.1682 |
+| 0.7873 | 4.0 | 444 | 2.2036 |
+| 0.685 | 5.0 | 555 | 2.2977 |
+| 0.6223 | 6.0 | 666 | 2.4441 |
+| 0.5378 | 7.0 | 777 | 2.6715 |
+| 0.458 | 8.0 | 888 | 2.9555 |
+| 0.3843 | 9.0 | 999 | 3.4365 |
+| 0.3241 | 10.0 | 1110 | 3.8823 |
+| 0.2825 | 11.0 | 1221 | 4.4044 |
+| 0.2549 | 12.0 | 1332 | 4.8382 |
+| 0.2408 | 13.0 | 1443 | 5.0611 |
+| 0.2361 | 14.0 | 1554 | 5.1061 |
+| 0.2319 | 15.0 | 1665 | 5.1111 |
 
 
 ### Framework versions
 
 - PEFT 0.11.1
-- Transformers 4.41.0
+- Transformers 4.41.1
 - Pytorch 2.3.0+cu121
 - Datasets 2.19.1
 - Tokenizers 0.19.1
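For reference, the card above describes a PEFT adapter trained on top of google/gemma-7b. A minimal sketch of how such an adapter is typically loaded with the peft and transformers libraries follows; the adapter repo id shown is a placeholder, not confirmed by this commit.

```python
# Minimal sketch: load the gemma-7b base model and apply this PEFT adapter.
# ADAPTER_ID is a hypothetical placeholder -- substitute the actual Hub repo id of this checkpoint.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "google/gemma-7b"                         # base model named in the card
ADAPTER_ID = "<user>/gemma7b-summarize-gpt4o-80k"   # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # loads adapter_model.safetensors

prompt = "Summarize: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```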
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b15502d373cb04d0f86a5ca9b0cea5a308a9d1f1a4a5c15bd79ea7ceab840962
+oid sha256:8b39892ff18d9752a14de56307028f4e58d076eb64b000d907b7bb34d6a3a848
 size 50056096
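The change above only swaps the git-lfs pointer for the adapter weights: the pointer records the SHA-256 of the actual file and its byte size. A hypothetical check that a downloaded adapter_model.safetensors matches the new pointer might look like this (the local file path is an assumption):

```python
# Sketch: verify a downloaded file against the sha256 oid and size from a git-lfs pointer.
import hashlib
import os

EXPECTED_OID = "8b39892ff18d9752a14de56307028f4e58d076eb64b000d907b7bb34d6a3a848"  # from the new pointer
EXPECTED_SIZE = 50056096
path = "adapter_model.safetensors"  # assumed local path after download

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)

assert os.path.getsize(path) == EXPECTED_SIZE, "size mismatch"
assert h.hexdigest() == EXPECTED_OID, "sha256 mismatch"
print("file matches the LFS pointer:", h.hexdigest())
```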
all_results.json CHANGED
@@ -1,14 +1,9 @@
 {
-    "epoch": 9.98185117967332,
-    "eval_loss": 2.9801077842712402,
-    "eval_runtime": 1.0531,
-    "eval_samples": 25,
-    "eval_samples_per_second": 4.748,
-    "eval_steps_per_second": 1.899,
-    "total_flos": 4.2044012259841147e+18,
-    "train_loss": 1.3294648733139038,
-    "train_runtime": 22350.6599,
-    "train_samples": 81423,
-    "train_samples_per_second": 1.971,
-    "train_steps_per_second": 0.123
+    "epoch": 15.0,
+    "total_flos": 2.545573832974926e+18,
+    "train_loss": 1.4823458194016694,
+    "train_runtime": 13248.7618,
+    "train_samples": 32782,
+    "train_samples_per_second": 2.007,
+    "train_steps_per_second": 0.126
 }
runs/May23_00-47-13_deep-diver-main-splendid-ape-1-0-0/events.out.tfevents.1716439886.deep-diver-main-splendid-ape-1-0-0.385.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:62cac5ad4a473ada4d047f2c9775605a6533965882baad0b6e047eb1c5af01dc
-size 77008
+oid sha256:1c11320a0f319a9d2ffc4f2c2b06a95261097f63f217b51b392b5f6cd7bf1480
+size 80376
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch": 9.98185117967332,
-    "total_flos": 4.2044012259841147e+18,
-    "train_loss": 1.3294648733139038,
-    "train_runtime": 22350.6599,
-    "train_samples": 81423,
-    "train_samples_per_second": 1.971,
-    "train_steps_per_second": 0.123
+    "epoch": 15.0,
+    "total_flos": 2.545573832974926e+18,
+    "train_loss": 1.4823458194016694,
+    "train_runtime": 13248.7618,
+    "train_samples": 32782,
+    "train_samples_per_second": 2.007,
+    "train_steps_per_second": 0.126
 }
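As a rough sanity check, the reported runtime and steps-per-second in the new train_results.json are consistent with the 1,665 total steps in the updated results table (15 epochs × 111 steps per epoch); a back-of-the-envelope calculation is sketched below, using only values copied from this commit.

```python
# Rough consistency check of the reported training throughput (values from train_results.json).
train_runtime = 13248.7618          # seconds
train_steps_per_second = 0.126
train_samples_per_second = 2.007

approx_steps = train_runtime * train_steps_per_second      # ~1,669; close to the 1,665 logged steps (15 x 111)
approx_samples = train_runtime * train_samples_per_second  # ~26,590; approximate due to rounded per-second rates

print(f"approx. total optimizer steps: {approx_steps:,.0f}")
print(f"approx. samples processed:     {approx_samples:,.0f}")
```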
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff