Model save
README.md
CHANGED
@@ -2,13 +2,12 @@
 license: gemma
 library_name: peft
 tags:
-- alignment-handbook
 - trl
 - sft
 - generated_from_trainer
 base_model: google/gemma-7b
 datasets:
--
+- generator
 model-index:
 - name: gemma7b-summarize-gpt4o-80k
   results: []
@@ -17,12 +16,11 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/chansung18/huggingface/runs/oor99p6r)
 # gemma7b-summarize-gpt4o-80k
 
-This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on the
+This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss:
+- Loss: 5.1111
 
 ## Model description
 
@@ -53,28 +51,33 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs:
+- num_epochs: 15
 
 ### Training results
 
-| Training Loss | Epoch
-|
-|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 1.2272        | 1.0   | 111  | 2.3900          |
+| 0.9374        | 2.0   | 222  | 2.1928          |
+| 0.8471        | 3.0   | 333  | 2.1682          |
+| 0.7873        | 4.0   | 444  | 2.2036          |
+| 0.685         | 5.0   | 555  | 2.2977          |
+| 0.6223        | 6.0   | 666  | 2.4441          |
+| 0.5378        | 7.0   | 777  | 2.6715          |
+| 0.458         | 8.0   | 888  | 2.9555          |
+| 0.3843        | 9.0   | 999  | 3.4365          |
+| 0.3241        | 10.0  | 1110 | 3.8823          |
+| 0.2825        | 11.0  | 1221 | 4.4044          |
+| 0.2549        | 12.0  | 1332 | 4.8382          |
+| 0.2408        | 13.0  | 1443 | 5.0611          |
+| 0.2361        | 14.0  | 1554 | 5.1061          |
+| 0.2319        | 15.0  | 1665 | 5.1111          |
 
 
 ### Framework versions
 
 - PEFT 0.11.1
-- Transformers 4.41.
+- Transformers 4.41.1
 - Pytorch 2.3.0+cu121
 - Datasets 2.19.1
 - Tokenizers 0.19.1
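Since this commit saves a PEFT adapter rather than full model weights, the checkpoint is meant to be applied on top of the google/gemma-7b base model. Below is a minimal loading sketch; the adapter repo id is a placeholder (only the model name `gemma7b-summarize-gpt4o-80k` appears in the card, not its namespace), and the prompt is illustrative.

```python
# Minimal sketch: load this LoRA/PEFT adapter on top of google/gemma-7b.
# "<namespace>/gemma7b-summarize-gpt4o-80k" is a hypothetical repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-7b"
adapter_id = "<namespace>/gemma7b-summarize-gpt4o-80k"  # placeholder, not confirmed

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)  # applies adapter_model.safetensors

inputs = tokenizer("Summarize: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```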
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:8b39892ff18d9752a14de56307028f4e58d076eb64b000d907b7bb34d6a3a848
 size 50056096
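The safetensors file itself lives in Git LFS; this diff only updates the pointer (oid and size). A quick way to confirm that a locally downloaded file matches the pointer is to compare its SHA-256 digest and byte size, as in this sketch:

```python
# Sketch: verify a downloaded LFS object against its pointer's oid/size.
import hashlib
from pathlib import Path

path = Path("adapter_model.safetensors")
expected_oid = "8b39892ff18d9752a14de56307028f4e58d076eb64b000d907b7bb34d6a3a848"
expected_size = 50056096

data = path.read_bytes()
assert len(data) == expected_size, "size mismatch"
assert hashlib.sha256(data).hexdigest() == expected_oid, "sha256 mismatch"
print("local file matches the LFS pointer")
```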
all_results.json
CHANGED
@@ -1,14 +1,9 @@
 {
-    "epoch":
-    "
-    "
-    "
-    "
-    "
-    "
-    "train_loss": 1.3294648733139038,
-    "train_runtime": 22350.6599,
-    "train_samples": 81423,
-    "train_samples_per_second": 1.971,
-    "train_steps_per_second": 0.123
+    "epoch": 15.0,
+    "total_flos": 2.545573832974926e+18,
+    "train_loss": 1.4823458194016694,
+    "train_runtime": 13248.7618,
+    "train_samples": 32782,
+    "train_samples_per_second": 2.007,
+    "train_steps_per_second": 0.126
 }
runs/May23_00-47-13_deep-diver-main-splendid-ape-1-0-0/events.out.tfevents.1716439886.deep-diver-main-splendid-ape-1-0-0.385.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:1c11320a0f319a9d2ffc4f2c2b06a95261097f63f217b51b392b5f6cd7bf1480
+size 80376
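This tfevents file holds the TensorBoard scalars behind the training table in the README. One way to read them back is TensorBoard's event reader; a sketch follows, where the scalar tag names are assumptions based on the Hugging Face Trainer's usual `train/...` and `eval/...` naming:

```python
# Sketch: read logged scalars from the tfevents run directory.
# Tag names such as "eval/loss" are assumed, not confirmed by this commit.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

ea = EventAccumulator("runs/May23_00-47-13_deep-diver-main-splendid-ape-1-0-0")
ea.Reload()
print(ea.Tags()["scalars"])            # list the available scalar tags
for event in ea.Scalars("eval/loss"):  # assumed tag name
    print(event.step, event.value)
```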
train_results.json
CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch":
-    "total_flos":
-    "train_loss": 1.
-    "train_runtime":
-    "train_samples":
-    "train_samples_per_second":
-    "train_steps_per_second": 0.
+    "epoch": 15.0,
+    "total_flos": 2.545573832974926e+18,
+    "train_loss": 1.4823458194016694,
+    "train_runtime": 13248.7618,
+    "train_samples": 32782,
+    "train_samples_per_second": 2.007,
+    "train_steps_per_second": 0.126
 }
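As a sanity check, these figures are internally consistent with the 1,665 training steps in the README table: 1665 / 13248.7618 s ≈ 0.126 steps/s, and 2.007 / 0.126 ≈ 16 suggests an effective batch size of about 16 (an inference; the batch-size hyperparameters sit outside this diff's context). The same arithmetic as a sketch:

```python
# Sketch: cross-check train_results.json against the README's 1665 steps.
total_steps = 1665           # final Step value in the README results table
train_runtime = 13248.7618   # seconds, from train_results.json

steps_per_second = total_steps / train_runtime
print(round(steps_per_second, 3))   # ~0.126, matches train_steps_per_second

samples_per_second = 2.007          # from train_results.json
effective_batch = samples_per_second / steps_per_second
print(round(effective_batch))       # ~16, the implied effective batch size
```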
trainer_state.json
CHANGED
The diff for this file is too large to render. See raw diff.