FatCat87 committed
Commit 5d55404 · verified · 1 Parent(s): b253558

End of training

Files changed (2)
  1. README.md +23 -30
  2. adapter_model.bin +2 -2
README.md CHANGED
@@ -5,7 +5,7 @@ tags:
 - generated_from_trainer
 base_model: mhenrichsen/gemma-7b
 model-index:
-- name: test-task-2025-01-06
+- name: test-task-2025-01-06-16-53-36
   results: []
 ---
 
@@ -17,7 +17,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 axolotl version: `0.4.1`
 ```yaml
-adapter: qlora
+adapter: lora
 base_model: mhenrichsen/gemma-7b
 bf16: auto
 datasets:
@@ -34,13 +34,13 @@ flash_attention: true
 fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps: 3
+gradient_accumulation_steps: 4
 gradient_checkpointing: true
 group_by_length: false
-hub_model_id: FatCat87/test-task-2025-01-06
+hub_model_id: FatCat87/test-task-2025-01-06-16-53-36
 learning_rate: 0.0002
-load_in_4bit: true
-load_in_8bit: false
+load_in_4bit: false
+load_in_8bit: true
 local_rank: null
 logging_steps: 1
 lora_alpha: 16
@@ -50,7 +50,7 @@ lora_target_linear: true
 lr_scheduler: cosine
 micro_batch_size: 2
 model_type: AutoModelForCausalLM
-num_epochs: 4
+num_epochs: 2
 optimizer: adamw_bnb_8bit
 output_dir: ./outputs/out
 pad_to_sequence_len: true
@@ -67,9 +67,9 @@ val_set_size: 0.1
 wandb_entity: fatcat87-taopanda
 wandb_log_model: null
 wandb_mode: online
-wandb_name: test-task-2025-01-06
+wandb_name: test-task-2025-01-06-16-53-36
 wandb_project: subnet56
-wandb_runid: test-task-2025-01-06
+wandb_runid: test-task-2025-01-06-16-53-36
 wandb_watch: null
 warmup_ratio: 0.1
 weight_decay: 0.0
@@ -79,12 +79,12 @@ xformers_attention: null
 
 </details><br>
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/fatcat87-taopanda/subnet56/runs/p0rc3cvq)
-# test-task-2025-01-06
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/fatcat87-taopanda/subnet56/runs/ydehe9sz)
+# test-task-2025-01-06-16-53-36
 
 This model is a fine-tuned version of [mhenrichsen/gemma-7b](https://huggingface.co/mhenrichsen/gemma-7b) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.0913
+- Loss: 1.0005
 
 ## Model description
 
@@ -107,31 +107,24 @@ The following hyperparameters were used during training:
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
-- gradient_accumulation_steps: 3
-- total_train_batch_size: 6
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 5
-- num_epochs: 4
+- lr_scheduler_warmup_steps: 2
+- num_epochs: 2
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.046         | 0.075 | 1    | 1.1912          |
-| 1.1095        | 0.3   | 4    | 1.1067          |
-| 1.0619        | 0.6   | 8    | 1.0441          |
-| 1.0547        | 0.9   | 12   | 1.0446          |
-| 0.931         | 1.15  | 16   | 1.0528          |
-| 0.8836        | 1.45  | 20   | 1.0399          |
-| 0.8958        | 1.75  | 24   | 1.0419          |
-| 0.9922        | 2.05  | 28   | 1.0361          |
-| 0.7736        | 2.3   | 32   | 1.0851          |
-| 0.7437        | 2.6   | 36   | 1.0840          |
-| 0.7552        | 2.9   | 40   | 1.0769          |
-| 0.6623        | 3.15  | 44   | 1.0870          |
-| 0.7173        | 3.45  | 48   | 1.0946          |
-| 0.7122        | 3.75  | 52   | 1.0913          |
+| 0.9785        | 0.1   | 1    | 1.1005          |
+| 1.0282        | 0.3   | 3    | 1.0752          |
+| 1.0195        | 0.6   | 6    | 1.0116          |
+| 1.0354        | 0.9   | 9    | 1.0007          |
+| 0.9228        | 1.15  | 12   | 0.9984          |
+| 0.8895        | 1.45  | 15   | 1.0030          |
+| 0.9105        | 1.75  | 18   | 1.0005          |
 
 
 ### Framework versions
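Note: the new config swaps the 4-bit QLoRA setup for a plain LoRA adapter trained on top of an 8-bit base model (`load_in_4bit: false`, `load_in_8bit: true`). A minimal sketch of how the resulting adapter might be loaded for inference with `transformers` and `peft` is shown below; the repo id and the 8-bit loading are taken from the config above, but the card itself documents no usage, so treat this as an assumption rather than an official recipe.

```python
# Hypothetical usage sketch: load mhenrichsen/gemma-7b in 8-bit and attach the
# LoRA adapter published in this repo. Assumes the adapter is a standard PEFT
# adapter; the model card does not confirm this.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mhenrichsen/gemma-7b"
adapter_id = "FatCat87/test-task-2025-01-06-16-53-36"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # mirrors load_in_8bit: true
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the trained LoRA weights

inputs = tokenizer("Hello", return_tensors="pt").to(base.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```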
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c05c1b8dba8f8c380d8615f6e36c46a7b81f5153ea5ac58052c3feac0713a74e
-size 200157610
+oid sha256:ddc47c4c67627c9f8599020a7fac10a59f24e904523e5c545785d347f0adcb44
+size 400173482
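Note: the `adapter_model.bin` entry is a Git LFS pointer, so the diff only shows the new object hash and size (roughly 400 MB, up from roughly 200 MB); the weights themselves live in LFS storage. A small, hypothetical sketch of fetching just that file with `huggingface_hub` (assuming the repo is public) would be:

```python
# Hypothetical: resolve the LFS pointer and download only the adapter weights.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="FatCat87/test-task-2025-01-06-16-53-36",  # repo id taken from the config above
    filename="adapter_model.bin",
)
print(path)  # local cache path to the ~400 MB adapter file
```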