End of training
Files changed:
- README.md (+23, -30)
- adapter_model.bin (+2, -2)
README.md

@@ -5,7 +5,7 @@ tags:
 - generated_from_trainer
 base_model: mhenrichsen/gemma-7b
 model-index:
-- name: test-task-2025-01-06
+- name: test-task-2025-01-06-16-53-36
   results: []
 ---

@@ -17,7 +17,7 @@ should probably proofread and complete it, then remove this comment. -->

 axolotl version: `0.4.1`
 ```yaml
-adapter:
+adapter: lora
 base_model: mhenrichsen/gemma-7b
 bf16: auto
 datasets:

@@ -34,13 +34,13 @@ flash_attention: true
 fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps:
+gradient_accumulation_steps: 4
 gradient_checkpointing: true
 group_by_length: false
-hub_model_id: FatCat87/test-task-2025-01-06
+hub_model_id: FatCat87/test-task-2025-01-06-16-53-36
 learning_rate: 0.0002
-load_in_4bit:
-load_in_8bit:
+load_in_4bit: false
+load_in_8bit: true
 local_rank: null
 logging_steps: 1
 lora_alpha: 16

@@ -50,7 +50,7 @@ lora_target_linear: true
 lr_scheduler: cosine
 micro_batch_size: 2
 model_type: AutoModelForCausalLM
-num_epochs:
+num_epochs: 2
 optimizer: adamw_bnb_8bit
 output_dir: ./outputs/out
 pad_to_sequence_len: true

@@ -67,9 +67,9 @@ val_set_size: 0.1
 wandb_entity: fatcat87-taopanda
 wandb_log_model: null
 wandb_mode: online
-wandb_name: test-task-2025-01-06
+wandb_name: test-task-2025-01-06-16-53-36
 wandb_project: subnet56
-wandb_runid: test-task-2025-01-06
+wandb_runid: test-task-2025-01-06-16-53-36
 wandb_watch: null
 warmup_ratio: 0.1
 weight_decay: 0.0

@@ -79,12 +79,12 @@ xformers_attention: null

 </details><br>

-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/fatcat87-taopanda/subnet56/runs/
-# test-task-2025-01-06
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/fatcat87-taopanda/subnet56/runs/ydehe9sz)
+# test-task-2025-01-06-16-53-36

 This model is a fine-tuned version of [mhenrichsen/gemma-7b](https://huggingface.co/mhenrichsen/gemma-7b) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.
+- Loss: 1.0005

 ## Model description

@@ -107,31 +107,24 @@ The following hyperparameters were used during training:
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
-- gradient_accumulation_steps:
-- total_train_batch_size:
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps:
-- num_epochs:
+- lr_scheduler_warmup_steps: 2
+- num_epochs: 2

 ### Training results

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-|
-| 1.
-| 1.
-| 1.
-| 0.
-| 0.
-| 0.
-| 0.9922 | 2.05 | 28 | 1.0361 |
-| 0.7736 | 2.3 | 32 | 1.0851 |
-| 0.7437 | 2.6 | 36 | 1.0840 |
-| 0.7552 | 2.9 | 40 | 1.0769 |
-| 0.6623 | 3.15 | 44 | 1.0870 |
-| 0.7173 | 3.45 | 48 | 1.0946 |
-| 0.7122 | 3.75 | 52 | 1.0913 |
+| 0.9785 | 0.1 | 1 | 1.1005 |
+| 1.0282 | 0.3 | 3 | 1.0752 |
+| 1.0195 | 0.6 | 6 | 1.0116 |
+| 1.0354 | 0.9 | 9 | 1.0007 |
+| 0.9228 | 1.15 | 12 | 0.9984 |
+| 0.8895 | 1.45 | 15 | 1.0030 |
+| 0.9105 | 1.75 | 18 | 1.0005 |


 ### Framework versions
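For reference, the updated `total_train_batch_size: 8` follows from the other hyperparameters in this commit; a minimal sketch of the arithmetic, assuming a single training device (the card does not state the device count):

```python
# Assumed relationship between the reported batch-size fields (single-device assumption).
micro_batch_size = 2              # per-device batch size from the axolotl config
gradient_accumulation_steps = 4   # value filled in by this commit
num_devices = 1                   # assumption; not stated in the card
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)     # 8, matching the updated README
```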
adapter_model.bin

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:ddc47c4c67627c9f8599020a7fac10a59f24e904523e5c545785d347f0adcb44
+size 400173482
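Because the commit replaces adapter_model.bin (a LoRA adapter trained against the 8-bit-loaded base model, not full model weights), the adapter has to be attached to the base model at load time. A minimal usage sketch, assuming the transformers and peft libraries and the `hub_model_id` from the config above:

```python
# Hypothetical loading sketch (not part of the card): attach the uploaded LoRA adapter to the base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mhenrichsen/gemma-7b")                 # base model named in the card
model = PeftModel.from_pretrained(base, "FatCat87/test-task-2025-01-06-16-53-36")   # adapter repo (hub_model_id)
tokenizer = AutoTokenizer.from_pretrained("mhenrichsen/gemma-7b")
```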