End of training

Files changed:
- README.md (+23 -23)
- adapter_model.bin (+1 -1)

README.md CHANGED
```diff
@@ -47,7 +47,7 @@ flash_attention: true
 fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps:
+gradient_accumulation_steps: 32
 gradient_checkpointing: false
 max_grad_norm: 1
 group_by_length: false
@@ -55,11 +55,11 @@ hub_model_id: error577/f5007068-224b-49a6-8aff-b438e87bf3ef
 hub_repo: null
 hub_strategy: checkpoint
 hub_token: null
-learning_rate: 0.
+learning_rate: 0.0001
 load_in_4bit: false
 load_in_8bit: false
 local_rank: null
-logging_steps:
+logging_steps: 10
 lora_alpha: 16
 lora_dropout: 0.05
 lora_fan_in_fan_out: null
@@ -106,7 +106,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [princeton-nlp/Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.
+- Loss: 2.0399
 
 ## Model description
 
@@ -125,12 +125,12 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate:
+- learning_rate: 0.0001
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
-- gradient_accumulation_steps:
-- total_train_batch_size:
+- gradient_accumulation_steps: 32
+- total_train_batch_size: 32
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
```
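In the hyperparameters hunk, `total_train_batch_size: 32` is the effective batch size: `train_batch_size` (1) × `gradient_accumulation_steps` (32) on a single device. As a reading aid, here is a minimal sketch of how the recorded values would map onto transformers' `TrainingArguments`, assuming the standard Trainer stack that generates these cards; `output_dir` is a hypothetical placeholder, not from the diff:

```python
# Sketch only: the recorded hyperparameters expressed as TrainingArguments.
# All values come from the card; "outputs" is a placeholder output_dir.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",            # placeholder, not in the diff
    learning_rate=1e-4,              # learning_rate: 0.0001
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=32,  # effective batch size: 1 * 32 = 32
    max_grad_norm=1.0,               # max_grad_norm: 1
    optim="adamw_bnb_8bit",          # OptimizerNames.ADAMW_BNB (bitsandbytes 8-bit AdamW)
    lr_scheduler_type="cosine",      # lr_scheduler_type: cosine
    warmup_steps=10,                 # lr_scheduler_warmup_steps: 10
    logging_steps=10,                # logging_steps: 10
    seed=42,                         # seed: 42
)
```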
```diff
@@ -140,22 +140,22 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-|
-| 2.
-|
-|
-| 2.
-|
-| 1.
-|
-| 2.
-|
-|
-|
-| 1.
-|
-| 1.
-|
+| No log | 0.0068 | 1 | 2.4177 |
+| 2.2864 | 0.2173 | 32 | 2.1603 |
+| 2.0811 | 0.4345 | 64 | 2.1125 |
+| 2.1192 | 0.6518 | 96 | 2.0913 |
+| 2.1414 | 0.8691 | 128 | 2.0728 |
+| 1.9297 | 1.0864 | 160 | 2.0627 |
+| 1.9738 | 1.3036 | 192 | 2.0563 |
+| 1.907 | 1.5209 | 224 | 2.0506 |
+| 2.0121 | 1.7382 | 256 | 2.0443 |
+| 1.8795 | 1.9554 | 288 | 2.0390 |
+| 1.9241 | 2.1727 | 320 | 2.0422 |
+| 1.776 | 2.3900 | 352 | 2.0423 |
+| 1.8113 | 2.6073 | 384 | 2.0412 |
+| 1.7836 | 2.8245 | 416 | 2.0393 |
+| 1.8139 | 3.0418 | 448 | 2.0393 |
+| 1.7138 | 3.2591 | 480 | 2.0399 |
 
 
 ### Framework versions
```
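The table records an evaluation every 32 optimizer steps, with validation loss plateauing near 2.04 from step 288 onward; that shape is consistent with the cosine schedule and 10 warmup steps listed above. Below is a sketch of that schedule using transformers' helper; the 480-step horizon is an assumption read off the last table row (the configured total is not in this diff), and the single dummy parameter exists only so an optimizer can be built:

```python
# Sketch only: cosine LR schedule with 10 warmup steps, as recorded in the card.
# num_training_steps=480 is an assumption (the last step in the results table).
import torch
from transformers import get_cosine_schedule_with_warmup

params = [torch.nn.Parameter(torch.zeros(1))]  # dummy parameter for illustration
optimizer = torch.optim.AdamW(params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=480
)

for step in range(480):
    optimizer.step()    # LR rises linearly for 10 steps, then decays to ~0
    scheduler.step()
```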
adapter_model.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6daeb1631c58b0317cd1eeb6ef0e7504cd076bc1652bdbf8919849e9773c07fb
 size 30103498
```
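For adapter_model.bin, only the Git LFS pointer changed: the pointer pins the sha256 of the new weights, while the payload itself (30103498 bytes, roughly 30 MB, a plausible size for a LoRA adapter over a 1.3B base) lives in LFS storage. Here is a minimal sketch of applying this adapter, assuming the standard peft API; the repo ids are taken from the card's base_model link and hub_model_id field:

```python
# Sketch only: attach the LoRA adapter to its base model from the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "princeton-nlp/Sheared-LLaMA-1.3B"                  # base_model from the card
adapter_id = "error577/f5007068-224b-49a6-8aff-b438e87bf3ef"  # hub_model_id from the card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)           # loads adapter_model.bin
model.eval()
```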