bilkultheek committed on
Commit
68283c8
1 Parent(s): 947f16b

End of training

README.md CHANGED
@@ -1,24 +1,23 @@
  ---
- base_model: ahxt/LiteLlama-460M-1T
+ base_model: ahxt/llama1_s_1.8B_experimental
  library_name: peft
- license: mit
  tags:
  - trl
  - sft
  - generated_from_trainer
  model-index:
- - name: ColdLLamaLite
+ - name: Cold-Rec-LLama-2-7B
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # ColdLLamaLite
+ # Cold-Rec-LLama-2-7B

- This model is a fine-tuned version of [ahxt/LiteLlama-460M-1T](https://huggingface.co/ahxt/LiteLlama-460M-1T) on the None dataset.
+ This model is a fine-tuned version of [ahxt/llama1_s_1.8B_experimental](https://huggingface.co/ahxt/llama1_s_1.8B_experimental) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 2.0471
+ - Loss: 0.0777

  ## Model description

@@ -37,14 +36,14 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 32
+ - learning_rate: 0.001
+ - train_batch_size: 16
  - eval_batch_size: 32
  - seed: 42
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 256
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 64
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
+ - lr_scheduler_type: linear
  - lr_scheduler_warmup_ratio: 0.03
  - num_epochs: 10

@@ -52,18 +51,11 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 4.1747        | 0.8   | 25   | 3.9257          |
- | 3.626         | 1.6   | 50   | 3.2474          |
- | 2.8441        | 2.4   | 75   | 2.4490          |
- | 2.3365        | 3.2   | 100  | 2.2482          |
- | 2.2153        | 4.0   | 125  | 2.1758          |
- | 2.1591        | 4.8   | 150  | 2.1316          |
- | 2.1214        | 5.6   | 175  | 2.1011          |
- | 2.0946        | 6.4   | 200  | 2.0781          |
- | 2.0818        | 7.2   | 225  | 2.0622          |
- | 2.0614        | 8.0   | 250  | 2.0528          |
- | 2.0571        | 8.8   | 275  | 2.0485          |
- | 2.0522        | 9.6   | 300  | 2.0471          |
+ | 0.2217        | 1.992 | 249  | 0.2119          |
+ | 0.1599        | 3.984 | 498  | 0.1558          |
+ | 0.1177        | 5.976 | 747  | 0.1157          |
+ | 0.0842        | 7.968 | 996  | 0.0875          |
+ | 0.075         | 9.96  | 1245 | 0.0776          |


  ### Framework versions
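
This commit swaps the base model from LiteLlama-460M-1T to llama1_s_1.8B_experimental and retunes the run: a higher learning rate (0.001 vs. 0.0002), a smaller effective batch (16 × 4 = 64 instead of 32 × 8 = 256), and a linear instead of cosine schedule. The training script itself is not part of the commit; given the trl/sft/peft tags, a minimal sketch of an equivalent setup might look like the following. The dataset, LoRA settings, and output path are placeholders, and the `tokenizer=` argument assumes an older TRL version (newer releases use `processing_class`).

```python
# Hypothetical reconstruction of the training setup implied by the model card.
# Dataset, LoRA config, and output_dir are placeholders, not from the commit.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base = "ahxt/llama1_s_1.8B_experimental"  # new base model in this commit
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

dataset = load_dataset("json", data_files="train.jsonl")["train"]  # placeholder

args = TrainingArguments(
    output_dir="Cold-Rec-LLama-2-7B",   # placeholder, matches the card name
    learning_rate=1e-3,                 # was 2e-4 before this commit
    per_device_train_batch_size=16,     # was 32
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,      # was 8
    # effective train batch: 16 * 4 = 64 (previously 32 * 8 = 256)
    lr_scheduler_type="linear",         # was cosine
    warmup_ratio=0.03,
    num_train_epochs=10,
    seed=42,
    # Adam betas/epsilon in the card are the optimizer defaults;
    # the eval cadence (every 249 steps) is omitted here.
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # adapter details not in commit
)
trainer.train()
```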
runs/Aug09_10-22-52_fastgpuserv/events.out.tfevents.1723268246.fastgpuserv.502349.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad26b338296efb151b9d75bb2d3adc053a8bd8b32894c14dd7fc8f83d4446db1
+ size 359
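
The added run log is stored as a Git LFS pointer: the repository holds only the version/oid/size stub, while the 359-byte TensorBoard event file lives in LFS storage. A sketch of fetching and reading such a log with huggingface_hub and TensorBoard's event reader follows; the `repo_id` is an assumption, since the commit page does not name the repository.

```python
# Sketch: download the event file behind the LFS pointer and list its scalars.
from huggingface_hub import hf_hub_download
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

path = hf_hub_download(
    repo_id="bilkultheek/Cold-Rec-LLama-2-7B",  # assumed repo id; substitute yours
    filename="runs/Aug09_10-22-52_fastgpuserv/events.out.tfevents.1723268246.fastgpuserv.502349.1",
)

acc = EventAccumulator(path)
acc.Reload()                    # parse the event file
print(acc.Tags()["scalars"])    # available scalar tags, e.g. loss curves
for tag in acc.Tags()["scalars"]:
    for event in acc.Scalars(tag):
        print(tag, event.step, event.value)
```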