SmolLM-360M-Instruct_fsdp_qlora_nf4_adapter

Files changed (5)

README.md +14 -24
adapter_config.json +4 -4
adapter_model.safetensors +1 -1
runs/Sep05_16-00-30_algo-2/events.out.tfevents.1725552045.algo-2.67.0 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -20,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [HuggingFaceTB/SmolLM-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-360M-Instruct) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.8076
 ## Model description
@@ -51,32 +51,22 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 20
 ### Training results
-| Training Loss | Epoch   | Step | Validation Loss |
-|:-------------:|:-------:|:----:|:---------------:|
-| 2.2926        | 0.9524  | 10   | 2.1440          |
-| 2.1035        | 2.0     | 21   | 2.0304          |
-| 2.0163        | 2.9524  | 31   | 1.9668          |
-| 1.9537        | 4.0     | 42   | 1.9199          |
-| 1.9107        | 4.9524  | 52   | 1.8908          |
-| 1.8883        | 6.0     | 63   | 1.8671          |
-| 1.8554        | 6.9524  | 73   | 1.8512          |
-| 1.8492        | 8.0     | 84   | 1.8385          |
-| 1.8229        | 8.9524  | 94   | 1.8296          |
-| 1.8198        | 10.0    | 105  | 1.8223          |
-| 1.8074        | 10.9524 | 115  | 1.8172          |
-| 1.7958        | 12.0    | 126  | 1.8130          |
-| 1.7958        | 12.9524 | 136  | 1.8105          |
-| 1.792         | 14.0    | 147  | 1.8088          |
-| 1.7843        | 14.9524 | 157  | 1.8079          |
-| 1.7873        | 16.0    | 168  | 1.8077          |
-| 1.7848        | 16.9524 | 178  | 1.8076          |
-| 1.7836        | 18.0    | 189  | 1.8075          |
-| 1.7828        | 18.9524 | 199  | 1.8075          |
-| 1.7827        | 19.0476 | 200  | 1.8076          |
 ### Framework versions

 This model is a fine-tuned version of [HuggingFaceTB/SmolLM-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-360M-Instruct) on the generator dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.8760
 ## Model description
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.03
+- num_epochs: 10
 ### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 2.2721        | 0.9756 | 10   | 2.1262          |
+| 2.0927        | 1.9512 | 20   | 2.0278          |
+| 2.0071        | 2.9268 | 30   | 1.9690          |
+| 1.9512        | 4.0    | 41   | 1.9282          |
+| 1.9247        | 4.9756 | 51   | 1.9045          |
+| 1.9024        | 5.9512 | 61   | 1.8897          |
+| 1.88          | 6.9268 | 71   | 1.8809          |
+| 1.8788        | 8.0    | 82   | 1.8767          |
+| 1.8763        | 8.9756 | 92   | 1.8760          |
+| 1.8735        | 9.7561 | 100  | 1.8760          |
 ### Framework versions
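The Epoch and Step columns in the two tables imply slightly different step counts per epoch for the two runs. The sketch below recomputes that ratio from the logged rows. The names `old_run`, `new_run`, and `steps_per_epoch` are illustrative, and only a subset of the old rows is copied in for brevity.

```python
# (epoch, step) pairs copied from the "Training results" tables in the diff.
# old_run is a subset of the removed table; new_run is the full added table.
old_run = [(0.9524, 10), (2.0, 21), (2.9524, 31), (4.0, 42), (19.0476, 200)]
new_run = [
    (0.9756, 10), (1.9512, 20), (2.9268, 30), (4.0, 41), (4.9756, 51),
    (5.9512, 61), (6.9268, 71), (8.0, 82), (8.9756, 92), (9.7561, 100),
]


def steps_per_epoch(rows):
    """Average number of optimizer steps per epoch implied by the rows."""
    return sum(step / epoch for epoch, step in rows) / len(rows)


print(f"old run: {steps_per_epoch(old_run):.2f} steps/epoch")
print(f"new run: {steps_per_epoch(new_run):.2f} steps/epoch")
```

The ratio comes out near 10.5 for the old run and near 10.25 for the new one, so the number of batches per epoch appears to have changed slightly between runs; the diff does not say why. In both runs the last logged step equals 10 times `num_epochs` (200 = 10 × 20 and 100 = 10 × 10), which is why the final epoch values are fractional.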

adapter_config.json CHANGED

@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "gate_proj",
     "v_proj",
-    "up_proj",
-    "k_proj",
     "o_proj",
-    "down_proj",
-    "q_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "down_proj",
     "gate_proj",
+    "q_proj",
     "v_proj",
     "o_proj",
+    "up_proj",
+    "k_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
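The `target_modules` hunk reorders entries without adding or removing any. A quick way to confirm this is to compare the two lists as sets, as in the sketch below. The variable names are illustrative. The likely cause is that PEFT keeps `target_modules` as an unordered set before serializing it, but that is an assumption; the diff itself does not explain the reordering.

```python
# target_modules as listed on the old and new sides of the diff.
old_targets = [
    "gate_proj", "v_proj", "up_proj", "k_proj",
    "o_proj", "down_proj", "q_proj",
]
new_targets = [
    "down_proj", "gate_proj", "q_proj", "v_proj",
    "o_proj", "up_proj", "k_proj",
]

# Order-insensitive comparison: which modules were added or dropped?
added = set(new_targets) - set(old_targets)
removed = set(old_targets) - set(new_targets)

print("added:", sorted(added))
print("removed:", sorted(removed))
print("same modules:", set(old_targets) == set(new_targets))
```

Both difference sets are empty, so the same seven projection layers are targeted in both revisions.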

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a74c253a9300c52ed13afb1303a21193b3a26b98795ed94d11ccf5314ac9d653
 size 17426248

 version https://git-lfs.github.com/spec/v1
+oid sha256:eb6ff95bec0ba4f501f9be4f015052c2a19df510752c26d3b8aa3e1549c4eb93
 size 17426248
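The binary files in this commit are stored as Git LFS pointers, so the diff shows only a spec version, a SHA-256 object id, and a byte size. The sketch below parses the two pointer versions of `adapter_model.safetensors` shown above and compares them. `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the old and new sides of the diff.
old_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:a74c253a9300c52ed13afb1303a21193b3a26b98795ed94d11ccf5314ac9d653
size 17426248
""")
new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:eb6ff95bec0ba4f501f9be4f015052c2a19df510752c26d3b8aa3e1549c4eb93
size 17426248
""")

print("size unchanged:", old_pointer["size"] == new_pointer["size"])
print("content changed:", old_pointer["digest"] != new_pointer["digest"])
```

The size is identical and only the digest differs, which is what one would expect if the adapter keeps the same tensor shapes and only the weight values changed.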

runs/Sep05_16-00-30_algo-2/events.out.tfevents.1725552045.algo-2.67.0 ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a5f95ff282bf5ce772aacb681b12e813397d4eac348bc6030a8efd7b3d408f6
+size 10485

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a5f46807250b5c695501899087945ede7c1e6a0a14e8498bf1b43eb8568a0de2
 size 5240

 version https://git-lfs.github.com/spec/v1
+oid sha256:3c32b70cee7bf7d12100258a03ad31e8b4c5d465fff43ba84de0d43a727954a4
 size 5240