couragestrong committed
Commit 78c8352 · verified · 1 Parent(s): 7bbb027

couragestrong/tiai-pinch-v2-lora

README.md CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0318
+- Loss: 0.0260
 
 ## Model description
 
@@ -35,32 +35,32 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 3e-05
+- learning_rate: 2e-05
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 2
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: constant_with_warmup
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 2
+- lr_scheduler_warmup_ratio: 0.15
+- num_epochs: 5
 
 ### Training results
 
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 3.7136        | 0.2031 | 13   | 3.0882          |
-| 0.8226        | 0.4062 | 26   | 0.7761          |
-| 0.0114        | 0.6094 | 39   | 0.0543          |
-| 0.007         | 0.8125 | 52   | 0.0289          |
-| 0.0009        | 1.0156 | 65   | 0.0236          |
-| 0.152         | 1.2188 | 78   | 0.0162          |
-| 0.0003        | 1.4219 | 91   | 0.0318          |
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 1.2117        | 1.0   | 32   | 0.8225          |
+| 0.0004        | 2.0   | 64   | 0.0426          |
+| 0.0035        | 3.0   | 96   | 0.0165          |
+| 0.0001        | 4.0   | 128  | 0.0416          |
+| 0.0059        | 5.0   | 160  | 0.0260          |
 
 
 ### Framework versions
 
 - PEFT 0.14.0
-- Transformers 4.48.0.dev0
+- Transformers 4.49.0.dev0
 - Pytorch 2.1.1+cu121
 - Datasets 3.2.0
 - Tokenizers 0.21.0
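
For reference, the updated hyperparameters above map onto a `transformers` Trainer configuration roughly as follows. This is a minimal sketch, assuming a standard `TrainingArguments` setup; the `output_dir` value is hypothetical, and the actual training script is not part of this commit.

```python
# Hypothetical reconstruction of the updated run configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tiai-pinch-v2-lora",  # hypothetical output path
    learning_rate=2e-5,               # was 3e-5 before this commit
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,    # new: 1 x 2 = total_train_batch_size 2
    num_train_epochs=5,               # was 2
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.15,                # was 0.1
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```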
adapter_config.json CHANGED
@@ -23,10 +23,10 @@
     "rank_pattern": {},
     "revision": null,
     "target_modules": [
-        "q_proj",
-        "v_proj",
         "o_proj",
-        "k_proj"
+        "k_proj",
+        "v_proj",
+        "q_proj"
     ],
     "task_type": "CAUSAL_LM",
     "use_dora": false,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d9f278cf83ed25f2a401a9929e3ab4167710ca618796d546c46f1b90d6e9ed22
+oid sha256:cab4049e2e8abae9288c090740d27ac56dd2296070ddbc54e0a1b8e4245dc177
 size 45689808
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f6487f113685336d6515d036828083e06e4303db3f575a5d3c67984d9bc25fb6
+oid sha256:ad16bfc093ffc21881648fa17073140e1b9c13d0547e0163f0cbc851c46e2825
 size 5560
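
The safetensors pointer swap above replaces the adapter weights in place (same 45689808-byte size, new content hash). Loading the updated adapter onto the base model looks roughly like the sketch below — assuming the Llava-style model class shown on the mistral-community/pixtral-12b card; adjust to however the base checkpoint loads in your transformers version.

```python
# Minimal sketch of loading the updated LoRA adapter; the model class is an
# assumption based on the base model's card, not part of this commit.
import torch
from peft import PeftModel
from transformers import LlavaForConditionalGeneration

base = LlavaForConditionalGeneration.from_pretrained(
    "mistral-community/pixtral-12b",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "couragestrong/tiai-pinch-v2-lora")
```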