Heejindo/rationale_model_e10_save5000

@@ -18,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.0836
 ## Model description
@@ -38,8 +38,8 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 1e-05
-- train_batch_size: 4
-- eval_batch_size: 4
 - seed: 42
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
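The optimizer line names PyTorch's AdamW with betas (0.9, 0.999) and epsilon 1e-08 at a learning rate of 1e-05. The scalar sketch below applies one AdamW update by hand to show what those numbers control. It is an illustration, not the `torch.optim.AdamW` implementation. `adamw_step` is a hypothetical helper, and weight decay does not appear in the visible card lines, so it is assumed to be 0.0.

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-05, betas=(0.9, 0.999),
               eps=1e-08, weight_decay=0.0):
    """Apply one AdamW update to a scalar parameter; t is the 1-based step count."""
    beta1, beta2 = betas
    # Decoupled weight decay shrinks the parameter directly (0.0 is an assumption here).
    param = param * (1 - lr * weight_decay)
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Step against the normalized gradient; eps guards against division by ~0.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

On the very first step the bias-corrected update is almost exactly `lr` in size, whatever the gradient's scale. For example, `adamw_step(1.0, 0.5, 0.0, 0.0, t=1)` and `adamw_step(1.0, 100.0, 0.0, 0.0, t=1)` both move the parameter by about 1e-05.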
@@ -47,28 +47,18 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch  | Step   | Validation Loss |
-|:-------------:|:------:|:------:|:---------------:|
-| 1.0612        | 0.4769 | 5000   | 2.0836          |
-| 0.3248        | 0.9538 | 10000  | 2.7452          |
-| 0.1833        | 1.4308 | 15000  | 2.9985          |
-| 0.1503        | 1.9077 | 20000  | 3.1807          |
-| 0.1285        | 2.3846 | 25000  | 3.3188          |
-| 0.1204        | 2.8615 | 30000  | 3.4080          |
-| 0.1049        | 3.3384 | 35000  | 3.5038          |
-| 0.102         | 3.8153 | 40000  | 3.5999          |
-| 0.0865        | 4.2923 | 45000  | 3.7076          |
-| 0.0844        | 4.7692 | 50000  | 3.7838          |
-| 0.0664        | 5.2461 | 55000  | 3.9114          |
-| 0.0671        | 5.7230 | 60000  | 4.0266          |
-| 0.051         | 6.1999 | 65000  | 4.1414          |
-| 0.0515        | 6.6768 | 70000  | 4.2532          |
-| 0.0422        | 7.1538 | 75000  | 4.3879          |
-| 0.0409        | 7.6307 | 80000  | 4.4931          |
-| 0.0352        | 8.1076 | 85000  | 4.6297          |
-| 0.0356        | 8.5845 | 90000  | 4.7444          |
-| 0.0322        | 9.0614 | 95000  | 4.8618          |
-| 0.0328        | 9.5383 | 100000 | 4.9179          |
 ### Framework versions

 This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.6975
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 1e-05
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
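`lr_scheduler_type: linear` refers to the Trainer's linear schedule, which ramps up over any warmup steps and then decays linearly from the base rate to zero at the final step. The sketch below reproduces that shape in plain Python. `linear_lr` is a hypothetical helper. The warmup count is not among the visible lines, so it defaults to 0, and the total step count must be supplied by the caller.

```python
def linear_lr(step, base_lr=1e-05, total_steps=1, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps (never negative).
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining
```

The batch-size-8 table's step/epoch ratio gives roughly 5,240 optimizer steps per epoch. If training ran for ten epochs, as the `e10` in the repository name suggests, the total would be on the order of 52,400 steps. Treat that figure as an estimate, not a value stated in the card.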
 ### Training results
+| Training Loss | Epoch  | Step  | Validation Loss |
+|:-------------:|:------:|:-----:|:---------------:|
+| 0.3986        | 0.9538 | 5000  | 2.6975          |
+| 0.1468        | 1.9077 | 10000 | 3.2221          |
+| 0.1156        | 2.8615 | 15000 | 3.4922          |
+| 0.0981        | 3.8153 | 20000 | 3.6490          |
+| 0.0847        | 4.7692 | 25000 | 3.8345          |
+| 0.0704        | 5.7230 | 30000 | 3.9968          |
+| 0.0551        | 6.6768 | 35000 | 4.2504          |
+| 0.0433        | 7.6307 | 40000 | 4.5271          |
+| 0.0354        | 8.5845 | 45000 | 4.7534          |
+| 0.0317        | 9.5383 | 50000 | 4.9696          |
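A little arithmetic cross-checks the two tables. Dividing the last logged step by its epoch gives the steps per epoch for each run. Doubling `train_batch_size` from 4 to 8 should halve that figure if the dataset and device setup stayed the same. That is an assumption, because device count and gradient accumulation are not shown in these lines. The sketch also finds the lowest-validation-loss row in each table. The rows are copied from the tables above, and the helper names are illustrative.

```python
# (step, epoch, validation_loss) rows copied from the two training-results tables.
OLD_RUN = [  # train_batch_size: 4
    (5000, 0.4769, 2.0836), (10000, 0.9538, 2.7452), (15000, 1.4308, 2.9985),
    (20000, 1.9077, 3.1807), (25000, 2.3846, 3.3188), (30000, 2.8615, 3.4080),
    (35000, 3.3384, 3.5038), (40000, 3.8153, 3.5999), (45000, 4.2923, 3.7076),
    (50000, 4.7692, 3.7838), (55000, 5.2461, 3.9114), (60000, 5.7230, 4.0266),
    (65000, 6.1999, 4.1414), (70000, 6.6768, 4.2532), (75000, 7.1538, 4.3879),
    (80000, 7.6307, 4.4931), (85000, 8.1076, 4.6297), (90000, 8.5845, 4.7444),
    (95000, 9.0614, 4.8618), (100000, 9.5383, 4.9179),
]
NEW_RUN = [  # train_batch_size: 8
    (5000, 0.9538, 2.6975), (10000, 1.9077, 3.2221), (15000, 2.8615, 3.4922),
    (20000, 3.8153, 3.6490), (25000, 4.7692, 3.8345), (30000, 5.7230, 3.9968),
    (35000, 6.6768, 4.2504), (40000, 7.6307, 4.5271), (45000, 8.5845, 4.7534),
    (50000, 9.5383, 4.9696),
]


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged (step, epoch) pair."""
    step, epoch, _ = rows[-1]
    return step / epoch


def best_checkpoint(rows):
    """Return (step, validation_loss) for the row with the lowest validation loss."""
    step, _, loss = min(rows, key=lambda row: row[2])
    return step, loss
```

The estimates come out to about 10,484 steps per epoch for the old run and 5,242 for the new one, a ratio of 2. Both correspond to roughly 42,000 examples per epoch under the single-device assumption. In both runs the minimum validation loss sits at step 5000 (2.0836 and 2.6975), matching the headline evaluation loss in each version of the card. Validation loss then rises at every later checkpoint while training loss trends downward.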
 ### Framework versions

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:db60b6ff3e9ba57f952db8a25071b16b580a952c56dae2960635b25db012ef6c
 size 4943274328

 version https://git-lfs.github.com/spec/v1
+oid sha256:e1ef7f2d1334d8ffb94dd219cd72b507f7388478a04e9730bb1ca1c813996c1e
 size 4943274328
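The `model.safetensors` change is a Git LFS pointer edit. The `oid` (the SHA-256 of the file contents) changed, while `size` stayed at 4943274328 bytes. So the new weights differ in content but not in byte length. Below is a minimal standard-library sketch for reading such a pointer and checking a downloaded copy against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS or Hugging Face tool.

```python
import hashlib
import os

OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:db60b6ff3e9ba57f952db8a25071b16b580a952c56dae2960635b25db012ef6c
size 4943274328
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e1ef7f2d1334d8ffb94dd219cd72b507f7388478a04e9730bb1ca1c813996c1e
size 4943274328
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and SHA-256 oid."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap check first: the byte size must match before hashing a ~5 GB file.
    if os.path.getsize(path) != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Stream in 1 MiB chunks so the whole file never sits in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

Because the two pointers share a size, the size check alone cannot tell the old and new weights apart. Only the hash comparison distinguishes them.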