DorinSht/recreate_llama_68M_vanilla

@@ -3,8 +3,6 @@ license: apache-2.0
 base_model: JackFram/llama-68m
 tags:
 - generated_from_trainer
-metrics:
-- accuracy
 model-index:
 - name: recreate_llama_68M_vanilla
   results: []
@@ -15,10 +13,7 @@ should probably proofread and complete it, then remove this comment. -->
 # recreate_llama_68M_vanilla
-This model is a fine-tuned version of [JackFram/llama-68m](https://huggingface.co/JackFram/llama-68m) on the anon8231489123/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json dataset.
-It achieves the following results on the evaluation set:
-- Loss: 9.5494
-- Accuracy: 0.3512
+This model is a fine-tuned version of [JackFram/llama-68m](https://huggingface.co/JackFram/llama-68m) on an unknown dataset.
 ## Model description
@@ -37,27 +32,17 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.005
-- train_batch_size: 32
-- eval_batch_size: 16
+- learning_rate: 0.0001
+- train_batch_size: 12
+- eval_batch_size: 24
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.05
 - num_epochs: 3.0
 ### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
-|:-------------:|:------:|:----:|:---------------:|:--------:|
-| No log        | 0.3125 | 10   | 7.9370          | 0.3676   |
-| No log        | 0.625  | 20   | 8.6808          | 0.3478   |
-| No log        | 0.9375 | 30   | 10.9798         | 0.1029   |
-| No log        | 1.25   | 40   | 10.3023         | 0.2493   |
-| No log        | 1.5625 | 50   | 9.7688          | 0.3501   |
-| No log        | 1.875  | 60   | 9.6190          | 0.3510   |
-| No log        | 2.1875 | 70   | 9.5617          | 0.3510   |
-| No log        | 2.5    | 80   | 9.5470          | 0.3511   |
-| No log        | 2.8125 | 90   | 9.5487          | 0.3511   |
 ### Framework versions
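The updated card pairs a peak `learning_rate` of 0.0001 with a `linear` scheduler and `lr_scheduler_warmup_ratio: 0.05`. The Python sketch below shows what that schedule looks like step by step, assuming it follows the Hugging Face Transformers convention: warmup steps are the ratio times the total step count, rounded up, followed by a linear ramp from 0 to the peak rate and then a linear decay back to 0. The total number of optimizer steps for this run is not recorded in the commit, so `TOTAL_STEPS = 96` is a hypothetical placeholder, and the helper names are illustrative rather than library APIs.

```python
import math


def warmup_steps_from_ratio(total_steps: int, warmup_ratio: float) -> int:
    """Convert a warmup ratio into a whole number of warmup steps (rounded up)."""
    return math.ceil(total_steps * warmup_ratio)


def linear_warmup_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int) -> float:
    """Learning rate at `step`: linear ramp 0 -> base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    # Clamp at 0 so steps past the end never yield a negative rate.
    return base_lr * max(0.0, remaining)


# Peak LR and warmup ratio come from the updated card; TOTAL_STEPS is hypothetical.
BASE_LR = 0.0001
WARMUP_RATIO = 0.05
TOTAL_STEPS = 96

warmup = warmup_steps_from_ratio(TOTAL_STEPS, WARMUP_RATIO)
for step in (0, warmup // 2, warmup, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_warmup_lr(step, BASE_LR, TOTAL_STEPS, warmup))
```

With a 5% warmup, only the first few steps ramp up; the rest of the run decays linearly, so the new configuration starts from a much lower peak than the removed `learning_rate: 0.005` and avoids applying the full rate at step 0.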

events.out.tfevents.1716992315.isl-gpu27.3581638.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a3a3ff99ea9c22da16589178c9795aac80195817c2259b7a173bb03508929dfc
+size 5413
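Each binary file in this commit is stored as a Git LFS pointer: three `key value` lines giving the spec `version`, the SHA-256 `oid` of the real content, and its `size` in bytes. The following is a minimal Python sketch for reading such a pointer and checking a downloaded file against it. The helper names (`parse_lfs_pointer`, `pointer_for_bytes`, `file_matches_pointer`) are hypothetical, and the sketch assumes only the `sha256` oid form that appears in this commit.

```python
import hashlib

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_URL:
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def pointer_for_bytes(data: bytes) -> str:
    """Build the pointer text describing `data` (version, sha256 oid, size)."""
    digest = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC_URL}\noid sha256:{digest}\nsize {len(data)}\n"


def file_matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Hash a local file in chunks and compare it with the pointer's oid and size."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    sha = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large checkpoints need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
            size += len(chunk)
    return size == int(pointer["size"]) and sha.hexdigest() == expected


# The pointer committed for the TensorBoard event file above.
events_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:a3a3ff99ea9c22da16589178c9795aac80195817c2259b7a173bb03508929dfc\n"
    "size 5413\n"
)
print(events_pointer["oid"], events_pointer["size"])
```

For example, a local copy of `model.safetensors` corresponds to this commit only if it hashes to the new `oid` in its pointer and is exactly 272123144 bytes long; since the size is unchanged between revisions, the hash is what distinguishes the old and new weights.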

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:99e0aa1f57b9d3b412748068c5f9dd3e0251c942e88329be1c63e7a93fe20583
+oid sha256:b29f07972e09b7e99ddb0e5808e0d88a17501a11c554772a8caa6699722d71ee
 size 272123144

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a7e736d0a0ca3bfb4692ce9f2e011bee553b4b200689ef7c6910f21f466d39b9
-size 5112
+oid sha256:b997f16f1f5bfcc99bd4bc2fc9d1821e67341b4093297f45f67ca5d81966f1ba
+size 5240