bilkultheek committed
Commit cf6813b
1 Parent(s): 797c227

End of training

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -1,6 +1,7 @@
 ---
-base_model: NousResearch/Llama-2-7b-hf
+base_model: ahxt/LiteLlama-460M-1T
 library_name: peft
+license: mit
 tags:
 - trl
 - sft
@@ -15,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # YaHaHamaraLlama
 
-This model is a fine-tuned version of [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) on the None dataset.
+This model is a fine-tuned version of [ahxt/LiteLlama-460M-1T](https://huggingface.co/ahxt/LiteLlama-460M-1T) on the None dataset.
 
 ## Model description
 
@@ -36,15 +37,14 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 0.0002
 - train_batch_size: 8
-- eval_batch_size: 6
+- eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 64
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
+- lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 3
-- mixed_precision_training: Native AMP
+- num_epochs: 5
 
 ### Training results
 
@@ -53,7 +53,7 @@ The following hyperparameters were used during training:
 ### Framework versions
 
 - PEFT 0.12.0
-- Transformers 4.42.4
+- Transformers 4.43.3
 - Pytorch 2.3.1+cu121
-- Datasets 2.20.0
+- Datasets 2.17.0
 - Tokenizers 0.19.1
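For context, the updated hyperparameters correspond to a standard TRL supervised fine-tuning run with a PEFT adapter. The sketch below is a hedged reconstruction, not the authors' script: only the numeric values and the base model name come from the updated card. The dataset file (`train.jsonl`), the `text` column, the LoRA settings, the sequence length, and the use of TRL 0.8/0.9-era `SFTTrainer` keyword arguments (`dataset_text_field`, `max_seq_length`, which newer TRL moves into `SFTConfig`) are all assumptions.

```python
# Hedged reconstruction of the training setup -- NOT the repository's actual script.
# Only the hyperparameter values and the base model mirror the updated card;
# dataset, text column, LoRA settings, and sequence length are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Placeholder dataset: the card does not name the dataset it was trained on.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Assumed LoRA adapter config; the card only records that PEFT 0.12.0 was used.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

training_args = TrainingArguments(
    output_dir="YaHaHamaraLlama",
    learning_rate=2e-4,               # learning_rate: 0.0002
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=8,     # eval_batch_size: 8
    gradient_accumulation_steps=4,    # 8 x 4 = total_train_batch_size 32
    num_train_epochs=5,               # num_epochs: 5
    lr_scheduler_type="linear",       # lr_scheduler_type: linear
    warmup_ratio=0.03,                # lr_scheduler_warmup_ratio: 0.03
    seed=42,                          # seed: 42
)

trainer = SFTTrainer(
    model="ahxt/LiteLlama-460M-1T",   # base_model from the updated card
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",        # assumed column name
    max_seq_length=1024,              # assumed; not recorded in the card
)
trainer.train()
```

Note that the optimizer line in the card (Adam with betas=(0.9,0.999), epsilon=1e-08) matches the `TrainingArguments` default, so it is not set explicitly above.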