---
library_name: transformers
tags: []
---

# Model Card for Model ID

llama3-8B supervised finetuning with llama-adapter 4bit quantization

## Model Details

- adapter_layers: 30
- adapter_len: 10
- gamma: 0.85
- batch_size_training: 4
- gradient_accumulation_steps: 4
- lr: 0.0001
- num_epochs: 3
- num_freeze_layers: 1
- optimizer: "AdamW"
- peft_method: "llama_adapter"

trainable params: 1,228,830 || all params: 8,031,490,078 || trainable%: 0.0153

### Model Description

- Average epoch time: 566s
- Train loss: 0.41620415449142456
- Eval loss: 1.57061767578125
- Max CUDA memory allocated was 14 GB
- Max CUDA memory reserved was 16 GB
- Peak active CUDA memory was 14 GB
- CPU Total Peak Memory consumed during the train (max): 4 GB
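
The trainable-parameter figures reported above can be sanity-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the card does not state: that the base model's hidden size is 4096 (the usual value for Llama 3 8B), and that each adapted layer holds one learnable prompt of shape `adapter_len x hidden_size` plus a single scalar gate. The function names are hypothetical helpers, not part of any library. The effective batch size is computed per device, since the card does not say how many GPUs were used.

```python
# Values reported in this model card.
ADAPTER_LAYERS = 30
ADAPTER_LEN = 10
BATCH_SIZE_TRAINING = 4
GRADIENT_ACCUMULATION_STEPS = 4
REPORTED_TRAINABLE_PARAMS = 1_228_830
REPORTED_TOTAL_PARAMS = 8_031_490_078

# Assumption (not stated in the card): hidden size of the Llama 3 8B base model.
HIDDEN_SIZE = 4096


def adapter_trainable_params(layers: int, adapter_len: int, hidden_size: int) -> int:
    """Count adapter parameters under the assumed layout:
    one (adapter_len x hidden_size) prompt plus one scalar gate per adapted layer."""
    return layers * (adapter_len * hidden_size + 1)


def trainable_percent(trainable: int, total: int) -> float:
    """Share of trainable parameters, as a percentage of all parameters."""
    return 100.0 * trainable / total


def effective_batch_size(batch_size: int, accumulation_steps: int) -> int:
    """Samples contributing to each optimizer step on a single device."""
    return batch_size * accumulation_steps


if __name__ == "__main__":
    estimated = adapter_trainable_params(ADAPTER_LAYERS, ADAPTER_LEN, HIDDEN_SIZE)
    print("estimated trainable params:", estimated)
    print("matches reported value:", estimated == REPORTED_TRAINABLE_PARAMS)
    print(
        "trainable %:",
        round(trainable_percent(REPORTED_TRAINABLE_PARAMS, REPORTED_TOTAL_PARAMS), 4),
    )
    print(
        "effective batch size per device:",
        effective_batch_size(BATCH_SIZE_TRAINING, GRADIENT_ACCUMULATION_STEPS),
    )
```

Under these assumptions the estimate, 30 x (10 x 4096 + 1), reproduces the reported trainable-parameter count, which suggests the adapter layout assumed here is at least consistent with the numbers in this card.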