Filip committed on
Commit 09cc62e · 1 Parent(s): 382a6f6

update readme

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -32,13 +32,13 @@ Quantization method: `float16`
  ### Hyperparameters
 
  Both models used the same hyperparameters during training.\
- `lora_alpha=16`\
- `lora_dropout=0`\
- `per_device_train_batch_size=2`\
- `gradient_accumulation_steps=4`\
- `learning_rate=2e-4`\
+ `lora_alpha=16`: Scaling factor for the contribution of the low-rank matrices. Higher values increase their influence and speed up convergence but risk instability and overfitting; lower values have a smaller effect and may require more training steps.\
+ `lora_dropout=0`: Probability of zeroing out elements of the low-rank matrices for regularization. Higher values give more regularization but may slow training and degrade performance.\
+ `per_device_train_batch_size=2`: Number of training samples processed on each device per step. Higher values use more memory but give more stable gradient estimates.\
+ `gradient_accumulation_steps=4`: Number of steps over which gradients are accumulated before an optimizer update. Higher values effectively increase the batch size without requiring additional memory, which can improve training stability and convergence on limited hardware.\
+ `learning_rate=2e-4`: Rate at which the model updates its parameters during training. Higher values give faster convergence but risk overshooting the optimum and instability; lower values require more training steps but can yield better final performance.\
  `optim="adamw_8bit"`\
- `weight_decay=0.01`\
+ `weight_decay=0.01`: Penalty added to the loss in proportion to the magnitude of the weights, discouraging large weights and helping prevent overfitting.\
  `lr_scheduler_type="linear"`
 
  These hyperparameters are [suggested as default](https://docs.unsloth.ai/tutorials/how-to-finetune-llama-3-and-export-to-ollama) when using Unsloth. However, to experiment with them we also tried to finetune a third model, keeping some of the hyperparameters above but changing others to:
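
For reference, a minimal sketch of how hyperparameters like these are typically passed to an Unsloth/TRL fine-tuning run. The base model, dataset, LoRA rank `r`, sequence length, and step count below are illustrative assumptions rather than values from this README, and depending on the TRL version some `SFTTrainer` arguments may instead be supplied via an `SFTConfig`.

```python
# Hypothetical sketch of wiring the hyperparameters above into an Unsloth + TRL run.
# Model name, dataset, LoRA rank, sequence length, and step count are placeholders.
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset

max_seq_length = 2048  # assumed sequence length

# Placeholder base model; the actual checkpoint is not specified in this sketch.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=max_seq_length,
)

# Attach LoRA adapters: lora_alpha and lora_dropout match the README, r is an assumption.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder data

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",           # assumes a pre-formatted "text" column
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,   # effective batch size: 2 * 4 = 8 per device
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        max_steps=60,                    # placeholder; total steps are not listed above
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()
```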