Filip committed · commit f52c079 · 1 parent: cac0b2a · update
README.md CHANGED
```diff
@@ -31,14 +31,21 @@ Quantization method: `float16`
 
 ### Hyperparameters
 
-Both models used the same hyperparameters during training
-
+Both models used the same hyperparameters during training.
+
+`lora_alpha=16`: Scaling factor for the low-rank matrices' contribution. A higher value increases their influence and speeds up convergence but risks instability and overfitting; a lower value has a smaller effect and may require more training steps.
+
 `lora_dropout=0`: Probability of zeroing out elements in the low-rank matrices for regularization. A higher value gives more regularization but may slow training and degrade performance.\
-`per_device_train_batch_size=2
-
-`
-
-`
+`per_device_train_batch_size=2`: Number of training samples processed on each device per forward/backward pass.
+
+`gradient_accumulation_steps=4`: Number of steps over which gradients are accumulated before a weight update. A higher value increases the effective batch size without requiring additional memory and can improve training stability and convergence with a large model on limited hardware.
+
+`learning_rate=2e-4`: Rate at which the model updates its parameters during training. A higher value gives faster convergence but risks overshooting the optimal parameters and instability; a lower value requires more training steps but can give better final performance.
+
+`optim="adamw_8bit"`: The AdamW optimizer, a gradient-descent method with momentum and decoupled weight decay, using 8-bit optimizer states to save memory.
+
+`weight_decay=0.01`: Penalty applied to the weights during training to prevent overfitting; a term proportional to the magnitude of the weights is added to the loss.
+
 `lr_scheduler_type="linear"`: We decrease the learning rate linearly.
 
 These hyperparameters are [suggested as default](https://docs.unsloth.ai/tutorials/how-to-finetune-llama-3-and-export-to-ollama) when using Unsloth. However, to experiment with them, we also fine-tuned a third model, keeping some of the hyperparameters above but changing the rest to:
```
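For reference, the hyperparameters described in the hunk above map onto a standard `peft` + `transformers` setup roughly as sketched below. This is a minimal sketch, not the training script used for this Space; the LoRA rank `r=16`, the `target_modules` list, and `output_dir` are assumptions, since only the values quoted in the README come from the source.

```python
# Minimal sketch of the listed hyperparameters (assumed wiring, not the
# Space's actual training code). Only values quoted in the README are taken
# from the source; r, target_modules and output_dir are placeholders.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                   # LoRA rank: assumed, not stated in this section
    lora_alpha=16,                          # scaling factor for the low-rank update
    lora_dropout=0.0,                       # no dropout on the low-rank matrices
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="outputs",                   # placeholder
    per_device_train_batch_size=2,          # samples per device per forward/backward pass
    gradient_accumulation_steps=4,          # effective batch size = 2 * 4 = 8 per device
    learning_rate=2e-4,
    optim="adamw_8bit",                     # AdamW with 8-bit optimizer states
    weight_decay=0.01,
    lr_scheduler_type="linear",             # learning rate decays linearly
)
```

In such a setup, `lora_config` would be applied to the base model (for example via `peft.get_peft_model`, or the equivalent Unsloth call from the linked tutorial) and `training_args` passed to the trainer.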
```diff
@@ -102,7 +109,7 @@ Please evaluate the responses based on the selected criteria. For each criterion
 
 ### Results
 
-
+\
 **p1**: `temperature=0.5` and `min_p=0.05` during inference\
 **p2**: `temperature=1.5` and `min_p=0.1` during inference
 
```
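The two inference settings compared above can be reproduced with any backend that exposes `temperature` and `min_p` sampling. Below is a sketch using the `transformers` `generate()` API (a recent version is needed for `min_p`); the model path and prompt are placeholders, and the Space itself may instead serve the model through Ollama or llama.cpp.

```python
# Sketch of the two sampling settings (p1 vs p2); model path and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/finetuned-model"   # placeholder, not the Space's real model id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt")

# p1: lower temperature and min_p -> more conservative, focused output
out_p1 = model.generate(**inputs, do_sample=True, temperature=0.5, min_p=0.05, max_new_tokens=128)

# p2: higher temperature and min_p -> more diverse, exploratory output
out_p2 = model.generate(**inputs, do_sample=True, temperature=1.5, min_p=0.1, max_new_tokens=128)

print(tokenizer.decode(out_p1[0], skip_special_tokens=True))
print(tokenizer.decode(out_p2[0], skip_special_tokens=True))
```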