---
license: apache-2.0
tags:
- parameters guide
- samplers guide
- model generation
---

<h3>Maximizing Model Performance for All Quant Types and Full Precision using Samplers, Advanced Samplers and Parameters Guide</h3>

IE: Instead of using a q4KM, you might be able to run an IQ3_M and get close to ...

PRIMARY PARAMETERS:
------------------------------------------------------------------------------

--temp N

temperature (default: 0.8)

Primary factor to control the randomness of outputs. 0 = deterministic (only the most likely token is used). Higher value = more randomness.
Too much temp can affect instruction following in some cases and sometimes not e...

Newer model archs (L3, L3.1, L3.2, Mistral Nemo, Gemma2, etc.) often NEED more temp (1+) to get their best generations.
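The effect of --temp can be sketched as dividing the raw logits by the temperature before softmax; lower values sharpen the distribution, higher values flatten it. A minimal illustration (the function name is mine, not from any particular backend):

```python
import math

def softmax_with_temp(logits, temp):
    """Convert raw logits into token probabilities at a given temperature.

    temp < 1.0 sharpens the distribution (more deterministic),
    temp > 1.0 flattens it (more random); temp == 0 is treated as greedy.
    """
    if temp == 0:
        # Deterministic: all probability mass on the single most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_sharp = softmax_with_temp([2.0, 1.0, 0.1], 0.5)   # low temp
probs_flat = softmax_with_temp([2.0, 1.0, 0.1], 1.5)    # high temp
# The top token holds more probability mass at low temp than at high temp.
```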
--top-p N

top-p sampling (default: 0.9, 1.0 = disabled)

If not set to 1, select only the tokens whose probabilities add up to less than this number. Higher value = wider range of possible random results.

I use a default of: .95
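In other words, top-p (nucleus) sampling keeps the smallest set of most likely tokens whose cumulative probability reaches the threshold. A rough sketch of the filter (my own helper name, assuming probabilities are already normalized):

```python
def top_p_filter(probs, top_p=0.95):
    """Return the indices of tokens kept by top-p (nucleus) filtering:
    the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p; all other tokens are excluded."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

# With top_p=0.9 the low-probability tail token is cut from sampling.
survivors = top_p_filter([0.5, 0.3, 0.15, 0.05], 0.9)
```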
--min-p N

min-p sampling (default: 0.1, 0.0 = disabled)

Tokens with a probability smaller than (min_p) * (probability of the most likely token) are discarded.

I use a default of: .05
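The min-p rule above can be sketched as a cutoff relative to the top token's probability (helper name is mine for illustration):

```python
def min_p_filter(probs, min_p=0.05):
    """Return the indices of tokens kept by min-p filtering: tokens whose
    probability is below min_p times the top token's probability are
    discarded, so the cutoff scales with how confident the model is."""
    cutoff = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= cutoff]

# Top token at 0.6 gives a cutoff of 0.05 * 0.6 = 0.03,
# so the 0.02 and 0.01 tail tokens are discarded here.
kept = min_p_filter([0.6, 0.3, 0.07, 0.02, 0.01])
```

Because the cutoff tracks the top token, min-p prunes aggressively when the model is confident and permissively when the distribution is flat.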
--top-k N

top-k sampling (default: 40, 0 = disabled)

Similar to top-p, but selects instead only the top_k most likely tokens. Higher value = wider range of possible random results.
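Top-k is the simplest of the filters: keep a fixed number of the most likely tokens regardless of their probabilities. A minimal sketch (helper name is mine):

```python
def top_k_filter(probs, top_k=40):
    """Return the indices of the top_k most likely tokens,
    in ascending index order; top_k <= 0 disables the filter."""
    if top_k <= 0:
        return list(range(len(probs)))
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(order[:top_k])

# Keeps only the two highest-probability tokens (indices 1 and 3).
best_two = top_k_filter([0.1, 0.4, 0.2, 0.3], top_k=2)
```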
Smaller quants may require STRONGER settings (all classes of models) due to comp...

This is also influenced by the parameter size of the model in relation to the quant size.

IE: an 8B model at Q2K will be far more unstable relative to a 20B model at Q2K, and as a result will require stronger settings.