Tags: parameters guide, samplers guide, model generation, role play settings, quant selection, arm quants, iq quants vs q quants, optimal model setting, gibberish fixes, coherence, instructing following, quality generation, chat settings, quality settings, llamacpp server, llamacpp, lmstudio, sillytavern, koboldcpp, backyard, ollama, model generation steering, steering, model generation fixes, text generation webui, ggufs, exl2, full precision, quants, imatrix, neo imatrix
Update README.md
README.md (CHANGED)
@@ -17,7 +17,7 @@ tags:
 <h3>Maximizing Model Performance for All Quants Types And Full-Precision using Samplers, Advanced Samplers and Parameters Guide</h3>
 
 This document includes detailed information, references, and notes for general parameters, samplers and
-advanced samplers to get the most out of your model's abilities including notes / settings for the most popular AI/LLM app in use.
+advanced samplers to get the most out of your model's abilities, including notes / settings for the most popular AI/LLM apps in use (LLAMACPP, KoboldCPP, Text-Generation-WebUI, LMStudio, SillyTavern, Ollama and others).
 
 These settings / suggestions can be applied to all models including GGUF, EXL2, GPTQ, HQQ, AWQ and full source/precision.
 
@@ -40,13 +40,13 @@ The settings discussed in this document can also fix a number of model issues (<
 
 Likewise ALL the settings (parameters, samplers and advanced samplers) below can also improve model generation and/or general overall "smoothness" / "quality" of model operation:
 
-- all parameters and samplers available via LLAMACPP (and most apps that run / use LLAMACPP)
+- all parameters and samplers available via LLAMACPP (and most apps that run / use LLAMACPP, including LMStudio, Ollama, SillyTavern and others)
 - all parameters (including some not in Llamacpp), samplers and advanced samplers ("Dry", "Quadratic", "Mirostat") in oobabooga/text-generation-webui including the llamacpp_HF loader (allowing a lot more samplers)
-- all parameters (including some not in Lllamacpp), samplers and advanced samplers ("Dry", "Quadratic", "Microstat") in KoboldCPP (including Anti-slop filters)
+- all parameters (including some not in Llamacpp), samplers and advanced samplers ("Dry", "Quadratic", "Mirostat") in SillyTavern / KoboldCPP (including Anti-slop filters)
 
-Even if you are not using my models, you may find this document useful for any model (any quant / full source) available online
+Even if you are not using my models, you may find this document <u>useful for any model (any quant / full source / any repo) available online.</u>
 
-If you are currently using model(s) that are difficult to "wrangle" then apply "Class 3" or "Class 4" settings to them.
+If you are currently using model(s) - from my repo and/or others - that are difficult to "wrangle", then you can apply "Class 3" or "Class 4" settings to them.
 
 This document will be updated over time too and is subject to change without notice.
 
@@ -90,7 +90,7 @@ I do not set any other settings, parameters or have samplers activated when gene
 
 Everything else is "zeroed" / "disabled".
 
-These parameters/settings are considered both safe and default and in most cases available to all users in all apps.
+These parameters/settings are considered both safe and default and in most cases available to all users in all AI/LLM apps.
 
 ---
 
@@ -106,7 +106,7 @@ You will need the config files to use "llamacpp_HF" loader ("text-generation-web
 
 You can also use the full source in "text-generation-webui" too.
 
-As an alternative you can use GGUFs directly in "KOBOLDCPP" without the "config files" and still use almost all the parameters, samplers and advanced samplers.
+As an alternative you can use GGUFs directly in "KOBOLDCPP" / "SillyTavern" without the "config files" and still use almost all the parameters, samplers and advanced samplers.
 
 <B>Parameters, Samplers and Advanced Samplers</B>
 
@@ -143,7 +143,9 @@ For CLASS3 and CLASS4 the most important setting is "SMOOTHING FACTOR" (Quadrati
 
 https://docs.sillytavern.app/usage/common-settings/
 
-NOTE:
+NOTE:
+
+It appears that SillyTavern also supports "DRY" and "XTC" too, but this is not yet in the documentation at the time of writing.
 
 You may also want to check out how to connect SillyTavern to local AI "apps" running on your pc here:
 
@@ -154,7 +156,7 @@ OTHER PROGRAMS:
 
 Other programs like https://www.LMStudio.ai allow access to most of the STANDARD samplers, whereas for others (llamacpp only here) you may need to add to the json file(s) for a model and/or template preset.
 
-In most cases all llama_cpp parameters/samplers are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "Olama", "
+In most cases all llama_cpp parameters/samplers are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "SillyTavern", "Ollama", "Backyard", and "LMStudio" (as well as other apps too).
 
 You can also use llama_cpp directly too (IE: llama-server.exe); see:
 
@@ -162,6 +164,12 @@ https://github.com/ggerganov/llama.cpp
 
 (scroll down on the main page for more apps/programs to use GGUFs too that connect to / use the LLAMA-CPP package.)
 
+Special note:
+
+It appears the "DRY" / "XTC" samplers have been added to LLAMACPP and SILLYTAVERN.
+
+They are available via "llama-server.exe". Likely these samplers will also become available "downstream" in applications that use LLAMACPP in due time.
+
 ---
 
 DETAILED NOTES ON PARAMETERS, SAMPLERS and ADVANCED SAMPLERS:
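As a minimal sketch of the "API / headless / server mode" route described above (assuming a llama-server instance is already running on the default port, e.g. "llama-server -m your-model.gguf --port 8080", with illustrative values rather than this guide's presets), standard samplers can be set per-request through the /completion endpoint; field names follow the llama.cpp server documentation but can vary between builds:

<PRE>
# Minimal sketch: set standard samplers per-request against a local
# llama-server instance (assumes it is already running on port 8080).
import json
import urllib.request

payload = {
    "prompt": "Write a short scene set on a night train.",
    "n_predict": 200,        # maximum tokens to generate
    "temperature": 0.8,      # illustrative values only - not this guide's presets
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.05,
    "repeat_penalty": 1.1,
    "repeat_last_n": 64,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
</PRE>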
@@ -176,11 +184,10 @@ https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-
 
 Additional Links (on parameters, samplers and advanced samplers):
 
-DRY
-
-
-
-DRY => https://www.reddit.com/r/KoboldAI/comments/1eo4r6q/dry_settings_questions/
+DRY
+- https://github.com/oobabooga/text-generation-webui/pull/5677
+- https://www.reddit.com/r/KoboldAI/comments/1e49vpt/dry_sampler_questionsthat_im_sure_most_of_us_are/
+- https://www.reddit.com/r/KoboldAI/comments/1eo4r6q/dry_settings_questions/
 
 Samplers : https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e632498f73e
 
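The links above cover what DRY does and how people tune it; as a hedged illustration only, recent llama-server builds that ship DRY and XTC accept them as extra request fields. The names below (dry_multiplier, dry_base, dry_allowed_length, xtc_probability, xtc_threshold) come from the llama.cpp server options, and the values are common starting points rather than recommendations from this guide; older builds and other front-ends may use different names or not expose them at all:

<PRE>
# Hedged illustration: DRY / XTC fields added to a llama-server request.
# Only meaningful on builds that actually include these samplers.
import json
import urllib.request

payload = {
    "prompt": "Continue the story without repeating earlier phrasing:",
    "n_predict": 256,
    "temperature": 0.8,
    # DRY (anti-repetition) - common starting values; multiplier 0 disables it
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "dry_allowed_length": 2,
    # XTC - common starting values; probability 0 disables it
    "xtc_probability": 0.5,
    "xtc_threshold": 0.1,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
</PRE>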
@@ -267,7 +274,7 @@ This covers both Imatrix and regular quants.
 
 Imatrix can be applied to any quant - "Q" or "IQ" - however, IQ1s to IQ3_S REQUIRE an imatrix dataset / imatrixing process before quanting.
 
-This chart shows the order in terms of "BPW" for each quant (mapped below with relative "strength" to one another) with "IQ1_S" with the least, and "Q8_0" with the most:
+This chart shows the order in terms of "BPW" for each quant (mapped below with relative "strength" to one another) with "IQ1_S" with the least, and "Q8_0" (F16 is full precision) with the most:
 
 <small>
 <PRE>
@@ -316,11 +323,11 @@ Here are some Imatrix Neo Models:
 
 Suggestions for Imatrix NEO quants:
 
-- The LOWER the quant the STRONGER the Imatrix effect is, and therefore the stronger the
-- Due to the unique nature of this project, quants IQ1s to IQ4s are recommended for maximum
+- The LOWER the quant the STRONGER the Imatrix effect is, and therefore the stronger the "tint", so to speak.
+- Due to the unique nature of this project, quants IQ1s to IQ4s are recommended for maximum effect, with IQ4_XS the most balanced in terms of power and bits.
 - Secondaries are Q2s-Q4s. Imatrix effect is still strong in these quants.
 - Effects diminish quickly from Q5s and up.
-- Q8 there is no change (as the Imatrix process does not affect this quant), and therefore
+- Q8/F16: there is no change (as the Imatrix process does not affect these quants), and therefore not included.
 
 ---
 
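As a rough aside on what the "BPW" ordering above translates to in practice (the bits-per-weight figures below are approximate examples, not exact numbers for any specific model): file size scales roughly linearly with BPW, which is why the lower quants are both smaller and, per the notes above, more strongly "tinted" by the imatrix:

<PRE>
# Rough illustration only: estimated GGUF size from parameter count and
# approximate bits-per-weight (BPW). Figures are ballpark, not exact.
APPROX_BPW = {
    "IQ1_S": 1.6, "IQ2_XS": 2.3, "IQ3_S": 3.4, "IQ4_XS": 4.3,
    "Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0,
}

def approx_size_gb(params_billion: float, bpw: float) -> float:
    """Size in GB ~= parameters * bits-per-weight / 8 bits per byte."""
    return params_billion * 1e9 * bpw / 8 / 1e9

for quant, bpw in APPROX_BPW.items():
    print(f"{quant:7s} ~{approx_size_gb(8.0, bpw):5.1f} GB for an 8B model")
</PRE>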
@@ -411,12 +418,6 @@ Please see sections below this for advanced usage, more details, settings notes
 
 </small>
 
-Special note:
-
-It appears "DRY" / "XTC" samplers has been added to LLAMACPP.
-
-It is available via "llama-server.exe". Likely this sampler will also become available "downstream" in applications that use LLAMACPP in due time.
-
 ---
 
 HOW TO TEST EACH PARAMETER(s), SAMPLER(s) and ADVANCED SAMPLER(s)
@@ -722,6 +723,8 @@ i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello', or `--log
 
 This may or may not be available. This requires a bit more work.
 
+Note: +- range is 0 to 100.
+
 IN "oobabooga/text-generation-webui" there is "TOKEN BANNING":
 
 This is a very powerful pruning method, which can drastically alter output generation.
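For the `--logit-bias` example referenced in the hunk header above, the same effect can be had through the llama-server API's documented "logit_bias" field, which takes [token_id, bias] pairs and treats a bias of false as a hard ban. A minimal sketch follows; token id 15043 is simply the ' Hello' id used in the llama.cpp example, so look up real ids for your own model's tokenizer:

<PRE>
# Sketch: hard-ban a token via llama-server's "logit_bias" field.
# [token_id, bias] pairs; a bias of false means "never sample this token",
# while a positive/negative float nudges it up/down instead.
import json
import urllib.request

payload = {
    "prompt": "Say hello to the reader.",
    "n_predict": 64,
    "logit_bias": [[15043, False]],   # 15043 = ' Hello' in the llama.cpp example
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])  # output should avoid " Hello"
</PRE>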