DavidAU committed
Commit 3f1e840 · verified · 1 Parent(s): ac491dd

Update README.md

Files changed (1):
  1. README.md (+31 −3)
README.md CHANGED
@@ -17,7 +17,7 @@ tags:
 <h3>Maximizing Model Performance for All Quant Types And Full-Precision using Samplers, Advanced Samplers and Parameters Guide</h3>
 
 This document includes detailed information, references, and notes for general parameters, samplers and
-advanced samplers to get the most out of your model's abilities.
+advanced samplers to get the most out of your model's abilities, including notes / settings for the most popular AI/LLM apps in use.
 
 These settings / suggestions can be applied to all models including GGUF, EXL2, GPTQ, HQQ, AWQ and full source/precision.
 
@@ -133,11 +133,28 @@ You can use almost all parameters, samplers and advanced samplers using "KOBOLDC
 
 Note: This program has one of the newest samplers, called "Anti-slop", which allows phrase/word banning at the generation level.
 
+SILLYTAVERN:
+
+Note that https://github.com/SillyTavern/SillyTavern also allows access to all LLAMACPP parameters/samplers, as well as additional advanced samplers.
+
+You can use almost all parameters, samplers and advanced samplers using "SILLYTAVERN" without the need to get the source config files (the "llamacpp_HF" step).
+
+For CLASS3 and CLASS4 the most important setting is "SMOOTHING FACTOR" (Quadratic Smoothing); information is located on this page:
+
+https://docs.sillytavern.app/usage/common-settings/
+
+NOTE: It appears that SillyTavern also supports "DRY" and "XTC", but these are not yet in the documentation at the time of writing.
+
+You may also want to check out how to connect SillyTavern to local AI "apps" running on your PC here:
+
+https://docs.sillytavern.app/usage/api-connections/
+
+
 OTHER PROGRAMS:
 
 Other programs like https://www.LMStudio.ai allow access to most of the STANDARD samplers, whereas for others (llamacpp only here) you may need to add to the JSON file(s) for a model and/or template preset.
 
-In most cases all llama_cpp parameters/samplers are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "Olama" and "lmstudio" (as well as other apps too).
+In most cases all llama_cpp parameters/samplers are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "Ollama", "backyard", and "lmstudio" (as well as other apps too).
 
 You can also use llama_cpp directly (IE: llama-server.exe); see:
 
@@ -173,6 +190,8 @@ General Parameters => https://arxiv.org/html/2408.13586v1
 
 Benchmarking-and-Guiding-Adaptive-Sampling-Decoding https://github.com/ZhouYuxuanYX/Benchmarking-and-Guiding-Adaptive-Sampling-Decoding-for-LLMs
 
+Depending on the AI/LLM "apps" you are using, additional reference material for parameters / samplers may also exist.
+
 ---
 
 CRITICAL NOTES:
@@ -233,13 +252,16 @@ Generally it is recommended to run the highest quant(s) you can on your machine
 
 The smaller the size of model, the greater the contrast between the smallest quant and largest quant in terms of operation, quality, nuance and general overall function.
 
+There is an exception to this; see "Neo Imatrix" below.
+
 IMATRIX:
 
 Imatrix quants generally improve all quants, and also allow you to use smaller quants (less memory, more context space) and retain quality of operation.
 
 IE: Instead of using a Q4_K_M, you might be able to run an IQ3_M and get close to Q4_K_M's quality, but at a higher tokens-per-second speed and with more VRAM free for context.
 
-<B>Recommended Quants:</B>
+
+<B>Recommended Quants - ALL:</B>
 
 This covers both Imatrix and regular quants.
 
@@ -389,6 +411,12 @@ Please see sections below this for advanced usage, more details, settings notes
 
 </small>
 
+Special note:
+
+It appears the "DRY" / "XTC" samplers have been added to LLAMACPP.
+
+They are available via "llama-server.exe". Likely these samplers will also become available "downstream" in applications that use LLAMACPP in due time.
+
 ---
 
 HOW TO TEST EACH PARAMETER(s), SAMPLER(s) and ADVANCED SAMPLER(s)
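
As a quick illustration of the server-mode usage mentioned in the diff (llama-server.exe exposing llama_cpp samplers over an API), here is a minimal sketch of a request body for llama.cpp's native `/completion` endpoint. The core field names (`prompt`, `temperature`, `top_k`, `top_p`, `repeat_penalty`, `n_predict`) follow the llama.cpp HTTP server; the `dry_multiplier` / `xtc_probability` fields correspond to the newer DRY / XTC samplers noted above and may not exist on older builds, so treat them as assumptions to verify against your server version.

```python
import json

def build_completion_payload(prompt: str,
                             temperature: float = 0.8,
                             top_k: int = 40,
                             top_p: float = 0.95,
                             repeat_penalty: float = 1.1,
                             dry_multiplier: float = 0.0,
                             xtc_probability: float = 0.0) -> str:
    """Build a JSON body for a llama-server /completion request.

    dry_multiplier / xtc_probability are newer sampler fields (see the
    DRY / XTC note in the diff); leaving them at 0.0 disables them on
    builds that support them.
    """
    payload = {
        "prompt": prompt,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "repeat_penalty": repeat_penalty,
        "dry_multiplier": dry_multiplier,
        "xtc_probability": xtc_probability,
        "n_predict": 256,  # max tokens to generate
    }
    return json.dumps(payload)

# Typical use: POST this body to http://localhost:8080/completion
body = build_completion_payload("Write a haiku about quantization.")
```

The same settings can usually be supplied per-request like this rather than baked into config files, which makes server mode convenient for A/B testing sampler values.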
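
The IQ3_M vs Q4_K_M trade-off described in the IMATRIX note can be made concrete with rough bits-per-weight arithmetic. The bpw figures below are approximate community-reported values (they vary slightly by model architecture), so this is a back-of-the-envelope sketch, not an exact sizing tool.

```python
# Rough GGUF sizing: file size ~= parameter_count * bits_per_weight / 8.
# Approximate bits-per-weight for two common quants (values are estimates).
APPROX_BPW = {
    "Q4_K_M": 4.85,
    "IQ3_M": 3.66,
}

def approx_size_gb(n_params_billion: float, quant: str) -> float:
    """Estimate model file size in (decimal) GB for a given quant."""
    bits = n_params_billion * 1e9 * APPROX_BPW[quant]
    return bits / 8 / 1e9

# For an 8B model, dropping from Q4_K_M to IQ3_M frees roughly a GB
# of VRAM that can instead hold KV cache (i.e., more context).
saved_gb = approx_size_gb(8, "Q4_K_M") - approx_size_gb(8, "IQ3_M")
```

This is why an imatrix IQ3_M can be attractive: close-to-Q4_K_M quality at a smaller memory footprint and higher tokens-per-second.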