DavidAU committed
Commit 3f1e840 · verified · 1 Parent(s): ac491dd

Update README.md

Files changed (1):
  1. README.md (+31 −3)
README.md CHANGED
@@ -17,7 +17,7 @@ tags:
 <h3>Maximizing Model Performance for All Quant Types And Full-Precision using Samplers, Advanced Samplers and Parameters Guide</h3>
 
 This document includes detailed information, references, and notes for general parameters, samplers and
-advanced samplers to get the most out of your model's abilities.
+advanced samplers to get the most out of your model's abilities, including notes / settings for the most popular AI/LLM apps in use.
 
 These settings / suggestions can be applied to all models including GGUF, EXL2, GPTQ, HQQ, AWQ and full source/precision.
 
@@ -133,11 +133,28 @@ You can use almost all parameters, samplers and advanced samplers using "KOBOLDC
 
 Note: This program has one of the newest samplers, called "Anti-slop", which allows phrase/word banning at the generation level.
 
+SILLYTAVERN:
+
+Note that https://github.com/SillyTavern/SillyTavern also allows access to all LLAMACPP parameters/samplers, as well as additional advanced samplers.
+
+You can use almost all parameters, samplers and advanced samplers using "SILLYTAVERN" without the need to get the source config files (the "llamacpp_HF" step).
+
+For CLASS3 and CLASS4 the most important setting is "SMOOTHING FACTOR" (Quadratic Smoothing); information is located on this page:
+
+https://docs.sillytavern.app/usage/common-settings/
+
+NOTE: It appears that SillyTavern also supports "DRY" and "XTC", but these are not yet in the documentation at the time of writing.
+
+You may also want to check out how to connect SillyTavern to local AI "apps" running on your PC here:
+
+https://docs.sillytavern.app/usage/api-connections/
+
+
 OTHER PROGRAMS:
 
 Other programs like https://www.LMStudio.ai allow access to most of the STANDARD samplers, whereas for others (llamacpp only here) you may need to add to the JSON file(s) for a model and/or template preset.
 
-In most cases all llama_cpp parameters/samplers are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "Olama" and "lmstudio" (as well as other apps too).
+In most cases all llama_cpp parameters/samplers are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "Ollama", "backyard", and "lmstudio" (as well as other apps too).
 
 You can also use llama_cpp directly (IE: llama-server.exe); see:
 
@@ -173,6 +190,8 @@ General Parameters => https://arxiv.org/html/2408.13586v1
 
 Benchmarking-and-Guiding-Adaptive-Sampling-Decoding https://github.com/ZhouYuxuanYX/Benchmarking-and-Guiding-Adaptive-Sampling-Decoding-for-LLMs
 
+Depending on the AI/LLM "apps" you are using, additional reference material for parameters / samplers may also exist.
+
 ---
 
 CRITICAL NOTES:
@@ -233,13 +252,16 @@ Generally it is recommended to run the highest quant(s) you can on your machine
 
 The smaller the size of model, the greater the contrast between the smallest quant and largest quant in terms of operation, quality, nuance and general overall function.
 
+There is an exception to this; see "Neo Imatrix" below.
+
 IMATRIX:
 
 Imatrix quants generally improve all quants, and also allow you to use smaller quants (less memory, more context space) and retain quality of operation.
 
 IE: Instead of using a Q4_K_M, you might be able to run an IQ3_M and get close to Q4_K_M's quality, but at a higher tokens-per-second speed and with more VRAM free for context.
 
-<B>Recommended Quants:</B>
+
+<B>Recommended Quants - ALL:</B>
 
 This covers both Imatrix and regular quants.
 
@@ -389,6 +411,12 @@ Please see sections below this for advanced usage, more details, settings notes
 
 </small>
 
+Special note:
+
+It appears the "DRY" / "XTC" samplers have been added to LLAMACPP.
+
+They are available via "llama-server.exe". Likely these samplers will also become available "downstream" in applications that use LLAMACPP in due time.
+
 ---
 
 HOW TO TEST EACH PARAMETER(s), SAMPLER(s) and ADVANCED SAMPLER(s)
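
As a quick illustration of the server-mode usage mentioned in the diff (llama-server.exe exposing llama_cpp samplers over an API), here is a minimal sketch of a request body for llama.cpp's native `/completion` endpoint. The core field names (`prompt`, `temperature`, `top_k`, `top_p`, `repeat_penalty`, `n_predict`) follow the llama.cpp HTTP server; the `dry_multiplier` / `xtc_probability` fields correspond to the newer DRY / XTC samplers noted above and may not exist on older builds, so treat them as assumptions to verify against your server version.

```python
import json

def build_completion_payload(prompt: str,
                             temperature: float = 0.8,
                             top_k: int = 40,
                             top_p: float = 0.95,
                             repeat_penalty: float = 1.1,
                             dry_multiplier: float = 0.0,
                             xtc_probability: float = 0.0) -> str:
    """Build a JSON body for a llama-server /completion request.

    dry_multiplier / xtc_probability are newer sampler fields (see the
    DRY / XTC note in the diff); leaving them at 0.0 disables them on
    builds that support them.
    """
    payload = {
        "prompt": prompt,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "repeat_penalty": repeat_penalty,
        "dry_multiplier": dry_multiplier,
        "xtc_probability": xtc_probability,
        "n_predict": 256,  # max tokens to generate
    }
    return json.dumps(payload)

# Typical use: POST this body to http://localhost:8080/completion
body = build_completion_payload("Write a haiku about quantization.")
```

The same settings can usually be supplied per-request like this rather than baked into config files, which makes server mode convenient for A/B testing sampler values.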
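
The IQ3_M vs Q4_K_M trade-off described in the IMATRIX note can be made concrete with rough bits-per-weight arithmetic. The bpw figures below are approximate community-reported values (they vary slightly by model architecture), so this is a back-of-the-envelope sketch, not an exact sizing tool.

```python
# Rough GGUF sizing: file size ~= parameter_count * bits_per_weight / 8.
# Approximate bits-per-weight for two common quants (values are estimates).
APPROX_BPW = {
    "Q4_K_M": 4.85,
    "IQ3_M": 3.66,
}

def approx_size_gb(n_params_billion: float, quant: str) -> float:
    """Estimate model file size in (decimal) GB for a given quant."""
    bits = n_params_billion * 1e9 * APPROX_BPW[quant]
    return bits / 8 / 1e9

# For an 8B model, dropping from Q4_K_M to IQ3_M frees roughly a GB
# of VRAM that can instead hold KV cache (i.e., more context).
saved_gb = approx_size_gb(8, "Q4_K_M") - approx_size_gb(8, "IQ3_M")
```

This is why an imatrix IQ3_M can be attractive: close-to-Q4_K_M quality at a smaller memory footprint and higher tokens-per-second.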