Update README.md

README.md (CHANGED)
@@ -112,6 +112,7 @@ All information about parameters, samplers and advanced samplers applies to ALL

QUANTS:
- QUANTS Detailed information.
- IMATRIX Quants
- ADDITIONAL QUANT INFORMATION
- ARM QUANTS / Q4_0_X_X
- NEO Imatrix Quants / Neo Imatrix X Quants
@@ -123,7 +124,7 @@ SOURCE FILES for my Models / APPS to Run LLMs / AIs:

- TEXT-GENERATION-WEBUI
- KOBOLDCPP
- SILLYTAVERN
-- OTHER PROGRAMS

TESTING / Default / Generation Example PARAMETERS AND SAMPLERS
- Basic settings suggested for general model operation.
@@ -212,6 +213,35 @@ The Imatrix process has NO effect on Q8 or F16 quants.

F16 is full precision, just in GGUF format.

ADDITIONAL QUANT INFORMATION:

<details>
@@ -371,7 +401,7 @@ For reference here are some Class 3/4 models:

[ https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-23.5B-GGUF ]

-(note Grand Guttenberg Madness/

Although Class 3 and Class 4 models will work when used within their specific use case(s) and with the standard parameters and settings on the model card, I recognize that users want either a smoother experience and/or want to use these models for other than their intended use case(s), and that is in part why I created this document.
@@ -533,13 +563,11 @@ Note for Class 3/Class 4 models settings/samplers (discussed below) "repeat-last

Now that you have the basic parameters and samplers from the previous section, I will cover Generational Control and Steering.

-This section is optional.

-This section (in part) will cover how to deal with Class 3/4 issues directly, as well as general issues than can happen with any "class" of model during generation IF you want to control them manually as
the "Quick Reference" and/or "Detailed Parameters, Samplers, and Advanced Samplers" will cover how to deal with any generation issue(s) automatically.

-This section will also cover how to manually STEER generation(s) - ANY MODEL, ANY TYPE.
-

There is a very important concept that must be covered first:

The output/generation/answer to your prompt/instructions BECOMES part of your "prompt" after you click STOP, and then click on "CONTINUE".
@@ -552,7 +580,7 @@ When you hit "REGEN" this nullifies only the last "generation" - not the prompt

The part I will cover here is once a generation has started, from a single prompt (no other prompts/generations in the chat).

-So lets start with a prompt (NOTE: this

Start a 1000 word scene (vivid horror, 1st person, include thoughts) with: The sky scraper swayed, as she watched the window in front of her on the 21 floor explode...
@@ -590,6 +618,7 @@ These methods apply to all generation types - not just a "scene" or "story", but

Notes:
- For Text Generation Webui, you can transfer your "chat" to "notebook" for the easy Stop/Edit/Continue function.
- For Silly Tavern -> This is built in.
- For LMStudio -> This is built in.
- For API (direct control) you have to send the "chat" elements back to the "server" with the "edits" (send the whole "revised" chat as a JSON payload).
@@ -616,7 +645,19 @@ If you have single or multiple paragraph repeat(s):

- Hit continue.
- Better: Do these steps, and add "steering" (last line -> word, phrase, sentence or paragraph)

-In each case we are BREAKING the "condition" that lead (or lead into) to the repeat(s).

<B>Advanced Steering / Fixing Issues (any model, any type) and "sequenced" parameter/sampler change(s)</B>
@@ -671,16 +712,18 @@ You may want to modify the instructions to provide a "steering" continue point a

---

-Compiled by: "EnragedAntelope"
-
-https://huggingface.co/EnragedAntelope
-
-https://github.com/EnragedAntelope

This section will get you started - especially with class 3 and 4 models - and the detail section will cover settings / control in more depth below.

Please see sections below this for advanced usage, more details, settings, notes etc etc.

<small>
# LLM Parameters Reference Table
@@ -736,7 +779,7 @@ Please see sections below this for advanced usage, more details, settings, notes

| **Advanced Samplers** |
-| dry_multiplier | Controls DRY (Don't Repeat Yourself) intensity. Range: 0.8-1.12+ |
| dry_allowed_length | Allowed length for repeated sequences in DRY. Default: 2 |
@@ -749,14 +792,22 @@ Please see sections below this for advanced usage, more details, settings, notes

## Notes

-- For Class 3 and 4 models, using both DRY and Quadratic sampling is recommended
- Lower quants (Q2K, IQ1s, IQ2s) may require stronger settings due to compression damage
- Parameters interact with each other, so test changes one at a time
- Always test with temperature at 0 first to establish a baseline

</small>
-

---
@@ -935,7 +986,11 @@ Generally this is not used.

In some AI/LLM apps, these may only be available via JSON file modification and/or API.

-For "text-gen-webui"

<B>i) OVERALL GENERATION CHANGES (affect per token as well as over all generation):</B>
@@ -1083,7 +1138,7 @@ Careful testing is required, as this can have unclear side effects.

Note #1 :

-You can use these samplers via Sillytavern IF you use either of these APPS (Koboldcpp/Text Generation Webui) to connect Silly Tavern to their API.

Other Notes:
@@ -1112,8 +1167,8 @@ However, you should also check / test operation of (these are in Text Generation

a] Affects per token generation:

- top_a
-- epsilon_cutoff - see note 4
-- eta_cutoff - see note 4
- no_repeat_ngram_size - see note #1.

b] Affects generation including phrase, sentence, paragraph and entire generation:
@@ -1227,6 +1282,8 @@ Hopefully this powerful sampler will soon appear in all LLM/AI apps.

You can access this in the KoboldCPP app, under "context" -> "tokens" on the main page of the app after start up.

This sampler allows banning words and phrases DURING generation, forcing the model to "make another choice".

This is a game changer in custom real time control of the model.
QUANTS:
- QUANTS Detailed information.
- IMATRIX Quants
+- QUANTS GENERATIONAL DIFFERENCES:
- ADDITIONAL QUANT INFORMATION
- ARM QUANTS / Q4_0_X_X
- NEO Imatrix Quants / Neo Imatrix X Quants

- TEXT-GENERATION-WEBUI
- KOBOLDCPP
- SILLYTAVERN
+- LMStudio, Ollama, Llamacpp, and OTHER PROGRAMS

TESTING / Default / Generation Example PARAMETERS AND SAMPLERS
- Basic settings suggested for general model operation.

F16 is full precision, just in GGUF format.

+QUANTS GENERATIONAL DIFFERENCES:
+
+Higher quants will have more detail and nuance, and in some cases stronger "emotional" levels. Characters will also be more "fleshed out". Sense of "there" will also increase.
+
+Likewise for any use case -> higher quants' nuance (both instruction following AND output generation) will be higher.
+
+"Nuance" is critical for both understanding and the quality of the output generation.
+
+To put this another way, "nuance" is lost as the full precision model is more and more compressed (lower and lower quants).
+
+Some of this can be counteracted by parameters and/or Imatrix (as noted earlier).
+
+IQ4XS / IQ4NL quants:
+
+Due to the unusual nature of this quant (mixture/processing), generations from it will be different from other quants.
+
+These quants can also be "quanted" with or without an Imatrix.
+
+You may want to try it / compare it to other quants' output.
+
+Special note on Q2k/Q3 quants:
+
+You may need to use temp 2 or lower with these quants (1 or lower for Q2k). There is just too much compression at this level, damaging the model.
+
+IQ quants (and Imatrix versions of Q2k/Q3) perform better at these "BPW" levels.
+
+Rep pen adjustments may also be required to get the most out of a model at this/these quant level(s).
+
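As a rough rule of thumb, the size/nuance trade-off above tracks bits per weight (BPW). A minimal sketch of the size arithmetic, using ballpark BPW figures (the exact values vary by quant mixture and model, and are assumptions here, not exact specs):

```python
def approx_gguf_size_gb(n_params_b, bpw):
    """Approximate GGUF file size in GB: parameters (in billions) * bits-per-weight / 8.
    Ignores metadata overhead and per-layer mixture differences."""
    return n_params_b * bpw / 8

# Ballpark BPW per quant level (illustrative values only).
bpw = {"Q2_K": 2.6, "IQ4_XS": 4.3, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0}

# Approximate sizes for a 12B-parameter model.
sizes = {q: round(approx_gguf_size_gb(12, b), 1) for q, b in bpw.items()}
```

Lower BPW means a smaller file and lower VRAM use, at the cost of the "nuance" loss described above.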
ADDITIONAL QUANT INFORMATION:

<details>

[ https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-23.5B-GGUF ]

+(note: Grand Gutenberg Madness/Darkness (12B) are class 1 models, but compressed versions of the 23.5B)

Although Class 3 and Class 4 models will work when used within their specific use case(s) and with the standard parameters and settings on the model card, I recognize that users want either a smoother experience and/or want to use these models for other than their intended use case(s), and that is in part why I created this document.

Now that you have the basic parameters and samplers from the previous section, I will cover Generational Control and Steering.

+This section is optional and covers how to manually STEER generation(s) - ANY MODEL, ANY TYPE.

+This section (in part) will also cover how to deal with Class 3/4 model issues directly, as well as general issues that can happen with any "class" of model during generation IF you want to control them manually, as the "Quick Reference" and/or "Detailed Parameters, Samplers, and Advanced Samplers" will cover how to deal with any generation issue(s) automatically.

There is a very important concept that must be covered first:

The output/generation/answer to your prompt/instructions BECOMES part of your "prompt" after you click STOP, and then click on "CONTINUE".

The part I will cover here is once a generation has started, from a single prompt (no other prompts/generations in the chat).

+So let's start with a prompt (NOTE: this prompt has no "steering" in the instructions):

Start a 1000 word scene (vivid horror, 1st person, include thoughts) with: The sky scraper swayed, as she watched the window in front of her on the 21 floor explode...

Notes:
- For Text Generation Webui, you can transfer your "chat" to "notebook" for the easy Stop/Edit/Continue function.
+- For KoboldCPP -> This is built in.
- For Silly Tavern -> This is built in.
- For LMStudio -> This is built in.
- For API (direct control) you have to send the "chat" elements back to the "server" with the "edits" (send the whole "revised" chat as a JSON payload).
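The API note above can be sketched as a payload builder for an OpenAI-compatible `/v1/chat/completions` endpoint (such as llama.cpp's `llama-server`); the function name and defaults here are illustrative, and note that some servers will start a fresh assistant turn rather than continue a trailing one:

```python
def build_continue_payload(chat, edited_reply, model="local-model", temperature=0.8):
    """Build the JSON body for a 'continue' request: the previous generation is
    edited by hand, then sent back as the last assistant turn so it becomes part
    of the prompt (the STOP/CONTINUE concept described earlier)."""
    messages = list(chat)
    # The (revised) generation is appended as the final assistant message.
    messages.append({"role": "assistant", "content": edited_reply})
    return {"model": model, "messages": messages, "temperature": temperature}

payload = build_continue_payload(
    [{"role": "user", "content": "Start a 1000 word scene ..."}],
    "The skyscraper swayed, and then",
)
```

The whole revised chat goes back to the server each time; the server holds no state between requests.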

- Hit continue.
- Better: Do these steps, and add "steering" (last line -> word, phrase, sentence or paragraph)

+In each case we are BREAKING the "condition(s)" that led to (or led into) the repeat(s).
+
+If you have "rants" and/or "model has lost its mind":
+
+- Stop generation, edit out all the paragraph(s), going back AS FAR as possible to where it appears the rant/mind loss occurred (delete ALL), and delete one additional paragraph / 2 or more sentences.
+- Hit continue.
+- Better: Do these steps, and add "steering" (last line -> word, phrase, sentence or paragraph).
+
+Class 3/4 model additional note:
+
+With these classes of model, you MAY need to "edit" / "revise" further back than one or two lines / one paragraph - they sometimes need just a little more editing.
+
+Another option is using "Cold" Editing/Generation explained below.

<B>Advanced Steering / Fixing Issues (any model, any type) and "sequenced" parameter/sampler change(s)</B>

---

+Compiled by: "EnragedAntelope" ( https://huggingface.co/EnragedAntelope || https://github.com/EnragedAntelope )

This section will get you started - especially with class 3 and 4 models - and the detail section will cover settings / control in more depth below.

Please see sections below this for advanced usage, more details, settings, notes etc etc.

+IMPORTANT NOTES:
+
+Not all parameters, samplers and advanced samplers are listed in this quick reference section. Scroll down to see all of them in the following sections.
+
+Likewise there may be some "name variation(s)" in other LLM/AI apps - this is addressed in the detailed sections.

<small>
# LLM Parameters Reference Table

| **Advanced Samplers** |
+| dry_multiplier | Controls DRY (Don't Repeat Yourself) intensity. Range: 0.8-1.12+ for Class 3 (higher for Class 4) |
| dry_allowed_length | Allowed length for repeated sequences in DRY. Default: 2 |

## Notes

+- For Class 3 and 4 models, using both DRY and Quadratic sampling is recommended (see the advanced/detailed samplers below on how to control the model here directly)
- Lower quants (Q2K, IQ1s, IQ2s) may require stronger settings due to compression damage
- Parameters interact with each other, so test changes one at a time
- Always test with temperature at 0 first to establish a baseline

</small>
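The "test changes one at a time" advice above can be mechanized as a simple sweep: start from a temperature-0 baseline and emit one config per single-parameter change. Parameter names below follow common llama.cpp-style conventions; exact names vary by app:

```python
def one_at_a_time_configs(baseline, trials):
    """Yield (name, config) pairs where each config differs from the baseline
    by exactly one parameter, so any behavior change can be attributed to it."""
    for name, value in trials.items():
        cfg = dict(baseline)  # copy so the baseline is never mutated
        cfg[name] = value
        yield name, cfg

# Temperature-0 baseline first, then vary one knob at a time.
baseline = {"temperature": 0.0, "repeat_penalty": 1.0, "top_k": 40, "top_p": 0.95}
trials = {"temperature": 0.8, "repeat_penalty": 1.1}
configs = dict(one_at_a_time_configs(baseline, trials))
```

Each generated config can then be fed to whichever app/API you are testing with, comparing output against the baseline run.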

+CLASS 3/4 Models:
+
+If you are using a class 3 or class 4 model for use case(s) such as role play, multi-turn, chat, etc., activating/setting all samplers is suggested for class 3 models and may be required for class 4 models.
+
+Likewise, fine control of a class 3/4 model via the "DRY" and "Quadratic" samplers is detailed below. These allow you to dial the model's raw power up or down directly.
+
+MIROSTAT Sampler - IMPORTANT:
+
+Make sure to review the MIROSTAT sampler settings below, due to the behaviour of this specific sampler and its effect on parameters/other samplers, which also varies from app to app.

---

In some AI/LLM apps, these may only be available via JSON file modification and/or API.

+For "text-gen-webui" and "Koboldcpp" these are directly accessible; for other programs/apps this varies.
+
+Sillytavern:
+
+If the app Sillytavern is connected to (via API) supports these parameters/samplers, then you can access them via Silly Tavern's parameter/sampler panel. So if you are using Text-Gen-Webui, Koboldcpp, LMStudio, Llamacpp, Ollama (etc.) you can set/change/access all or most of these.

<B>i) OVERALL GENERATION CHANGES (affect per token as well as over all generation):</B>

Note #1 :

+You can use these samplers via Sillytavern IF you use an app that supports them (e.g. Koboldcpp / Text Generation Webui) to connect Silly Tavern to its API.

Other Notes:

a] Affects per token generation:

- top_a
+- epsilon_cutoff - see note #4
+- eta_cutoff - see note #4
- no_repeat_ngram_size - see note #1.

b] Affects generation including phrase, sentence, paragraph and entire generation:

You can access this in the KoboldCPP app, under "context" -> "tokens" on the main page of the app after start up.

+You can also access this in SillyTavern if you use KoboldCPP as your "API" connected app.

This sampler allows banning words and phrases DURING generation, forcing the model to "make another choice".

This is a game changer in custom real time control of the model.
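Outside KoboldCPP's UI, a similar effect is possible over the llama.cpp server `/completion` API via `logit_bias`, where a bias of `false` bans a token outright. The token IDs below are made-up placeholders; real IDs come from the model's own tokenizer:

```python
def build_ban_payload(prompt, banned_token_ids, n_predict=256):
    """Build a llama.cpp /completion request body that bans specific token IDs,
    forcing the model to 'make another choice' during generation."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        # [token_id, false] assigns negative infinity to that token's logit.
        "logit_bias": [[tid, False] for tid in banned_token_ids],
    }

body = build_ban_payload("Once upon a time", [15043, 29871])
```

Banning operates on token IDs rather than surface words, so a banned word may need several IDs (with/without a leading space, different casings) to be blocked fully.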