DavidAU committed on
Commit 83fc0d3 · verified · 1 Parent(s): 0cfeb72

Update README.md

Files changed (1):
  1. README.md +74 -38
README.md CHANGED
@@ -16,6 +16,15 @@ These settings / suggestions can be applied to all models including GGUF, EXL2,
16
  It also includes critical settings for Class 3 and Class 4 models at this repo - DavidAU - to enhance and control generation
17
  for specific as well as outside use case(s), including role play, chat and other use case(s).
18
19
  Even if you are not using my models, you may find this document useful for any model available online.
20
 
21
  If you are currently using model(s) that are difficult to "wrangle" then apply "Class 3" or "Class 4" settings to them.
@@ -30,14 +39,13 @@ PARAMETERS AND SAMPLERS
30
 
31
  Primary Testing Parameters I use, including for the output generation examples at my repo:
32
 
33
- Ranged:
34
 
35
  temperature: 0 to 5 ("temp")
36
 
37
  repetition_penalty : 1.02 to 1.15 ("rep pen")
38
 
39
-
40
- Set:
41
 
42
  top_k:40
43
 
@@ -47,7 +55,15 @@ top_p: 0.95
47
 
48
  repeat-last-n: 64 (also called: "repetition_penalty_range" / "rp range" )
49
 
50
- (no other settings, parameter or samplers activated when generating examples)
51
 
52
  Below are all the LLAMA_CPP parameters and samplers.
53
 
@@ -56,6 +72,7 @@ I have added notes below each one for adjustment / enhancement(s) for specific u
56
  Following this section will be additional samplers, which become available when using the "llamacpp_HF" loader in https://github.com/oobabooga/text-generation-webui .
57
 
58
  The "llamacpp_HF" loader only requires the GGUF you want to use plus a few config files from the "source repo" of the model.
 
59
  (this process is automated by the program: just enter the repo URL(s) -> it will fetch everything for you)
60
 
61
  This allows access to very advanced samplers in addition to all the parameters / samplers here.
@@ -78,6 +95,7 @@ https://github.com/ggerganov/llama.cpp
78
 
79
  (scroll down on the main page for more apps/programs to use GGUFs too)
80
 
 
81
 
82
  CRITICAL NOTES:
83
 
@@ -98,6 +116,7 @@ The goal here is to use parameters to raise/lower the power of the model and sam
98
 
99
  With that being said, generation "examples" (at my repo) are created using the "Primary Testing Parameters" (top of this document) settings regardless of the "class" of the model, AND with NO advanced settings or samplers.
100
 
 
101
 
102
  QUANTS:
103
 
@@ -121,7 +140,12 @@ IE: Instead of using a q4KM, you might be able to run an IQ3_M and get close to
121
  PRIMARY PARAMETERS:
122
  ------------------------------------------------------------------------------
123
 
124
- --temp N
125
 
126
  temperature (default: 0.8)
127
 
@@ -133,7 +157,7 @@ Too much temp can affect instruction following in some cases and sometimes not e
133
 
134
  Newer model archs (L3, L3.1, L3.2, Mistral Nemo, Gemma2, etc.) often NEED more temp (1+) to get their best generations.
135
 
136
- --top-p N
137
 
138
  top-p sampling (default: 0.9, 1.0 = disabled)
139
 
@@ -141,7 +165,7 @@ If not set to 1, select tokens with probabilities adding up to less than this nu
141
 
142
  I use default of: .95 ;
143
 
144
- --min-p N
145
 
146
  min-p sampling (default: 0.1, 0.0 = disabled)
147
 
@@ -149,7 +173,7 @@ Tokens with probability smaller than (min_p) * (probability of the most likely t
149
 
150
  I use default: .05 ;
151
 
152
- --top-k N
153
 
154
  top-k sampling (default: 40, 0 = disabled)
155
 
@@ -157,10 +181,7 @@ Similar to top_p, but select instead only the top_k most likely tokens. Higher v
157
 
158
  Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.
159
 
160
- These parameters will have SIGNIFICANT effect on prose, generation, length and content; with temp being the most powerful.
161
-
162
- Keep in mind the biggest parameter / random "unknown" is your prompt. A word change, rephrasing, punctation , even a comma, or semi-colon can drastically alter the
163
- output, even at min temp settings. CAPS also affect generation too.
164
 
165
  For an interesting test, set "temp" to 0 ; this will give you the SAME generation for a given prompt each time. Then adjust a word, phrase, sentence etc - to see the differences.
166
  Keep in mind this will show model operation at its LEAST powerful/creative level and should NOT be used to determine if the model works for your use case(s).
@@ -178,7 +199,11 @@ Then test "at temp" to see the MODELS in action. (5-10 generations recommended)
178
  PENALTY SAMPLERS:
179
  ------------------------------------------------------------------------------
180
 
181
- --repeat-last-n N
182
 
183
  last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
184
  ("repetition_penalty_range" in oobabooga/text-generation-webui , "rp_range" in kobold)
@@ -187,8 +212,11 @@ THIS IS CRITICAL. Too high you can get all kinds of issues (repeat words, senten
187
 
188
  This setting also works in conjunction with all other "rep pens" below.
189
 
 
190
 
191
- --repeat-penalty N
 
 
192
 
193
  penalize repeat sequence of tokens (default: 1.0, 1.0 = disabled)
194
  (commonly called "rep pen")
@@ -198,28 +226,32 @@ Generally this is set from 1.0 to 1.15 ; smallest increments are best IE: 1.01..
198
  This affects creativity of the model overall, not just how words are penalized.
199
 
200
 
201
- --presence-penalty N
202
 
203
  repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
204
 
205
  Generally leave this at zero IF repeat-last-n is 256 or less. You may want to use this for higher repeat-last-n settings.
206
 
207
- CLASS 3: 0.05 may assist generation BUT SET "--repeat-last-n" to 512 or less. Better is 128 or 64.
208
 
209
- CLASS 4: 0.1 to 0.25 may assist generation BUT SET "--repeat-last-n" to 64
210
 
211
 
212
- --frequency-penalty N
213
 
214
  repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
215
 
216
  Generally leave this at zero IF repeat-last-n is 512 or less. You may want to use this for higher repeat-last-n settings.
217
 
218
- CLASS 3: 0.25 may assist generation BUT SET "--repeat-last-n" to 512 or less. Better is 128 or 64.
219
 
220
- CLASS 4: 0.7 to 0.8 may assist generation BUT SET "--repeat-last-n" to 64.
221
 
222
- --penalize-nl penalize newline tokens (default: false)
223
  Generally this is not used.
224
 
225
 
@@ -228,7 +260,7 @@ SECONDARY SAMPLERS / FILTERS:
228
  ------------------------------------------------------------------------------
229
 
230
 
231
- --tfs N
232
 
233
  tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
234
 
@@ -236,23 +268,23 @@ Tries to detect a tail of low-probability tokens in the distribution and removes
236
  ( https://www.trentonbricken.com/Tail-Free-Sampling/ )
237
 
238
 
239
- --typical N
240
 
241
  locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
242
 
243
  If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
244
 
245
 
246
- --mirostat N
247
 
248
  use Mirostat sampling. "Top K", "Nucleus", "Tail Free" (TFS) and "Locally Typical" (TYPICAL) samplers are ignored if Mirostat is used.
249
  (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
250
 
251
- --mirostat-lr N
252
 
253
  Mirostat learning rate, parameter eta (default: 0.1) ("mirostat_eta" in other apps)
254
 
255
- --mirostat-ent N
256
 
257
  Mirostat target entropy, parameter tau (default: 5.0) ("mirostat_tau" in other apps)
258
 
@@ -273,11 +305,11 @@ For Class 3 models it is suggested to use this to assist with generation (min se
273
  For Class 4 models it is highly recommended to use Mirostat 1 or 2, with target entropy (tau / "mirostat-ent") @ 6 to 8 and learning rate (eta / "mirostat-lr") at .1 to .5
274
 
275
 
276
- --dynatemp-range N
277
 
278
  dynamic temperature range (default: 0.0, 0.0 = disabled)
279
 
280
- --dynatemp-exp N
281
 
282
  dynamic temperature exponent (default: 1.0)
283
 
@@ -302,13 +334,13 @@ To set manually (IE: Api, lmstudio, etc) using "range" and "exp" ; this is a bit
302
  This is both an enhancement and in some ways fixes issues in a model when too little temp (or too much/too much of the same) affects generation.
303
 
304
 
305
- --xtc-probability N
306
 
307
  xtc probability (default: 0.0, 0.0 = disabled)
308
 
309
  Probability that the removal will actually happen. 0 disables the sampler. 1 makes it always happen.
310
 
311
- --xtc-threshold N
312
 
313
  xtc threshold (default: 0.1, 1.0 = disabled)
314
 
@@ -319,7 +351,7 @@ Suggest you experiment with this one, with other advanced samplers disabled to s
319
 
320
 
321
 
322
- -l, --logit-bias TOKEN_ID(+/-)BIAS
323
 
324
  modifies the likelihood of token appearing in the completion,
325
  i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello',
@@ -341,19 +373,19 @@ OTHER:
341
  ------------------------------------------------------------------------------
342
 
343
 
344
- -s, --seed SEED
345
 
346
  RNG seed (default: -1, use random seed for -1)
347
 
348
- --samplers SAMPLERS
349
 
350
  samplers that will be used for generation in the order, separated by ';' (default: top_k;tfs_z;typ_p;top_p;min_p;xtc;temperature)
351
 
352
- --sampling-seq SEQUENCE
353
 
354
  simplified sequence for samplers that will be used (default: kfypmxt)
355
 
356
- --ignore-eos
357
 
358
  ignore end of stream token and continue generating (implies --logit-bias EOS-inf)
359
 
@@ -383,7 +415,7 @@ For Class 3 and Class 4 the goal is to use the LOWEST settings to keep the model
383
  You may therefore want to experiment with dropping the settings (SLOWLY) for Class 3/4 models from those suggested below.
384
 
385
 
386
- DRY:
387
 
388
  Class 3:
389
 
@@ -402,7 +434,8 @@ dry_allowed_length: 2 (or less)
402
  dry_base: 1.15 to 1.5
403
 
404
 
405
- QUADRATIC SAMPLING:
 
406
 
407
  Class 3:
408
 
@@ -416,6 +449,9 @@ smoothing_factor: 3 to 5 (or higher)
416
 
417
  smoothing_curve: 1.5 to 2.
418
 
 
 
 
419
  Keep in mind that these settings/samplers work in conjunction with "penalties", which is especially important
420
  for operation of CLASS 4 models for chat / role play and/or "smoother operation".
421
 
@@ -429,4 +465,4 @@ Smaller quants may require STRONGER settings (all classes of models) due to comp
429
 
430
  This is also influenced by the parameter size of the model in relation to the quant size.
431
 
432
- IE: a 8B model at Q2K will be far more unstable relative to a 20B model at Q2K, and as a result require stronger settings.
 
16
  It also includes critical settings for Class 3 and Class 4 models at this repo - DavidAU - to enhance and control generation
17
  for specific as well as outside use case(s), including role play, chat and other use case(s).
18
 
19
+ These settings can also fix a number of model issues such as:
20
+
21
+ - "Gibberish"
22
+ - letter, word, phrase, paragraph repeats
23
+ - loss of coherence
24
+ - creativeness: either a lack of it, or too much of it ("purple prose")
25
+
26
+ Likewise, these settings can also improve model generation and/or the general overall "smoothness" / "quality" of model operation.
27
+
28
  Even if you are not using my models, you may find this document useful for any model available online.
29
 
30
  If you are currently using model(s) that are difficult to "wrangle" then apply "Class 3" or "Class 4" settings to them.
 
39
 
40
  Primary Testing Parameters I use, including for the output generation examples at my repo:
41
 
42
+ <B>Ranged Parameters:</B>
43
 
44
  temperature: 0 to 5 ("temp")
45
 
46
  repetition_penalty : 1.02 to 1.15 ("rep pen")
47
 
48
+ <B>Set Parameters:</B>
 
49
 
50
  top_k:40
51
 
 
55
 
56
  repeat-last-n: 64 (also called: "repetition_penalty_range" / "rp range" )
57
 
58
+ I do not set any other settings or parameters, nor have any samplers activated, when generating examples.
59
+
60
+ Everything else is "zeroed" / "disabled".
61
+
62
+ These parameters/settings are considered both safe and default, and in most cases are available to all users in all apps.
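For reference, here is a minimal sketch of this baseline as one llama.cpp command line. Assumptions: the llama-cli binary from a recent llama.cpp build, a placeholder model path and prompt, and temp / rep pen pinned to one point inside the ranges above.

```
# Sketch: the baseline "testing" settings as llama.cpp flags.
# Model path and prompt are placeholders; temp and rep pen are
# one point within the ranged values listed above.
llama-cli -m ./your-model.gguf \
  --temp 1.0 \
  --repeat-penalty 1.05 \
  --repeat-last-n 64 \
  --top-k 40 \
  --top-p 0.95 \
  --min-p 0.05 \
  -p "Write a short scene set during a thunderstorm."
```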
63
+
64
+ ---
65
+
66
+ <B>Llama CPP Parameters, Samplers and Advanced Samplers</B>
67
 
68
  Below are all the LLAMA_CPP parameters and samplers.
69
 
 
72
  Following this section will be additional samplers, which become available when using the "llamacpp_HF" loader in https://github.com/oobabooga/text-generation-webui .
73
 
74
  The "llamacpp_HF" loader only requires the GGUF you want to use plus a few config files from the "source repo" of the model.
75
+
76
  (this process is automated by the program: just enter the repo URL(s) -> it will fetch everything for you)
77
 
78
  This allows access to very advanced samplers in addition to all the parameters / samplers here.
 
95
 
96
  (scroll down on the main page for more apps/programs to use GGUFs too)
97
 
98
+ ---
99
 
100
  CRITICAL NOTES:
101
 
 
116
 
117
  With that being said, generation "examples" (at my repo) are created using the "Primary Testing Parameters" (top of this document) settings regardless of the "class" of the model, AND with NO advanced settings or samplers.
118
 
119
+ ---
120
 
121
  QUANTS:
122
 
 
140
  PRIMARY PARAMETERS:
141
  ------------------------------------------------------------------------------
142
 
143
+ These parameters will have a SIGNIFICANT effect on prose, generation, length and content, with temp being the most powerful.
144
+
145
+ Keep in mind the biggest parameter / random "unknown" is your prompt. A word change, rephrasing, punctuation, even a comma or semi-colon, can drastically alter the
146
+ output, even at min temp settings. CAPS affect generation too.
147
+
148
+ <B>temp / temperature</B>
149
 
150
  temperature (default: 0.8)
151
 
 
157
 
158
  Newer model archs (L3, L3.1, L3.2, Mistral Nemo, Gemma2, etc.) often NEED more temp (1+) to get their best generations.
159
 
160
+ <B>top-p</B>
161
 
162
  top-p sampling (default: 0.9, 1.0 = disabled)
163
 
 
165
 
166
  I use default of: .95 ;
167
 
168
+ <B>min-p</B>
169
 
170
  min-p sampling (default: 0.1, 0.0 = disabled)
171
 
 
173
 
174
  I use default: .05 ;
175
 
176
+ <B>top-k</B>
177
 
178
  top-k sampling (default: 40, 0 = disabled)
179
 
 
181
 
182
  Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.
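For example (a sketch; model path and prompts are placeholders, all other settings left at the baseline above):

```
# Wider word choice:
llama-cli -m ./your-model.gguf --top-k 100 -p "Describe the harbor at dawn."

# Simpler word choice:
llama-cli -m ./your-model.gguf --top-k 20 -p "Describe the harbor at dawn."
```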
183
 
184
+ NOTES:
 
185
 
186
  For an interesting test, set "temp" to 0 ; this will give you the SAME generation for a given prompt each time. Then adjust a word, phrase, sentence etc - to see the differences.
187
  Keep in mind this will show model operation at its LEAST powerful/creative level and should NOT be used to determine if the model works for your use case(s).
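A sketch of that A/B test (placeholders as before); at temp 0 any change in output comes from the prompt edit alone:

```
# Same prompt -> same output at temp 0; compare small wording edits:
llama-cli -m ./your-model.gguf --temp 0 -p "Describe the storm."
llama-cli -m ./your-model.gguf --temp 0 -p "Describe the storm!"
```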
 
199
  PENALTY SAMPLERS:
200
  ------------------------------------------------------------------------------
201
 
202
+ These samplers "trim" or "prune" output.
203
+
204
+ PRIMARY:
205
+
206
+ <B>repeat-last-n</B>
207
 
208
  last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
209
  ("repetition_penalty_range" in oobabooga/text-generation-webui , "rp_range" in kobold)
 
212
 
213
  This setting also works in conjunction with all other "rep pens" below.
214
 
215
+ This parameter is the "RANGE" of tokens looked at for the samplers directly below.
216
 
217
+ SECONDARIES:
218
+
219
+ <B>repeat-penalty</B>
220
 
221
  penalize repeat sequence of tokens (default: 1.0, 1.0 = disabled)
222
  (commonly called "rep pen")
 
226
  This affects creativity of the model overall, not just how words are penalized.
227
 
228
 
229
+ <B>presence-penalty</B>
230
 
231
  repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
232
 
233
  Generally leave this at zero IF repeat-last-n is 256 or less. You may want to use this for higher repeat-last-n settings.
234
 
235
+ CLASS 3: 0.05 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
236
 
237
+ CLASS 4: 0.1 to 0.25 may assist generation BUT SET "repeat-last-n" to 64.
238
 
239
 
240
+ <B>frequency-penalty</B>
241
 
242
  repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
243
 
244
  Generally leave this at zero IF repeat-last-n is 512 or less. You may want to use this for higher repeat-last-n settings.
245
 
246
+ CLASS 3: 0.25 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
247
+
248
+ CLASS 4: 0.7 to 0.8 may assist generation BUT SET "repeat-last-n" to 64.
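As a combined sketch of the Class 4 suggestions above (starting-point values only, not definitive settings; model path and prompt are placeholders):

```
# Sketch: one possible Class 4 penalty setup per the notes above.
llama-cli -m ./your-class4-model.gguf \
  --repeat-last-n 64 \
  --repeat-penalty 1.05 \
  --presence-penalty 0.15 \
  --frequency-penalty 0.7 \
  -p "Continue the chat in character."
```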
249
+
250
+
251
+ <B>penalize-nl </B>
252
 
253
+ penalize newline tokens (default: false)
254
 
 
255
  Generally this is not used.
256
 
257
 
 
260
  ------------------------------------------------------------------------------
261
 
262
 
263
+ <B>tfs</B>
264
 
265
  tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
266
 
 
268
  ( https://www.trentonbricken.com/Tail-Free-Sampling/ )
269
 
270
 
271
+ <B>typical</B>
272
 
273
  locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
274
 
275
  If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
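A sketch of enabling these two filters (the values here are illustrative assumptions, not recommendations from this document):

```
# Sketch: TFS plus locally typical sampling; both default to 1.0 (off).
llama-cli -m ./your-model.gguf --tfs 0.95 --typical 0.9 -p "Describe the harbor at dawn."
```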
276
 
277
 
278
+ <B>mirostat</B>
279
 
280
  use Mirostat sampling. "Top K", "Nucleus", "Tail Free" (TFS) and "Locally Typical" (TYPICAL) samplers are ignored if Mirostat is used.
281
  (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
282
 
283
+ <B>mirostat-lr</B>
284
 
285
  Mirostat learning rate, parameter eta (default: 0.1) ("mirostat_eta" in other apps)
286
 
287
+ <B>mirostat-ent</B>
288
 
289
  Mirostat target entropy, parameter tau (default: 5.0) ("mirostat_tau" in other apps)
290
 
 
305
  For Class 4 models it is highly recommended to use Mirostat 1 or 2, with target entropy (tau / "mirostat-ent") @ 6 to 8 and learning rate (eta / "mirostat-lr") at .1 to .5
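A sketch of that Class 4 suggestion as llama.cpp flags (reading "6 to 8" as target entropy and ".1 to .5" as learning rate, per the parameter descriptions above; placeholders as before):

```
# Sketch: Mirostat 2 for a Class 4 model. Note Mirostat overrides
# top-k / top-p / tfs / typical as described above.
llama-cli -m ./your-class4-model.gguf \
  --mirostat 2 \
  --mirostat-ent 7 \
  --mirostat-lr 0.25 \
  -p "Continue the story."
```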
306
 
307
 
308
+ <B>dynatemp-range</B>
309
 
310
  dynamic temperature range (default: 0.0, 0.0 = disabled)
311
 
312
+ <B>dynatemp-exp</B>
313
 
314
  dynamic temperature exponent (default: 1.0)
315
 
 
334
  This is both an enhancement and in some ways fixes issues in a model when too little temp (or too much/too much of the same) affects generation.
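A sketch of the manual form (the centered-range reading below is my understanding of how llama.cpp applies dynatemp-range, so treat it as an assumption):

```
# Sketch: dynamic temperature centered on --temp. With temp 1.2 and
# range 0.5, the effective temp can move between roughly 0.7 and 1.7.
llama-cli -m ./your-model.gguf \
  --temp 1.2 \
  --dynatemp-range 0.5 \
  --dynatemp-exp 1.0 \
  -p "Write a vivid market scene."
```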
335
 
336
 
337
+ <B>xtc-probability</B>
338
 
339
  xtc probability (default: 0.0, 0.0 = disabled)
340
 
341
  Probability that the removal will actually happen. 0 disables the sampler. 1 makes it always happen.
342
 
343
+ <B>xtc-threshold</B>
344
 
345
  xtc threshold (default: 0.1, 1.0 = disabled)
346
 
 
351
 
352
 
353
 
354
+ <B>-l, --logit-bias TOKEN_ID(+/-)BIAS</B>
355
 
356
  modifies the likelihood of token appearing in the completion,
357
  i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello',
 
373
  ------------------------------------------------------------------------------
374
 
375
 
376
+ <B>-s, --seed SEED </B>
377
 
378
  RNG seed (default: -1, use random seed for -1)
379
 
380
+ <B>samplers SAMPLERS </B>
381
 
382
  samplers that will be used for generation in the order, separated by ';' (default: top_k;tfs_z;typ_p;top_p;min_p;xtc;temperature)
383
 
384
+ <B>sampling-seq SEQUENCE </B>
385
 
386
  simplified sequence for samplers that will be used (default: kfypmxt)
387
 
388
+ <B>ignore-eos </B>
389
 
390
  ignore end of stream token and continue generating (implies --logit-bias EOS-inf)
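A sketch combining these flags (sampler names taken from the default list above; the seed value is arbitrary):

```
# Sketch: fixed seed for reproducible comparisons, a custom sampler
# order, and generation continuing past the end-of-stream token.
llama-cli -m ./your-model.gguf \
  -s 1234 \
  --samplers "top_k;top_p;min_p;temperature" \
  --ignore-eos \
  -p "Write a long scene with no natural ending."
```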
391
 
 
415
  You may therefore want to experiment with dropping the settings (SLOWLY) for Class 3/4 models from those suggested below.
416
 
417
 
418
+ <B>DRY:</B>
419
 
420
  Class 3:
421
 
 
434
  dry_base: 1.15 to 1.5
435
 
436
 
437
+ <B>QUADRATIC SAMPLING:</B>
438
+
439
 
440
  Class 3:
441
 
 
449
 
450
  smoothing_curve: 1.5 to 2.
451
 
452
+
453
+ IMPORTANT:
454
+
455
  Keep in mind that these settings/samplers work in conjunction with "penalties", which is especially important
456
  for operation of CLASS 4 models for chat / role play and/or "smoother operation".
457
 
 
465
 
466
  This is also influenced by the parameter size of the model in relation to the quant size.
467
 
468
+ IE: an 8B model at Q2K will be far more unstable relative to a 20B model at Q2K, and as a result will require stronger settings.