PARAMETERS AND SAMPLERS
------------------------------------------------------------------------------

Primary Testing Parameters I use, including for the output generation examples at my repo:

Ranged:

temperature: 0 to 5 ("temp")

repetition_penalty: 1.02 to 1.15 ("rep pen")

Set:

top_k: 40

min_p: 0.05

top_p: 0.95

repeat-last-n: 64 (also called: "repetition_penalty_range" / "rp range")

(no other settings, parameters or samplers are activated when generating the examples; see the request sketch below)
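For reference, here is a minimal sketch of how these testing parameters could be sent to a local llama.cpp server. The host/port, prompt, and exact JSON field names are assumptions based on llama.cpp's /completion endpoint; check them against your build.

```python
import requests

# Hypothetical local llama.cpp server endpoint; adjust host/port for your setup.
URL = "http://127.0.0.1:8080/completion"

payload = {
    "prompt": "Write the opening scene of a mystery novel.",
    "n_predict": 400,
    # "Set" values from above:
    "top_k": 40,
    "min_p": 0.05,
    "top_p": 0.95,
    "repeat_last_n": 64,      # "repetition_penalty_range" / "rp range"
    # "Ranged" values -- pick a point inside the tested ranges:
    "temperature": 1.2,
    "repeat_penalty": 1.05,   # "repetition_penalty" / "rep pen"
}

response = requests.post(URL, json=payload, timeout=300)
print(response.json()["content"])
```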
PRIMARY PARAMETERS:
------------------------------------------------------------------------------

--temp N     temperature (default: 0.8)

Primary factor to control the randomness of outputs. 0 = deterministic (only the most likely token is used). Higher value = more randomness.

Range 0 to 5. Increment at .1 per change.

Too much temp can hurt instruction following in some cases, while too little can lead to boring generation.

Newer model archs (L3, L3.1, L3.2, Mistral Nemo, Gemma2, etc.) many times NEED more temp (1+) to get their best generations.

--top-p N    top-p sampling (default: 0.9, 1.0 = disabled)

If not set to 1, select tokens with probabilities adding up to less than this number. Higher value = higher range of possible random results.

I use the default of .95 ;

--min-p N    min-p sampling (default: 0.1, 0.0 = disabled)

Tokens with probability smaller than (min_p) * (probability of the most likely token) are discarded.

I use the default of .05 ;

--top-k N    top-k sampling (default: 40, 0 = disabled)

Similar to top_p, but select instead only the top_k most likely tokens. Higher value = higher range of possible random results.

Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.

These parameters will have a SIGNIFICANT effect on prose, generation, length and content, with temp being the most powerful.
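To make the pruning each of these samplers does concrete, here is a small, backend-independent sketch over a toy token distribution. The probabilities are made up, and real backends differ in the exact order and tie-breaking they use.

```python
# Toy next-token distribution: token -> probability (sums to 1.0).
probs = {"the": 0.40, "a": 0.25, "an": 0.15, "one": 0.10, "this": 0.06, "zzz": 0.04}

def top_k_filter(p, k):
    # Keep only the k most likely tokens.
    return dict(sorted(p.items(), key=lambda kv: kv[1], reverse=True)[:k])

def top_p_filter(p, top_p):
    # Keep the smallest set of most-likely tokens whose probabilities add up to top_p.
    kept, total = {}, 0.0
    for tok, pr in sorted(p.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        total += pr
        if total >= top_p:
            break
    return kept

def min_p_filter(p, min_p):
    # Discard tokens with probability below min_p * (probability of the most likely token).
    cutoff = min_p * max(p.values())
    return {tok: pr for tok, pr in p.items() if pr >= cutoff}

candidates = min_p_filter(top_p_filter(top_k_filter(probs, 40), 0.95), 0.05)
print(candidates)  # "zzz" falls outside the 0.95 nucleus and is dropped
```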
PENALTY SAMPLERS:
------------------------------------------------------------------------------

("repetition_penalty_range" in oobabooga/text-generation-webui , "rp_range" in kobold)

THIS IS CRITICAL. Too high and you can get all kinds of issues (repeated words, sentences, paragraphs or "gibberish"), especially with class 3 or 4 models.

This setting also works in conjunction with all other "rep pens" below.
(commonly called "rep pen")

Generally this is set from 1.0 to 1.15 ; the smallest increments are best, IE: 1.01... 1.02 or even 1.001... 1.002.

This affects the creativity of the model overall, not just how words are penalized.
Generally leave this at zero IF repeat-last-n is 256 or less. You may want to use this for higher repeat-last-n settings.

CLASS 3: 0.05 may assist generation BUT SET "--repeat-last-n" to 512 or less. Better is 128 or 64.

CLASS 4: 0.1 to 0.25 may assist generation BUT SET "--repeat-last-n" to 64
Generally leave this at zero IF repeat-last-n is 512 or less. You may want to use this for higher repeat-last-n settings.

CLASS 3: 0.25 may assist generation BUT SET "--repeat-last-n" to 512 or less. Better is 128 or 64.

CLASS 4: 0.7 to 0.8 may assist generation BUT SET "--repeat-last-n" to 64.

--penalize-nl    penalize newline tokens (default: false)
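As a quick sketch of the pairing this section keeps stressing (any extra penalty means a shorter look-back window), the two named settings might look like this; the field names follow llama.cpp-style naming and are assumptions for your particular backend.

```python
# Sketch only: a mild "rep pen" over a short window, per the guidance above.
penalties = {
    "repeat_penalty": 1.05,   # "rep pen" -- move in small increments (1.01, 1.02, ...)
    "repeat_last_n": 64,      # the window all the other "rep pens" operate over
}

# Rule of thumb from this section: if you raise any of the additional penalty
# samplers for Class 3/4 models, shrink repeat_last_n (64-128) at the same time
# rather than leaving a long look-back window.
payload = {"prompt": "Continue the story:", "n_predict": 300, **penalties}
print(payload)
```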
SECONDARY SAMPLERS / FILTERS:
------------------------------------------------------------------------------

--tfs N    tail free sampling, parameter z (default: 1.0, 1.0 = disabled)

Tries to detect a tail of low-probability tokens in the distribution and removes those tokens. The closer to 0, the more discarded tokens.
( https://www.trentonbricken.com/Tail-Free-Sampling/ )

--typical N    locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)

If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.

--mirostat N    use Mirostat sampling.
"Top K", "Nucleus", "Tail Free" (TFS) and "Locally Typical" (TYPICAL) samplers are ignored if used.
(default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)

--mirostat-lr N    Mirostat learning rate, parameter eta (default: 0.1) ("mirostat_eta")

--mirostat-ent N    Mirostat target entropy, parameter tau (default: 5.0) ("mirostat_tau")

Activates the Mirostat sampling technique. It aims to control perplexity during sampling. See the paper. (https://arxiv.org/abs/2007.14966)

mirostat_tau: 5-8 is a good value.

mirostat_eta: 0.1 is a good value.

This is the big one ; activating this will help with creative generation. It can also help with stability.

This is both a sampler (and pruner) and an enhancement all in one.

For Class 3 models it is suggested to use this to assist with generation (minimum settings).

For Class 4 models it is highly recommended: Mirostat 1 or 2 with mirostat-ent (tau) at 6 to 8 and mirostat-lr (eta) at .1 to .5.
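A sketch of the Class 4 suggestion as a llama.cpp-server-style payload; the endpoint and exact field names are assumptions. Note that Top K, Nucleus, TFS and Typical are ignored while Mirostat is active, so they are omitted here.

```python
import requests

# Hypothetical Class 4 setup: let Mirostat 2 do the pruning.
payload = {
    "prompt": "Continue the story:",
    "n_predict": 400,
    "temperature": 1.0,
    "mirostat": 2,          # 0 = off, 1 = Mirostat, 2 = Mirostat 2.0
    "mirostat_tau": 6.0,    # target entropy ("--mirostat-ent"), 5-8 is a good range
    "mirostat_eta": 0.1,    # learning rate ("--mirostat-lr"), 0.1 to 0.5 here
}

r = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=300)
print(r.json().get("content", ""))
```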
--dynatemp-exp N    dynamic temperature exponent (default: 1.0)

In oobabooga/text-generation-webui (has on/off, and high / low):

Activates Dynamic Temperature. This modifies temperature to range between "dynatemp_low" (minimum) and "dynatemp_high" (maximum), with an entropy-based scaling. The steepness of the curve is controlled by "dynatemp_exponent".

This allows the model to CHANGE temp during generation. This can greatly affect creativity, dialog, and other contrasts.

For Kobold a converter is available, and in oobabooga/text-generation-webui you just enter low/high/exp.

Class 4 only: Suggested this is on, with a high/low of .8 to 1.8 (note the range here of "1" between high and low), and with the exponent at 1 (values below or above 1 also work).

To set this manually (IE: API, lmstudio, etc) using "range" and "exp" is a bit more tricky (the example sets the range from .8 to 1.8):

1 - Set the "temp" to 1.3 (the regular temp parameter)

2 - Set the "range" to .500 (this gives you ".8" to "1.8" with "1.3" as the "base")

3 - Set exp to 1 (or as you want).

This is both an enhancement and, in some ways, a fix for issues in a model when too little temp (or too much / too much of the same) affects generation.
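The temp/range arithmetic in the manual steps above can be checked with a couple of lines; the helper is purely illustrative, and the "dynatemp_range" / "dynatemp_exponent" field names are assumptions if you pass them to an API.

```python
def dynatemp_bounds(base_temp: float, dyn_range: float):
    # Dynamic temperature sweeps between (base - range) and (base + range).
    return base_temp - dyn_range, base_temp + dyn_range

low, high = dynatemp_bounds(1.3, 0.5)
print(low, high)  # -> 0.8 1.8, the worked example above

# Passed as raw fields, the same example would look roughly like:
dynatemp_settings = {
    "temperature": 1.3,        # the "base"
    "dynatemp_range": 0.5,     # +/- around the base, i.e. 0.8 .. 1.8
    "dynatemp_exponent": 1.0,  # steepness of the entropy-based scaling
}
print(dynatemp_settings)
```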
--xtc-probability N    xtc probability (default: 0.0, 0.0 = disabled)

Probability that the removal will actually happen. 0 disables the sampler. 1 makes it always happen.

--xtc-threshold N    xtc threshold (default: 0.1, 1.0 = disabled)

If 2 or more tokens have probability above this threshold, consider removing all but the last one.

XTC is a new sampler that adds an interesting twist to generation.
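A simplified sketch of the XTC rule described above (with some probability, remove all but the least likely of the tokens that exceed the threshold); this is illustrative only, not the exact backend implementation.

```python
import random

def xtc(probs, threshold=0.1, probability=0.5):
    # Simplified XTC: with some probability, if 2+ tokens exceed the threshold,
    # drop all of them except the LEAST likely one ("the last one" above it).
    if random.random() >= probability:
        return dict(probs)                      # sampler did not trigger
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    above = [tok for tok, pr in ranked if pr > threshold]
    if len(above) < 2:
        return dict(probs)                      # needs 2+ tokens above threshold
    keep = above[-1]                            # keep only the least likely of them
    return {t: p for t, p in probs.items() if t not in above or t == keep}

print(xtc({"the": 0.4, "a": 0.3, "an": 0.2, "odd": 0.1}, threshold=0.15, probability=1.0))
# -> {'an': 0.2, 'odd': 0.1}: the two most likely tokens were cut, forcing variety
```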
This may or may not be available, and requires a bit more work.

IN "oobabooga/text-generation-webui" there is "TOKEN BANNING":

This is a very powerful pruning method which can drastically alter output generation.

I suggest you get some "bad outputs", get the "tokens" (the actual number for the "word" / part word), then use this.

Careful testing is required, as this can have unclear side effects.
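One way to look up the token IDs for a banned-token list (an assumption, not tied to any particular front-end) is to run the offending fragments through the matching tokenizer, for example with Hugging Face transformers:

```python
from transformers import AutoTokenizer

# Placeholder: load the tokenizer that matches the model behind your GGUF/quant.
tok = AutoTokenizer.from_pretrained("path-or-repo-of-your-base-model")

# Fragments pulled from "bad outputs"; note that leading spaces and casing
# usually tokenize differently, so check each variant you want to ban.
for fragment in ["shivers", " shivers", "Shivers"]:
    ids = tok.encode(fragment, add_special_tokens=False)
    print(repr(fragment), "->", ids)  # these IDs go into the token-banning list
```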
ADVANCED SAMPLERS:
------------------------------------------------------------------------------

I am not going to touch on all of them, just the main ones; for more info see:

https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab

Keep in mind these parameters/samplers become available (for GGUFs) in "oobabooga/text-generation-webui" when you use the llamacpp_HF loader.
What I will touch on here are special settings for CLASS 3 and CLASS 4 models.

For CLASS 3 you can use one, two or both.

For CLASS 4 using BOTH is strongly recommended, or at minimum "QUADRATIC SAMPLING".

These samplers (along with "penalty" settings) work in conjunction to "wrangle" the model / control it and get it to settle down; this is important for Class 3 but critical for Class 4 models.

For other classes of models, these advanced samplers can enhance operation across the board.

For Class 3 and Class 4 the goal is to use the LOWEST settings that keep the model in line, rather than "over prune" it.

You may therefore want to experiment with dropping the settings (SLOWLY) for Class 3/4 models from those suggested below.

DRY:

Class 3:

dry_multiplier: .8

dry_allowed_length: 2

dry_base: 1

Class 4:

dry_multiplier: .8 to 1.12+

dry_allowed_length: 2 (or less)

dry_base: 1.15 to 1.5

QUADRATIC SAMPLING:

Class 3:

smoothing_factor: 1 to 3

smoothing_curve: 1

Class 4:

smoothing_factor: 3 to 5 (or higher)

smoothing_curve: 1.5 to 2.

(These are collected into a single settings sketch below.)
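Collected as a single settings sketch, using the parameter names exactly as listed above; how these get passed (web UI fields, SillyTavern, or an API body) depends on your front-end, so treat the dict layout itself as an assumption.

```python
# Starting points from the lists above; drop them SLOWLY if the model behaves,
# and move toward the Class 4 values only if it does not settle down.
advanced_class3 = {
    # DRY
    "dry_multiplier": 0.8,
    "dry_allowed_length": 2,
    "dry_base": 1.0,
    # Quadratic sampling
    "smoothing_factor": 1.0,   # 1 to 3
    "smoothing_curve": 1.0,
}

advanced_class4 = {
    "dry_multiplier": 1.12,    # .8 to 1.12+
    "dry_allowed_length": 2,   # 2 or less
    "dry_base": 1.25,          # 1.15 to 1.5
    "smoothing_factor": 4.0,   # 3 to 5 (or higher)
    "smoothing_curve": 1.75,   # 1.5 to 2
}

print(advanced_class3)
print(advanced_class4)
```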
Keep in mind that these settings/samplers work in conjunction with "penalties", which is especially important for Class 3 and Class 4 models.
If you use Mirostat, keep in mind this will interact with these two advanced samplers as well.

Finally:

Smaller quants may require STRONGER settings (for all classes of models) due to compression damage, especially Q2K and IQ1/IQ2 quants.

This is also influenced by the parameter size of the model in relation to the quant size.

IE: an 8B model at Q2K will be far more unstable relative to a 20B model at Q2K, and as a result will require stronger settings.