DavidAU committed on
Commit bc03205 · verified · 1 Parent(s): beb2f82

Update README.md

Files changed (1): README.md (+54 -0)

README.md CHANGED
@@ -27,13 +27,20 @@ PARAMETERS AND SAMPLERS

Primary Testing Parameters I use, including for the output generation examples at my repo:

Ranged:

temperature: 0 to 5 ("temp")

repetition_penalty: 1.02 to 1.15 ("rep pen")

Set:

top_k: 40

min_p: 0.05

top_p: 0.95

repeat-last-n: 64 (also called "repetition_penalty_range" / "rp range")

(No other settings, parameters or samplers are activated when generating examples.)
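For reference, the same testing parameters written out as a settings dictionary (a minimal sketch; the key names are assumptions in llama.cpp-server style and will vary by front end):

```python
# Hypothetical settings payload mirroring the testing parameters above.
# Key names are assumptions (llama.cpp-server style); adjust to your front end.
TEST_SETTINGS = {
    "temperature": 1.0,       # swept 0 to 5 during testing
    "repeat_penalty": 1.05,   # swept 1.02 to 1.15 ("rep pen")
    "top_k": 40,
    "min_p": 0.05,
    "top_p": 0.95,
    "repeat_last_n": 64,      # "repetition_penalty_range" / "rp range"
}
```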
@@ -111,21 +118,31 @@ PRIMARY PARAMETERS:
------------------------------------------------------------------------------

--temp N          temperature (default: 0.8)

Primary factor controlling the randomness of outputs. 0 = deterministic (only the most likely token is used). Higher value = more randomness.

Range 0 to 5. Increment in steps of .1 per change.

Too much temp can hurt instruction following in some cases, and too little can produce boring generation.

Newer model archs (L3, L3.1, L3.2, Mistral Nemo, Gemma2, etc.) often NEED more temp (1+) to produce their best generations.
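To make the effect concrete, a minimal sketch of standard softmax-with-temperature (illustrative only, not any backend's exact code):

```python
import math

def softmax_with_temperature(logits, temp):
    """Convert raw logits to probabilities, scaled by temperature."""
    if temp <= 0:
        # temp 0 degenerates to greedy: all mass on the top logit
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 3.0, 1.0]
print(softmax_with_temperature(logits, 0.8))  # peaked: favors the top token
print(softmax_with_temperature(logits, 2.0))  # flatter: more randomness
```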

--top-p N         top-p sampling (default: 0.9, 1.0 = disabled)

If not set to 1, select tokens whose probabilities add up to less than this number. Higher value = wider range of possible random results.

I use a default of: .95
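A minimal sketch of the top-p (nucleus) rule described above, on an already-sorted toy distribution:

```python
def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p.
    probs: list of (token, prob), sorted by prob descending."""
    kept, cumulative = [], 0.0
    for token, p in probs:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

print(top_p_filter([("the", 0.5), ("a", 0.3), ("an", 0.15), ("cat", 0.05)], 0.95))
# -> keeps "the", "a", "an"; "cat" is cut from the tail
```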

--min-p N         min-p sampling (default: 0.1, 0.0 = disabled)

Tokens with probability smaller than (min_p) * (probability of the most likely token) are discarded.

I use a default of: .05
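The min-p rule above maps directly to a one-line threshold; a minimal sketch:

```python
def min_p_filter(probs, min_p=0.05):
    """Discard tokens with probability below min_p * (probability of the most likely token)."""
    threshold = min_p * max(p for _, p in probs)
    return [(token, p) for token, p in probs if p >= threshold]

print(min_p_filter([("the", 0.6), ("a", 0.2), ("an", 0.02), ("zzz", 0.001)], 0.05))
# threshold = 0.05 * 0.6 = 0.03 -> "an" and "zzz" are discarded
```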

--top-k N         top-k sampling (default: 40, 0 = disabled)

Similar to top_p, but selects only the top_k most likely tokens. Higher value = wider range of possible random results.

Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.
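A minimal sketch of the top-k cut, showing how 40 vs 120 changes the size of the candidate pool:

```python
def top_k_filter(probs, top_k=40):
    """Keep only the top_k most likely tokens; probs is (token, prob) sorted descending."""
    return probs[:top_k]

toy = [("t%d" % i, 1.0 / (i + 2)) for i in range(200)]   # toy descending distribution
print(len(top_k_filter(toy, 40)))    # 40  -> simpler word choices
print(len(top_k_filter(toy, 120)))   # 120 -> much wider word choice
```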

These parameters will have a SIGNIFICANT effect on prose, generation, length and content, with temp being the most powerful.
@@ -153,6 +170,7 @@ PENALTY SAMPLERS:

("repetition_penalty_range" in oobabooga/text-generation-webui, "rp_range" in kobold)

THIS IS CRITICAL. Set too high, you can get all kinds of issues (repeated words, sentences, paragraphs or "gibberish"), especially with class 3 or 4 models.

This setting also works in conjunction with all other "rep pens" below.

@@ -160,6 +178,7 @@

(commonly called "rep pen")

Generally this is set from 1.0 to 1.15; the smallest increments are best, IE: 1.01... 1.02 or even 1.001... 1.002.

This affects the creativity of the model overall, not just how words are penalized.
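A minimal sketch of the classic repetition-penalty scheme (divide positive logits, multiply negative ones, as llama.cpp-style backends do) applied over the last repeat-last-n tokens; simplified, not any backend's exact code:

```python
def apply_repetition_penalty(logits, recent_tokens, penalty=1.05, repeat_last_n=64):
    """Penalize tokens seen in the last repeat_last_n tokens of the context.
    logits: dict token_id -> logit. Positive logits are divided by the penalty,
    negative ones multiplied, so the token always becomes less likely."""
    window = set(recent_tokens[-repeat_last_n:])
    out = dict(logits)
    for tok in window:
        if tok in out:
            out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = {1: 2.0, 2: -1.0, 3: 0.5}
print(apply_repetition_penalty(logits, recent_tokens=[1, 2], penalty=1.1))
# token 1: 2.0 -> ~1.82, token 2: -1.0 -> -1.1, token 3 untouched
```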

@@ -168,6 +187,7 @@

Generally leave this at zero IF repeat-last-n is 256 or less. You may want to use this for higher repeat-last-n settings.

CLASS 3: 0.05 may assist generation, BUT SET "--repeat-last-n" to 512 or less. Better is 128 or 64.

CLASS 4: 0.1 to 0.25 may assist generation, BUT SET "--repeat-last-n" to 64.

@@ -176,6 +196,7 @@

Generally leave this at zero IF repeat-last-n is 512 or less. You may want to use this for higher repeat-last-n settings.

CLASS 3: 0.25 may assist generation, BUT SET "--repeat-last-n" to 512 or less. Better is 128 or 64.

CLASS 4: 0.7 to 0.8 may assist generation, BUT SET "--repeat-last-n" to 64.
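The two settings described just above have their flag lines outside this diff's context; judging by the defaults and ranges, they appear to be presence/frequency-style penalties, but that mapping is an assumption. A minimal sketch of how such penalties are conventionally applied:

```python
from collections import Counter

def apply_presence_frequency(logits, recent_tokens, presence=0.05, frequency=0.0):
    """Hypothetical presence/frequency-style penalties (assumed mapping for the
    two settings above). presence: flat subtraction for any token already seen;
    frequency: subtraction scaled by how often it was seen."""
    counts = Counter(recent_tokens)
    out = dict(logits)
    for tok, n in counts.items():
        if tok in out:
            out[tok] -= presence + frequency * n
    return out
```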

--penalize-nl     penalize newline tokens (default: false)

@@ -188,28 +209,38 @@ SECONDARY SAMPLERS / FILTERS:

--tfs N           tail free sampling, parameter z (default: 1.0, 1.0 = disabled)

Tries to detect a tail of low-probability tokens in the distribution and removes those tokens. The closer to 0, the more tokens discarded.
( https://www.trentonbricken.com/Tail-Free-Sampling/ )
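A simplified, illustrative sketch of the Tail Free idea (the exact cutoff bookkeeping differs between implementations):

```python
def tail_free_filter(probs, z=0.95):
    """Simplified Tail Free Sampling: drop the low-probability 'tail' where the
    curvature (second difference) of the sorted distribution flattens out.
    probs: list of (token, prob) sorted by prob descending."""
    p = [pr for _, pr in probs]
    if len(p) < 3 or z >= 1.0:
        return probs
    d1 = [p[i] - p[i + 1] for i in range(len(p) - 1)]        # first differences
    d2 = [abs(d1[i] - d1[i + 1]) for i in range(len(d1) - 1)] # |second differences|
    total = sum(d2) or 1.0
    norm = [v / total for v in d2]
    cum, cutoff = 0.0, len(p)
    for i, v in enumerate(norm):
        cum += v
        if cum > z:
            cutoff = i + 1
            break
    return probs[:cutoff]
```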

--typical N       locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)

If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
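A simplified sketch of locally typical sampling: keep the tokens whose surprisal is closest to the distribution's entropy until their mass reaches p:

```python
import math

def locally_typical_filter(probs, typical_p=0.9):
    """Simplified locally typical sampling: prefer tokens whose surprisal
    (-log p) is close to the distribution's entropy, keeping the closest
    tokens until their cumulative probability reaches typical_p."""
    probs = [(t, p) for t, p in probs if p > 0]
    entropy = -sum(p * math.log(p) for _, p in probs)
    ranked = sorted(probs, key=lambda tp: abs(-math.log(tp[1]) - entropy))
    kept, cum = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cum += p
        if cum >= typical_p:
            break
    return kept
```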

--mirostat N      use Mirostat sampling.
"Top K", "Nucleus", "Tail Free" (TFS) and "Locally Typical" (TYPICAL) samplers are ignored if used.
(default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)

--mirostat-lr N   Mirostat learning rate, parameter eta (default: 0.1) ("mirostat_eta")

--mirostat-ent N  Mirostat target entropy, parameter tau (default: 5.0) ("mirostat_tau")

Activates the Mirostat sampling technique. It aims to control perplexity during sampling. See the paper. (https://arxiv.org/abs/2007.14966)

mirostat_tau: 5-8 is a good value.

mirostat_eta: 0.1 is a good value.

This is the big one; activating this will help with creative generation. It can also help with stability.

This is both a sampler (and pruner) and an enhancement all in one.

For Class 3 models it is suggested to use this to assist with generation (minimum settings).

For Class 4 models it is highly recommended: Mirostat 1 or 2, with mirostat-ent (tau) at 6 to 8 and mirostat-lr (eta) at .1 to .5.
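A simplified sketch of the Mirostat 2.0 loop as described in the paper (real backends differ in details): prune tokens whose surprise exceeds a running budget mu, sample, then nudge mu toward the target tau using eta:

```python
import math, random

def mirostat_v2_step(probs, state, tau=5.0, eta=0.1):
    """One step of Mirostat 2.0 (simplified sketch). probs: list of (token, prob),
    sorted descending. state: dict holding 'mu' (initialized to 2*tau)."""
    mu = state.setdefault("mu", 2.0 * tau)
    # Prune tokens whose surprisal (-log2 p) exceeds the current budget mu.
    kept = [(t, p) for t, p in probs if -math.log2(p) <= mu] or probs[:1]
    total = sum(p for _, p in kept)
    chosen_token, chosen_p = kept[-1]
    r, acc = random.random() * total, 0.0
    for token, p in kept:
        acc += p
        if acc >= r:
            chosen_token, chosen_p = token, p
            break
    # Nudge mu so the observed surprise tracks the target entropy tau.
    observed_surprise = -math.log2(chosen_p / total)
    state["mu"] = mu - eta * (observed_surprise - tau)
    return chosen_token, state
```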
214
 
215
 
@@ -217,25 +248,32 @@ For Class 4 models it is highly recommended with Microstat 1 or 2 + mirostat-lr
217
  --dynatemp-exp N dynamic temperature exponent (default: 1.0)
218
 
219
  In: oobabooga/text-generation-webui (has on/off, and high / low) :
 
220
  Activates Dynamic Temperature. This modifies temperature to range between "dynatemp_low" (minimum) and "dynatemp_high" (maximum), with an entropy-based scaling. The steepness of the curve is controlled by "dynatemp_exponent".
221
 
222
  This allows the model to CHANGE temp during generation. This can greatly affect creativity, dialog, and other contrasts.
 
223
  For Kobold a converter is available and in oobabooga/text-generation-webui you just enter low/high/exp.
224
 
225
  Class 4 only: Suggested this is on, with a high/low of .8 to 1.8 (note the range here of "1" between high and low); with exponent to 1 (however below 0 or above work too)
226
 
227
  To set manually (IE: Api, lmstudio, etc) using "range" and "exp" ; this is a bit more tricky: (example is to set range from .8 to 1.8)
 
228
  1 - Set the "temp" to 1.3 (the regular temp parameter)
 
229
  2 - Set the "range" to .500 (this gives you ".8" to "1.8" with "1.3" as the "base")
 
230
  3 - Set exp to 1 (or as you want).
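A tiny sketch of the arithmetic behind that manual setup (base temp plus/minus the range); purely illustrative:

```python
def dynatemp_bounds(base_temp, dynatemp_range):
    """Derive the low/high temperature bounds from a base temp and a range value.
    Example from the steps above: base 1.3, range 0.5 -> 0.8 to 1.8."""
    return base_temp - dynatemp_range, base_temp + dynatemp_range

print(dynatemp_bounds(1.3, 0.5))  # -> (0.8, 1.8), with 1.3 as the "base"
```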

This is both an enhancement and, in some ways, a fix for issues in a model when too little temp (or too much, or too much of the same) affects generation.

--xtc-probability N  xtc probability (default: 0.0, 0.0 = disabled)

Probability that the removal will actually happen. 0 disables the sampler. 1 makes it always happen.

--xtc-threshold N    xtc threshold (default: 0.1, 1.0 = disabled)

If 2 or more tokens have probability above this threshold, consider removing all but the last one.

XTC is a new sampler that adds an interesting twist to generation.
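A minimal sketch of the XTC behaviour described above (remove all but the last of the above-threshold tokens, with the given probability); illustrative only:

```python
import random

def xtc_filter(probs, threshold=0.1, probability=0.5):
    """Simplified XTC sketch: if two or more tokens sit above the threshold,
    remove all of them except the last (least likely) one, with the given
    probability. probs: (token, prob) sorted descending."""
    if random.random() >= probability:
        return probs
    above = [i for i, (_, p) in enumerate(probs) if p > threshold]
    if len(above) < 2:
        return probs
    drop = set(above[:-1])  # keep only the last above-threshold token
    return [tp for i, tp in enumerate(probs) if i not in drop]
```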
@@ -252,6 +290,7 @@ This may or may not be available. This requires a bit more work.

IN "oobabooga/text-generation-webui" there is "TOKEN BANNING":

This is a very powerful pruning method which can drastically alter output generation.

I suggest you collect some "bad outputs", find the offending "tokens" (the actual token numbers for the "word" / part-word), and then use this.

Careful testing is required, as this can have unclear side effects.
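A minimal sketch of token banning as a pruning step (the token ids here are hypothetical, standing in for ids pulled from your own "bad outputs"):

```python
def ban_tokens(logits, banned_token_ids):
    """Token banning: banned token ids simply get their logits forced to -inf
    so they can never be sampled (illustrative sketch)."""
    out = dict(logits)
    for tok in banned_token_ids:
        if tok in out:
            out[tok] = float("-inf")
    return out

# Hypothetical example: ids 123 and 456 were identified from "bad outputs".
print(ban_tokens({123: 1.7, 456: 0.9, 789: 2.2}, banned_token_ids=[123, 456]))
```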
@@ -277,6 +316,7 @@ ADVANCED SAMPLERS:
------------------------------------------------------------------------------

I am not going to touch on all of them, just the main ones; for more info see:

https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab

Keep in mind these parameters/samplers become available (for GGUFs) in "oobabooga/text-generation-webui" when you use the llamacpp_HF loader.

@@ -284,36 +324,49 @@

What I will touch on here are special settings for CLASS 3 and CLASS 4 models.

For CLASS 3 you can use one, two or both.

For CLASS 4 using BOTH is strongly recommended, or at minimum "QUADRATIC SAMPLING".

These samplers (along with the "penalty" settings) work in conjunction to "wrangle" the model / control it and get it to settle down; this is important for Class 3 but critical for Class 4 models.

For other classes of models, these advanced samplers can enhance operation across the board.

For Class 3 and Class 4 the goal is to use the LOWEST settings that keep the model in line, rather than "over-prune" it.

You may therefore want to experiment with dropping the settings (SLOWLY) for Class 3/4 models from those suggested below.

DRY:

Class 3:

dry_multiplier: .8

dry_allowed_length: 2

dry_base: 1

Class 4:

dry_multiplier: .8 to 1.12+

dry_allowed_length: 2 (or less)

dry_base: 1.15 to 1.5
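DRY penalizes tokens that would extend a verbatim repeat of recent context; the commonly published form of the penalty is multiplier * base^(match_length - allowed_length). A minimal sketch of that curve, using the Class 4 values above:

```python
def dry_penalty(match_length, multiplier=0.8, base=1.15, allowed_length=2):
    """Penalty applied to a token that would extend an already-repeated sequence
    of match_length tokens. Repeats shorter than allowed_length are free."""
    if match_length < allowed_length:
        return 0.0
    return multiplier * (base ** (match_length - allowed_length))

for n in (2, 4, 8):
    print(n, dry_penalty(n, multiplier=0.8, base=1.15, allowed_length=2))
# longer verbatim repeats are penalized exponentially harder
```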
307
 
308
 
309
  QUADRATIC SAMPLING:
310
 
311
  Class 3:
 
312
  smoothing_factor: 1 to 3
 
313
  smoothing_curve: 1
314
 
315
  Class 4:
 
316
  smoothing_factor: 3 to 5 (or higher)
 
317
  smoothing_curve: 1.5 to 2.
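As I understand the text-generation-webui implementation, smoothing_factor applies a quadratic transform that bends logits toward the current top logit, and smoothing_curve reshapes that curve further. The sketch below models only the basic quadratic part and is an assumption, not the exact code:

```python
def quadratic_smoothing(logits, smoothing_factor=1.0):
    """Simplified quadratic sampling: logits are replaced by an inverted parabola
    centered on the current maximum. smoothing_curve (not modeled here) bends the
    curve further."""
    peak = max(logits)
    return [peak - smoothing_factor * (peak - x) ** 2 for x in logits]

print(quadratic_smoothing([4.0, 3.0, 1.0, -2.0], smoothing_factor=1.0))
# -> [4.0, 3.0, -5.0, -32.0]
```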

Keep in mind that these settings/samplers work in conjunction with "penalties", which is especially important
@@ -326,6 +379,7 @@ If you use Mirostat, keep in mind this will interact with these two advanced samplers.

Finally:

Smaller quants may require STRONGER settings (for all classes of models) due to compression damage, especially Q2K and IQ1/IQ2 quants.

This is also influenced by the parameter size of the model in relation to the quant size.

IE: an 8B model at Q2K will be far more unstable than a 20B model at Q2K, and as a result will require stronger settings.