DavidAU committed on
Commit 83fc0d3 · verified · 1 Parent(s): 0cfeb72

Update README.md

Files changed (1):
  1. README.md +74 -38
README.md CHANGED
@@ -16,6 +16,15 @@ These settings / suggestions can be applied to all models including GGUF, EXL2,
16
  It also includes critical settings for Class 3 and Class 4 models at this repo - DavidAU - to enhance and control generation
17
  for specific as well as outside use case(s), including role play, chat and other use case(s).
18
19
  Even if you are not using my models, you may find this document useful for any model available online.
20
 
21
  If you are currently using model(s) that are difficult to "wrangle" then apply "Class 3" or "Class 4" settings to them.
@@ -30,14 +39,13 @@ PARAMETERS AND SAMPLERS
30
 
31
  Primary Testing Parameters I use, including for the output generation examples at my repo:
32
 
33
- Ranged:
34
 
35
  temperature: 0 to 5 ("temp")
36
 
37
  repetition_penalty : 1.02 to 1.15 ("rep pen")
38
 
39
-
40
- Set:
41
 
42
  top_k:40
43
 
@@ -47,7 +55,15 @@ top_p: 0.95
47
 
48
  repeat-last-n: 64 (also called: "repetition_penalty_range" / "rp range" )
49
 
50
- (no other settings, parameter or samplers activated when generating examples)
51
 
52
  Below are all the LLAMA_CPP parameters and samplers.
53
 
@@ -56,6 +72,7 @@ I have added notes below each one for adjustment / enhancement(s) for specific u
56
  Following this section will be additional samplers, which become available when using the "llamacpp_HF" loader in https://github.com/oobabooga/text-generation-webui .
57
 
58
  The "llamacpp_HF" loader only requires the GGUF you want to use plus a few config files from the "source repo" of the model.
 
59
  (this process is automated by the program: just enter the repo URL(s) -> it will fetch everything for you)
60
 
61
  This allows access to very advanced samplers in addition to all the parameters / samplers here.
@@ -78,6 +95,7 @@ https://github.com/ggerganov/llama.cpp
78
 
79
  (scroll down on the main page for more apps/programs to use GGUFs too)
80
 
 
81
 
82
  CRITICAL NOTES:
83
 
@@ -98,6 +116,7 @@ The goal here is to use parameters to raise/lower the power of the model and sam
98
 
99
  With that being said, generation "examples" (at my repo) are created using the "Primary Testing Parameters" (top of this document) settings regardless of the "class" of the model, AND with NO advanced settings or samplers.
100
 
 
101
 
102
  QUANTS:
103
 
@@ -121,7 +140,12 @@ IE: Instead of using a q4KM, you might be able to run an IQ3_M and get close to
121
  PRIMARY PARAMETERS:
122
  ------------------------------------------------------------------------------
123
 
124
- --temp N
125
 
126
  temperature (default: 0.8)
127
 
@@ -133,7 +157,7 @@ Too much temp can affect instruction following in some cases and sometimes not e
133
 
134
  Newer model archs (L3, L3.1, L3.2, Mistral Nemo, Gemma2, etc.) often NEED more temp (1+) to get their best generations.
135
 
136
- --top-p N
137
 
138
  top-p sampling (default: 0.9, 1.0 = disabled)
139
 
@@ -141,7 +165,7 @@ If not set to 1, select tokens with probabilities adding up to less than this nu
141
 
142
  I use default of: .95 ;
143
 
144
- --min-p N
145
 
146
  min-p sampling (default: 0.1, 0.0 = disabled)
147
 
@@ -149,7 +173,7 @@ Tokens with probability smaller than (min_p) * (probability of the most likely t
149
 
150
  I use default: .05 ;
151
 
152
- --top-k N
153
 
154
  top-k sampling (default: 40, 0 = disabled)
155
 
@@ -157,10 +181,7 @@ Similar to top_p, but select instead only the top_k most likely tokens. Higher v
157
 
158
  Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.
159
 
160
- These parameters will have SIGNIFICANT effect on prose, generation, length and content; with temp being the most powerful.
161
-
162
- Keep in mind the biggest parameter / random "unknown" is your prompt. A word change, rephrasing, punctation , even a comma, or semi-colon can drastically alter the
163
- output, even at min temp settings. CAPS also affect generation too.
164
 
165
  For an interesting test, set "temp" to 0 ; this will give you the SAME generation for a given prompt each time. Then adjust a word, phrase, sentence etc - to see the differences.
166
  Keep in mind this will show model operation at its LEAST powerful/creative level and should NOT be used to determine if the model works for your use case(s).
@@ -178,7 +199,11 @@ Then test "at temp" to see the MODELS in action. (5-10 generations recommended)
178
  PENALTY SAMPLERS:
179
  ------------------------------------------------------------------------------
180
 
181
- --repeat-last-n N
182
 
183
  last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
184
  ("repetition_penalty_range" in oobabooga/text-generation-webui , "rp_range" in kobold)
@@ -187,8 +212,11 @@ THIS IS CRITICAL. Too high you can get all kinds of issues (repeat words, senten
187
 
188
  This setting also works in conjunction with all other "rep pens" below.
189
 
 
190
 
191
- --repeat-penalty N
 
 
192
 
193
  penalize repeat sequence of tokens (default: 1.0, 1.0 = disabled)
194
  (commonly called "rep pen")
@@ -198,28 +226,32 @@ Generally this is set from 1.0 to 1.15 ; smallest increments are best IE: 1.01..
198
  This affects creativity of the model overall, not just how words are penalized.
199
 
200
 
201
- --presence-penalty N
202
 
203
  repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
204
 
205
  Generally leave this at zero IF repeat-last-n is 256 or less. You may want to use this for higher repeat-last-n settings.
206
 
207
- CLASS 3: 0.05 may assist generation BUT SET "--repeat-last-n" to 512 or less. Better is 128 or 64.
208
 
209
- CLASS 4: 0.1 to 0.25 may assist generation BUT SET "--repeat-last-n" to 64
210
 
211
 
212
- --frequency-penalty N
213
 
214
  repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
215
 
216
  Generally leave this at zero IF repeat-last-n is 512 or less. You may want to use this for higher repeat-last-n settings.
217
 
218
- CLASS 3: 0.25 may assist generation BUT SET "--repeat-last-n" to 512 or less. Better is 128 or 64.
219
 
220
- CLASS 4: 0.7 to 0.8 may assist generation BUT SET "--repeat-last-n" to 64.
221
 
222
- --penalize-nl penalize newline tokens (default: false)
223
  Generally this is not used.
224
 
225
 
@@ -228,7 +260,7 @@ SECONDARY SAMPLERS / FILTERS:
228
  ------------------------------------------------------------------------------
229
 
230
 
231
- --tfs N
232
 
233
  tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
234
 
@@ -236,23 +268,23 @@ Tries to detect a tail of low-probability tokens in the distribution and removes
236
  ( https://www.trentonbricken.com/Tail-Free-Sampling/ )
237
 
238
 
239
- --typical N
240
 
241
  locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
242
 
243
  If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
244
 
245
 
246
- --mirostat N
247
 
248
  use Mirostat sampling. "Top K", "Nucleus", "Tail Free" (TFS) and "Locally Typical" (TYPICAL) samplers are ignored if Mirostat is used.
249
  (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
250
 
251
- --mirostat-lr N
252
 
253
  Mirostat learning rate, parameter eta (default: 0.1) ("mirostat_eta" in other apps)
254
 
255
- --mirostat-ent N
256
 
257
  Mirostat target entropy, parameter tau (default: 5.0) ("mirostat_tau" in other apps)
258
 
@@ -273,11 +305,11 @@ For Class 3 models it is suggested to use this to assist with generation (min se
273
  For Class 4 models it is highly recommended to use Mirostat 1 or 2, with target entropy (tau / "mirostat-ent") @ 6 to 8 and learning rate (eta / "mirostat-lr") at .1 to .5
274
 
275
 
276
- --dynatemp-range N
277
 
278
  dynamic temperature range (default: 0.0, 0.0 = disabled)
279
 
280
- --dynatemp-exp N
281
 
282
  dynamic temperature exponent (default: 1.0)
283
 
@@ -302,13 +334,13 @@ To set manually (IE: Api, lmstudio, etc) using "range" and "exp" ; this is a bit
302
  This is both an enhancement and in some ways fixes issues in a model when too little temp (or too much/too much of the same) affects generation.
303
 
304
 
305
- --xtc-probability N
306
 
307
  xtc probability (default: 0.0, 0.0 = disabled)
308
 
309
  Probability that the removal will actually happen. 0 disables the sampler. 1 makes it always happen.
310
 
311
- --xtc-threshold N
312
 
313
  xtc threshold (default: 0.1, 1.0 = disabled)
314
 
@@ -319,7 +351,7 @@ Suggest you experiment with this one, with other advanced samplers disabled to s
319
 
320
 
321
 
322
- -l, --logit-bias TOKEN_ID(+/-)BIAS
323
 
324
  modifies the likelihood of token appearing in the completion,
325
  i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello',
@@ -341,19 +373,19 @@ OTHER:
341
  ------------------------------------------------------------------------------
342
 
343
 
344
- -s, --seed SEED
345
 
346
  RNG seed (default: -1, use random seed for -1)
347
 
348
- --samplers SAMPLERS
349
 
350
  samplers that will be used for generation in the order, separated by ';' (default: top_k;tfs_z;typ_p;top_p;min_p;xtc;temperature)
351
 
352
- --sampling-seq SEQUENCE
353
 
354
  simplified sequence for samplers that will be used (default: kfypmxt)
355
 
356
- --ignore-eos
357
 
358
  ignore end of stream token and continue generating (implies --logit-bias EOS-inf)
359
 
@@ -383,7 +415,7 @@ For Class 3 and Class 4 the goal is to use the LOWEST settings to keep the model
383
  You may therefore want to experiment with dropping the settings (SLOWLY) for Class 3/4 models from those suggested below.
384
 
385
 
386
- DRY:
387
 
388
  Class 3:
389
 
@@ -402,7 +434,8 @@ dry_allowed_length: 2 (or less)
402
  dry_base: 1.15 to 1.5
403
 
404
 
405
- QUADRATIC SAMPLING:
 
406
 
407
  Class 3:
408
 
@@ -416,6 +449,9 @@ smoothing_factor: 3 to 5 (or higher)
416
 
417
  smoothing_curve: 1.5 to 2.
418
 
 
 
 
419
  Keep in mind that these settings/samplers work in conjunction with "penalties", which is especially important
420
  for operation of CLASS 4 models for chat / role play and/or "smoother operation".
421
 
@@ -429,4 +465,4 @@ Smaller quants may require STRONGER settings (all classes of models) due to comp
429
 
430
  This is also influenced by the parameter size of the model in relation to the quant size.
431
 
432
- IE: a 8B model at Q2K will be far more unstable relative to a 20B model at Q2K, and as a result require stronger settings.
 
16
  It also includes critical settings for Class 3 and Class 4 models at this repo - DavidAU - to enhance and control generation
17
  for specific as well as outside use case(s), including role play, chat and other use case(s).
18
 
19
+ These settings can also fix a number of model issues such as:
20
+
21
+ - "Gibberish"
22
+ - letter, word, phrase, paragraph repeats
23
+ - loss of coherence
24
+ - creativeness: either a lack of it, or too much of it ("purple prose")
25
+
26
+ Likewise, these settings can also improve model generation and/or the general overall "smoothness" / "quality" of model operation.
27
+
28
  Even if you are not using my models, you may find this document useful for any model available online.
29
 
30
  If you are currently using model(s) that are difficult to "wrangle" then apply "Class 3" or "Class 4" settings to them.
 
39
 
40
  Primary Testing Parameters I use, including for the output generation examples at my repo:
41
 
42
+ <B>Ranged Parameters:</B>
43
 
44
  temperature: 0 to 5 ("temp")
45
 
46
  repetition_penalty : 1.02 to 1.15 ("rep pen")
47
 
48
+ <B>Set Parameters:</B>
 
49
 
50
  top_k:40
51
 
 
55
 
56
  repeat-last-n: 64 (also called: "repetition_penalty_range" / "rp range" )
57
 
58
+ I do not set any other settings or parameters, nor have any samplers activated, when generating examples.
59
+
60
+ Everything else is "zeroed" / "disabled".
61
+
62
+ These parameters/settings are considered both safe and default, and in most cases are available to all users in all apps.
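For reference, here is a minimal sketch of this baseline as one llama.cpp command line. Assumptions: the llama-cli binary from a recent llama.cpp build, a placeholder model path and prompt, and temp / rep pen pinned to one point inside the ranges above.

```
# Sketch: the baseline "testing" settings as llama.cpp flags.
# Model path and prompt are placeholders; temp and rep pen are
# one point within the ranged values listed above.
llama-cli -m ./your-model.gguf \
  --temp 1.0 \
  --repeat-penalty 1.05 \
  --repeat-last-n 64 \
  --top-k 40 \
  --top-p 0.95 \
  --min-p 0.05 \
  -p "Write a short scene set during a thunderstorm."
```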
63
+
64
+ ---
65
+
66
+ <B>Llama CPP Parameters, Samplers and Advanced Samplers</B>
67
 
68
  Below are all the LLAMA_CPP parameters and samplers.
69
 
 
72
  Following this section will be additional samplers, which become available when using the "llamacpp_HF" loader in https://github.com/oobabooga/text-generation-webui .
73
 
74
  The "llamacpp_HF" loader only requires the GGUF you want to use plus a few config files from the "source repo" of the model.
75
+
76
  (this process is automated by the program: just enter the repo URL(s) -> it will fetch everything for you)
77
 
78
  This allows access to very advanced samplers in addition to all the parameters / samplers here.
 
95
 
96
  (scroll down on the main page for more apps/programs to use GGUFs too)
97
 
98
+ ---
99
 
100
  CRITICAL NOTES:
101
 
 
116
 
117
  With that being said, generation "examples" (at my repo) are created using the "Primary Testing Parameters" (top of this document) settings regardless of the "class" of the model, AND with NO advanced settings or samplers.
118
 
119
+ ---
120
 
121
  QUANTS:
122
 
 
140
  PRIMARY PARAMETERS:
141
  ------------------------------------------------------------------------------
142
 
143
+ These parameters will have a SIGNIFICANT effect on prose, generation, length and content, with temp being the most powerful.
144
+
145
+ Keep in mind the biggest parameter / random "unknown" is your prompt. A word change, rephrasing, punctuation, even a comma or semi-colon, can drastically alter the
146
+ output, even at min temp settings. CAPS affect generation too.
147
+
148
+ <B>temp / temperature</B>
149
 
150
  temperature (default: 0.8)
151
 
 
157
 
158
  Newer model archs (L3, L3.1, L3.2, Mistral Nemo, Gemma2, etc.) often NEED more temp (1+) to get their best generations.
159
 
160
+ <B>top-p</B>
161
 
162
  top-p sampling (default: 0.9, 1.0 = disabled)
163
 
 
165
 
166
  I use default of: .95 ;
167
 
168
+ <B>min-p</B>
169
 
170
  min-p sampling (default: 0.1, 0.0 = disabled)
171
 
 
173
 
174
  I use default: .05 ;
175
 
176
+ <B>top-k</B>
177
 
178
  top-k sampling (default: 40, 0 = disabled)
179
 
 
181
 
182
  Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.
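For example (a sketch; model path and prompts are placeholders, all other settings left at the baseline above):

```
# Wider word choice:
llama-cli -m ./your-model.gguf --top-k 100 -p "Describe the harbor at dawn."

# Simpler word choice:
llama-cli -m ./your-model.gguf --top-k 20 -p "Describe the harbor at dawn."
```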
183
 
184
+ NOTES:
 
185
 
186
  For an interesting test, set "temp" to 0 ; this will give you the SAME generation for a given prompt each time. Then adjust a word, phrase, sentence etc - to see the differences.
187
  Keep in mind this will show model operation at its LEAST powerful/creative level and should NOT be used to determine if the model works for your use case(s).
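A sketch of that A/B test (placeholders as before); at temp 0 any change in output comes from the prompt edit alone:

```
# Same prompt -> same output at temp 0; compare small wording edits:
llama-cli -m ./your-model.gguf --temp 0 -p "Describe the storm."
llama-cli -m ./your-model.gguf --temp 0 -p "Describe the storm!"
```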
 
199
  PENALTY SAMPLERS:
200
  ------------------------------------------------------------------------------
201
 
202
+ These samplers "trim" or "prune" output.
203
+
204
+ PRIMARY:
205
+
206
+ <B>repeat-last-n</B>
207
 
208
  last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
209
  ("repetition_penalty_range" in oobabooga/text-generation-webui , "rp_range" in kobold)
 
212
 
213
  This setting also works in conjunction with all other "rep pens" below.
214
 
215
+ This parameter is the "RANGE" of tokens looked at for the samplers directly below.
216
 
217
+ SECONDARIES:
218
+
219
+ <B>repeat-penalty</B>
220
 
221
  penalize repeat sequence of tokens (default: 1.0, 1.0 = disabled)
222
  (commonly called "rep pen")
 
226
  This affects creativity of the model overall, not just how words are penalized.
227
 
228
 
229
+ <B>presence-penalty</B>
230
 
231
  repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
232
 
233
  Generally leave this at zero IF repeat-last-n is 256 or less. You may want to use this for higher repeat-last-n settings.
234
 
235
+ CLASS 3: 0.05 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
236
 
237
+ CLASS 4: 0.1 to 0.25 may assist generation BUT SET "repeat-last-n" to 64.
238
 
239
 
240
+ <B>frequency-penalty</B>
241
 
242
  repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
243
 
244
  Generally leave this at zero IF repeat-last-n is 512 or less. You may want to use this for higher repeat-last-n settings.
245
 
246
+ CLASS 3: 0.25 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
247
+
248
+ CLASS 4: 0.7 to 0.8 may assist generation BUT SET "repeat-last-n" to 64.
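As a combined sketch of the Class 4 suggestions above (starting-point values only, not definitive settings; model path and prompt are placeholders):

```
# Sketch: one possible Class 4 penalty setup per the notes above.
llama-cli -m ./your-class4-model.gguf \
  --repeat-last-n 64 \
  --repeat-penalty 1.05 \
  --presence-penalty 0.15 \
  --frequency-penalty 0.7 \
  -p "Continue the chat in character."
```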
249
+
250
+
251
+ <B>penalize-nl </B>
252
 
253
+ penalize newline tokens (default: false)
254
 
 
255
  Generally this is not used.
256
 
257
 
 
260
  ------------------------------------------------------------------------------
261
 
262
 
263
+ <B>tfs</B>
264
 
265
  tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
266
 
 
268
  ( https://www.trentonbricken.com/Tail-Free-Sampling/ )
269
 
270
 
271
+ <B>typical</B>
272
 
273
  locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
274
 
275
  If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
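A sketch of enabling these two filters (the values here are illustrative assumptions, not recommendations from this document):

```
# Sketch: TFS plus locally typical sampling; both default to 1.0 (off).
llama-cli -m ./your-model.gguf --tfs 0.95 --typical 0.9 -p "Describe the harbor at dawn."
```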
276
 
277
 
278
+ <B>mirostat</B>
279
 
280
  use Mirostat sampling. "Top K", "Nucleus", "Tail Free" (TFS) and "Locally Typical" (TYPICAL) samplers are ignored if Mirostat is used.
281
  (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
282
 
283
+ <B>mirostat-lr</B>
284
 
285
  Mirostat learning rate, parameter eta (default: 0.1) ("mirostat_eta" in other apps)
286
 
287
+ <B>mirostat-ent</B>
288
 
289
  Mirostat target entropy, parameter tau (default: 5.0) ("mirostat_tau" in other apps)
290
 
 
305
  For Class 4 models it is highly recommended to use Mirostat 1 or 2, with target entropy (tau / "mirostat-ent") @ 6 to 8 and learning rate (eta / "mirostat-lr") at .1 to .5
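A sketch of that Class 4 suggestion as llama.cpp flags (reading "6 to 8" as target entropy and ".1 to .5" as learning rate, per the parameter descriptions above; placeholders as before):

```
# Sketch: Mirostat 2 for a Class 4 model. Note Mirostat overrides
# top-k / top-p / tfs / typical as described above.
llama-cli -m ./your-class4-model.gguf \
  --mirostat 2 \
  --mirostat-ent 7 \
  --mirostat-lr 0.25 \
  -p "Continue the story."
```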
306
 
307
 
308
+ <B>dynatemp-range</B>
309
 
310
  dynamic temperature range (default: 0.0, 0.0 = disabled)
311
 
312
+ <B>dynatemp-exp</B>
313
 
314
  dynamic temperature exponent (default: 1.0)
315
 
 
334
  This is both an enhancement and in some ways fixes issues in a model when too little temp (or too much/too much of the same) affects generation.
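A sketch of the manual form (the centered-range reading below is my understanding of how llama.cpp applies dynatemp-range, so treat it as an assumption):

```
# Sketch: dynamic temperature centered on --temp. With temp 1.2 and
# range 0.5, the effective temp can move between roughly 0.7 and 1.7.
llama-cli -m ./your-model.gguf \
  --temp 1.2 \
  --dynatemp-range 0.5 \
  --dynatemp-exp 1.0 \
  -p "Write a vivid market scene."
```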
335
 
336
 
337
+ <B>xtc-probability</B>
338
 
339
  xtc probability (default: 0.0, 0.0 = disabled)
340
 
341
  Probability that the removal will actually happen. 0 disables the sampler. 1 makes it always happen.
342
 
343
+ <B>xtc-threshold</B>
344
 
345
  xtc threshold (default: 0.1, 1.0 = disabled)
346
 
 
351
 
352
 
353
 
354
+ <B>-l, --logit-bias TOKEN_ID(+/-)BIAS</B>
355
 
356
  modifies the likelihood of token appearing in the completion,
357
  i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello',
 
373
  ------------------------------------------------------------------------------
374
 
375
 
376
+ <B>-s, --seed SEED </B>
377
 
378
  RNG seed (default: -1, use random seed for -1)
379
 
380
+ <B>samplers SAMPLERS </B>
381
 
382
  samplers that will be used for generation in the order, separated by ';' (default: top_k;tfs_z;typ_p;top_p;min_p;xtc;temperature)
383
 
384
+ <B>sampling-seq SEQUENCE </B>
385
 
386
  simplified sequence for samplers that will be used (default: kfypmxt)
387
 
388
+ <B>ignore-eos </B>
389
 
390
  ignore end of stream token and continue generating (implies --logit-bias EOS-inf)
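A sketch combining these flags (sampler names taken from the default list above; the seed value is arbitrary):

```
# Sketch: fixed seed for reproducible comparisons, a custom sampler
# order, and generation continuing past the end-of-stream token.
llama-cli -m ./your-model.gguf \
  -s 1234 \
  --samplers "top_k;top_p;min_p;temperature" \
  --ignore-eos \
  -p "Write a long scene with no natural ending."
```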
391
 
 
415
  You may therefore want to experiment with dropping the settings (SLOWLY) for Class 3/4 models from those suggested below.
416
 
417
 
418
+ <B>DRY:</B>
419
 
420
  Class 3:
421
 
 
434
  dry_base: 1.15 to 1.5
435
 
436
 
437
+ <B>QUADRATIC SAMPLING:</B>
438
+
439
 
440
  Class 3:
441
 
 
449
 
450
  smoothing_curve: 1.5 to 2.
451
 
452
+
453
+ IMPORTANT:
454
+
455
  Keep in mind that these settings/samplers work in conjunction with "penalties", which is especially important
456
  for operation of CLASS 4 models for chat / role play and/or "smoother operation".
457
 
 
465
 
466
  This is also influenced by the parameter size of the model in relation to the quant size.
467
 
468
+ IE: an 8B model at Q2K will be far more unstable relative to a 20B model at Q2K, and as a result will require stronger settings.