DavidAU committed on
Commit a65f58a · verified · 1 Parent(s): b3cc717

Update README.md

Files changed (1)
  1. README.md +77 -20
README.md CHANGED
@@ -112,6 +112,7 @@ All information about parameters, samplers and advanced samplers applies to ALL
112
  QUANTS:
113
  - QUANTS Detailed information.
114
  - IMATRIX Quants
115
  - ADDITIONAL QUANT INFORMATION
116
  - ARM QUANTS / Q4_0_X_X
117
  - NEO Imatrix Quants / Neo Imatrix X Quants
@@ -123,7 +124,7 @@ SOURCE FILES for my Models / APPS to Run LLMs / AIs:
123
  - TEXT-GENERATION-WEBUI
124
  - KOBOLDCPP
125
  - SILLYTAVERN
126
- - OTHER PROGRAMS
127
 
128
  TESTING / Default / Generation Example PARAMETERS AND SAMPLERS
129
  - Basic settings suggested for general model operation.
@@ -212,6 +213,35 @@ The Imatrix process has NO effect on Q8 or F16 quants.
212
 
213
  F16 is full precision, just in GGUF format.
214
 
215
 ADDITIONAL QUANT INFORMATION:
216
 
217
  <details>
@@ -371,7 +401,7 @@ For reference here are some Class 3/4 models:
371
 
372
  [ https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-23.5B-GGUF ]
373
 
374
- (note Grand Guttenberg Madness/Darkess (12B) are class 1 models, but compressed versions of 23.5B)
375
 
376
 Although Class 3 and Class 4 models will work when used within their specific use case(s) with the standard parameters and settings on the model card, I recognize that users want either a smoother experience
377
 and/or want to use these models outside their intended use case(s), and that is in part why I created this document.
@@ -533,13 +563,11 @@ Note for Class 3/Class 4 models settings/samplers (discussed below) "repeat-last
533
 
534
  Now that you have the basic parameters and samplers from the previous section, I will cover Generational Control and Steering.
535
 
536
- This section is optional.
537
 
538
- This section (in part) will cover how to deal with Class 3/4 issues directly, as well as general issues than can happen with any "class" of model during generation IF you want to control them manually as
539
  the "Quick Reference" and/or "Detailed Parameters, Samplers, and Advanced Samplers" will cover how to deal with any generation issue(s) automatically.
540
 
541
- This section will also cover how to manually STEER generation(s) - ANY MODEL, ANY TYPE.
542
-
543
  There is a very important concept that must be covered first:
544
 
545
  The output/generation/answer to your prompt/instructions BECOMES part of your "prompt" after you click STOP, and then click on "CONTINUE".
@@ -552,7 +580,7 @@ When you hit "REGEN" this nullifies only the last "generation" - not the prompt
552
 
553
 The part I will cover here is what to do once a generation has started, from a single prompt (no other prompts/generations in the chat).
554
 
555
- So lets start with a prompt (NOTE: this one has no steering in the instructions):
556
 
557
 Start a 1000 word scene (vivid horror, 1st person, include thoughts) with: The skyscraper swayed, as she watched the window in front of her on the 21st floor explode...
558
 
@@ -590,6 +618,7 @@ These methods apply to all generation types - not just a "scene" or "story", but
590
 
591
  Notes:
592
  - For Text Generation Webui, you can transfer your "chat" to "notebook" for easy Stop/Edit/Continue function.
593
  - For Silly Tavern -> This is built in.
594
  - For LMStudio -> This is built in.
595
  - For API (direct control) you have to send the "chat" elements back to the "server" with the "edits" (send the whole "revised" chat as a json payload).
@@ -616,7 +645,19 @@ If you have single or multiple paragraph repeat(s):
616
  - Hit continue.
617
  - Better: Do these steps, and add "steering" (last line -> word, phrase, sentence or paragraph)
618
 
619
- In each case we are BREAKING the "condition" that lead (or lead into) to the repeat(s).
620
 
621
  <B>Advanced Steering / Fixing Issues (any model, any type) and "sequenced" parameter/sampler change(s)</B>
622
 
@@ -671,16 +712,18 @@ You may want to modify the instructions to provide a "steering" continue point a
671
 
672
  ---
673
 
674
- Compiled by: "EnragedAntelope"
675
-
676
- https://huggingface.co/EnragedAntelope
677
-
678
- https://github.com/EnragedAntelope
679
 
680
 This section will get you started - especially with class 3 and 4 models - and the detailed sections below will cover settings / control in more depth.
681
 
682
  Please see sections below this for advanced usage, more details, settings, notes etc etc.
683
 
684
  <small>
685
  # LLM Parameters Reference Table
686
 
@@ -736,7 +779,7 @@ Please see sections below this for advanced usage, more details, settings, notes
736
 
737
  | **Advanced Samplers** |
738
 
739
- | dry_multiplier | Controls DRY (Don't Repeat Yourself) intensity. Range: 0.8-1.12+ |
740
 
741
  | dry_allowed_length | Allowed length for repeated sequences in DRY. Default: 2 |
742
 
@@ -749,14 +792,22 @@ Please see sections below this for advanced usage, more details, settings, notes
749
 
750
  ## Notes
751
 
752
- - For Class 3 and 4 models, using both DRY and Quadratic sampling is recommended
753
  - Lower quants (Q2K, IQ1s, IQ2s) may require stronger settings due to compression damage
754
  - Parameters interact with each other, so test changes one at a time
755
  - Always test with temperature at 0 first to establish a baseline
756
 
757
  </small>
758
 
759
- IMPORTANT: Make sure to review MIROSTAT sampler settings below, due to behaviour of this specific sampler / affect on parameters/other samplers.
760
 
761
  ---
762
 
@@ -935,7 +986,11 @@ Generally this is not used.
935
 
936
  In some AI/LLM apps, these may only be available via JSON file modification and/or API.
937
 
938
- For "text-gen-webui" and "Koboldcpp" these are directly accessible (and via Sillytavern IF you use either of these APPS to connect Silly Tavern to their API).
939
 
940
  <B>i) OVERALL GENERATION CHANGES (affect per token as well as over all generation):</B>
941
 
@@ -1083,7 +1138,7 @@ Careful testing is required, as this can have unclear side effects.
1083
 
1084
  Note #1 :
1085
 
1086
- You can use these samplers via Sillytavern IF you use either of these APPS (Koboldcpp/Text Generation Webui) to connect Silly Tavern to their API.
1087
 
1088
  Other Notes:
1089
 
@@ -1112,8 +1167,8 @@ However, you should also check / test operation of (these are in Text Generation
1112
  a] Affects per token generation:
1113
 
1114
  - top_a
1115
- - epsilon_cutoff - see note 4
1116
- - eta_cutoff - see note 4
1117
  - no_repeat_ngram_size - see note #1.
1118
 
1119
  b] Affects generation including phrase, sentence, paragraph and entire generation:
@@ -1227,6 +1282,8 @@ Hopefully this powerful sampler will soon appear in all LLM/AI apps.
1227
 
1228
  You can access this in the KoboldCPP app, under "context" -> "tokens" on the main page of the app after start up.
1229
 
1230
  This sampler allows banning words and phrases DURING generation, forcing the model to "make another choice".
1231
 
1232
  This is a game changer in custom real time control of the model.
 
112
  QUANTS:
113
  - QUANTS Detailed information.
114
  - IMATRIX Quants
115
+ - QUANTS GENERATIONAL DIFFERENCES:
116
  - ADDITIONAL QUANT INFORMATION
117
  - ARM QUANTS / Q4_0_X_X
118
  - NEO Imatrix Quants / Neo Imatrix X Quants
 
124
  - TEXT-GENERATION-WEBUI
125
  - KOBOLDCPP
126
  - SILLYTAVERN
127
+ - LMStudio, Ollama, Llamacpp, and OTHER PROGRAMS
128
 
129
  TESTING / Default / Generation Example PARAMETERS AND SAMPLERS
130
  - Basic settings suggested for general model operation.
 
213
 
214
  F16 is full precision, just in GGUF format.
215
 
216
+ QUANTS GENERATIONAL DIFFERENCES:
217
+
218
+ Higher quants will have more detail, nuance and in some cases stronger "emotional" levels. Characters will also be
219
+ more "fleshed out". The sense of "there" will also increase.
220
+
221
+ Likewise, for any use case -> with higher quants, nuance (in both instruction following AND output generation) will be greater.
222
+
223
+ "Nuance" is critical for both understanding, as well as the quality of the output generation.
224
+
225
+ To put this another way, "nuance" is lost as the full precision model is more and more compressed (lower and lower quants).
226
+
227
+ Some of this can be counteracted by parameters and/or Imatrix (as noted earlier).
228
+
229
+ IQ4XS / IQ4NL quants:
230
+
231
+ Due to the unusual nature of this quant (mixture/processing), generations from it will be different than those from other quants.
232
+
233
+ These quants can also be "quanted" with or without an Imatrix.
234
+
235
+ You may want to try it / compare it to other quant(s) output.
236
+
237
+ Special note on Q2k/Q3 quants:
238
+
239
+ You may need to use temp 2 or lower with these quants (1 or lower for q2k) - there is simply too much compression at this level, damaging the model.
240
+
241
+ IQ quants (and Imatrix versions of q2k/q3) perform better at these "BPW" (bits per weight) levels.
242
+
243
+ Rep pen adjustments may also be required to get the most out of a model at this/these quant level(s).
244
+
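The Q2k/Q3 advice above can be sketched as a small lookup of starting points. NOTE: these exact numbers (the rep pen values in particular) are assumptions for experimentation drawn from this document's guidance, not official presets:

```python
# Illustrative starting points per quant tier, based on the advice above.
# NOTE: the exact numbers here are assumptions for experimentation, not official presets.
QUANT_STARTING_POINTS = {
    "q2k": {"temp_max": 1.0, "rep_pen": 1.1},         # heaviest compression: temp 1 or lower
    "q3": {"temp_max": 2.0, "rep_pen": 1.1},          # temp 2 or lower suggested
    "q4_and_up": {"temp_max": None, "rep_pen": 1.05}, # normal operation; no special cap
}

def suggested_settings(quant: str) -> dict:
    """Return a conservative starting point for a given quant name (e.g. "Q2_K")."""
    q = quant.lower()
    if q.startswith("q2") or q.startswith("iq2") or q.startswith("iq1"):
        key = "q2k"
    elif q.startswith("q3") or q.startswith("iq3"):
        key = "q3"
    else:
        key = "q4_and_up"
    return dict(QUANT_STARTING_POINTS[key])  # copy so callers can tweak safely
```

Treat these as a baseline to test against, then adjust per model as described in the sections below.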
245
 ADDITIONAL QUANT INFORMATION:
246
 
247
  <details>
 
401
 
402
  [ https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-23.5B-GGUF ]
403
 
404
+ (note Grand Gutenberg Madness/Darkness (12B) are class 1 models, but compressed versions of the 23.5B)
405
 
406
 Although Class 3 and Class 4 models will work when used within their specific use case(s) with the standard parameters and settings on the model card, I recognize that users want either a smoother experience
407
 and/or want to use these models outside their intended use case(s), and that is in part why I created this document.
 
563
 
564
  Now that you have the basic parameters and samplers from the previous section, I will cover Generational Control and Steering.
565
 
566
+ This section is optional and covers how to manually STEER generation(s) - ANY MODEL, ANY TYPE.
567
 
568
+ This section (in part) will also cover how to deal with Class 3/4 model issues directly, as well as general issues that can happen with any "class" of model during generation IF you want to control them manually, whereas
569
  the "Quick Reference" and/or "Detailed Parameters, Samplers, and Advanced Samplers" will cover how to deal with any generation issue(s) automatically.
570
 
571
  There is a very important concept that must be covered first:
572
 
573
  The output/generation/answer to your prompt/instructions BECOMES part of your "prompt" after you click STOP, and then click on "CONTINUE".
 
580
 
581
 The part I will cover here is what to do once a generation has started, from a single prompt (no other prompts/generations in the chat).
582
 
583
+ So let's start with a prompt (NOTE: this prompt has no "steering" in the instructions):
584
 
585
 Start a 1000 word scene (vivid horror, 1st person, include thoughts) with: The skyscraper swayed, as she watched the window in front of her on the 21st floor explode...
586
 
 
618
 
619
  Notes:
620
  - For Text Generation Webui, you can transfer your "chat" to "notebook" for easy Stop/Edit/Continue function.
621
+ - For KoboldCPP -> This is built in.
622
  - For Silly Tavern -> This is built in.
623
  - For LMStudio -> This is built in.
624
  - For API (direct control) you have to send the "chat" elements back to the "server" with the "edits" (send the whole "revised" chat as a json payload).
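The API note above can be sketched as follows - a minimal example of rebuilding the "revised" chat as a JSON payload. The model identifier and endpoint route are assumptions (they depend on your server); the point is that the whole edited chat goes back in one payload:

```python
import json

# Hypothetical chat history: the last generation was stopped, then hand-edited.
chat = [
    {"role": "user", "content": "Start a 1000 word scene (vivid horror, 1st person) with: ..."},
    {"role": "assistant", "content": "The glass shattered outward, and I felt the floor tilt..."},
]

# To "continue", the WHOLE revised chat goes back to the server as one JSON payload;
# the server then treats the edited assistant text as part of the prompt context.
payload = {
    "model": "local-model",  # assumed identifier; depends on your server
    "messages": chat,
    "temperature": 0.8,
}
body = json.dumps(payload)
# This body would be POSTed to your server's chat endpoint,
# e.g. an OpenAI-compatible /v1/chat/completions route (an assumption; check your app's docs).
```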
 
645
  - Hit continue.
646
  - Better: Do these steps, and add "steering" (last line -> word, phrase, sentence or paragraph)
647
 
648
+ In each case we are BREAKING the "condition(s)" that led to (or led into) the repeat(s).
649
+
650
+ If you have "rants" and/or "model has lost its mind":
651
+
652
+ - Stop generation, edit out all the paragraph(s), going back AS FAR as possible to where it appears the rant/mind loss occurred (delete ALL), and then delete one additional paragraph / 2 or more sentences.
653
+ - Hit continue.
654
+ - Better: Do these steps, and add "steering" (last line -> word, phrase, sentence or paragraph).
655
+
656
+ Class 3/4 model additional note:
657
+
658
+ With these classes of model, you MAY need to "edit" / "revise" further back than one or two lines / one paragraph - they sometimes need just a little more editing.
659
+
660
+ Another option is using "Cold" Editing/Generation, explained below.
661
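The stop/edit/continue recovery steps above can be sketched as a small helper that trims trailing paragraphs from a generation and appends a "steering" line (the paragraph count and steering text here are arbitrary examples, not prescribed values):

```python
def trim_and_steer(generation: str, drop_paragraphs: int = 1, steer: str = "") -> str:
    """Drop the last N paragraphs (the ones containing the repeat/rant),
    then optionally append a "steering" word/phrase/sentence as the new last line."""
    paragraphs = [p for p in generation.split("\n\n") if p.strip()]
    kept = paragraphs[:-drop_paragraphs] if drop_paragraphs else paragraphs
    result = "\n\n".join(kept)
    if steer:
        result += "\n\n" + steer  # the model continues from this steering text
    return result
```

You would then send the trimmed/steered text back via your app's "continue" function (or as the revised chat payload for API use).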
 
662
  <B>Advanced Steering / Fixing Issues (any model, any type) and "sequenced" parameter/sampler change(s)</B>
663
 
 
712
 
713
  ---
714
 
715
+ Compiled by: "EnragedAntelope" ( https://huggingface.co/EnragedAntelope || https://github.com/EnragedAntelope )
716
 
717
 This section will get you started - especially with class 3 and 4 models - and the detailed sections below will cover settings / control in more depth.
718
 
719
  Please see sections below this for advanced usage, more details, settings, notes etc etc.
720
 
721
+ IMPORTANT NOTES:
722
+
723
+ Not all parameters, samplers and advanced samplers are listed in this quick reference section. Scroll down to see all of them in the following sections.
724
+
725
+ Likewise there may be some parameter "name variation(s)" between LLM/AI apps - this is addressed in the detailed sections.
726
+
727
  <small>
728
  # LLM Parameters Reference Table
729
 
 
779
 
780
  | **Advanced Samplers** |
781
 
782
+ | dry_multiplier | Controls DRY (Don't Repeat Yourself) intensity. Range: 0.8-1.12+ for Class 3 (higher for Class 4) |
783
 
784
  | dry_allowed_length | Allowed length for repeated sequences in DRY. Default: 2 |
785
 
 
792
 
793
  ## Notes
794
 
795
+ - For Class 3 and 4 models, using both DRY and Quadratic sampling is recommended (see the advanced/detailed sampler sections below for how to control the model directly)
796
  - Lower quants (Q2K, IQ1s, IQ2s) may require stronger settings due to compression damage
797
  - Parameters interact with each other, so test changes one at a time
798
  - Always test with temperature at 0 first to establish a baseline
799
 
800
  </small>
801
 
802
+ CLASS 3/4 Models:
803
+
804
+ If you are using a class 3 or class 4 model for use case(s) such as role play, multi-turn, chat etc etc, activating / setting all samplers is suggested for class 3 and may be required for class 4 models.
805
+
806
+ Likewise, fine control of a class 3/4 model via the "DRY" and "Quadratic" samplers is detailed below. These allow you to dial the model's raw power up or down directly.
807
+
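As a rough sketch, the DRY settings from the table above might be dialed up for Class 4 versus Class 3 like this. The Class 4 multiplier is an assumption extrapolated from "Class 4 is higher" and the documented 0.8-1.12+ range, not a tested preset:

```python
# DRY sampler starting points per model class - illustrative assumptions only.
# 0.8 is the low end of the documented range; the class 4 value just reflects
# "Class 4 is higher" and is not a tested preset.
DRY_PRESETS = {
    "class3": {"dry_multiplier": 0.8, "dry_allowed_length": 2},
    "class4": {"dry_multiplier": 1.12, "dry_allowed_length": 2},
}

def dry_settings(model_class: str) -> dict:
    """Return a copy so callers can tweak values without mutating the preset."""
    return dict(DRY_PRESETS[model_class])
```

Test one change at a time, as noted above, since these parameters interact with the other samplers.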
808
+ MIROSTAT Sampler - IMPORTANT:
809
+
810
+ Make sure to review the MIROSTAT sampler settings below, due to the behaviour of this specific sampler and its effect on parameters/other samplers, which also varies from app to app.
811
 
812
  ---
813
 
 
986
 
987
  In some AI/LLM apps, these may only be available via JSON file modification and/or API.
988
 
989
+ For "text-gen-webui" and "Koboldcpp" these are directly accessible; for other programs/apps this varies.
990
+
991
+ Sillytavern:
992
+
993
+ If the app Sillytavern is connected to (via its API) supports these parameters/samplers, then you can access them via Silly Tavern's parameter/sampler panel. So if you are using Text-Gen-Webui, Koboldcpp, LMStudio, Llamacpp, Ollama (etc) you can set/change/access all or most of these.
994
 
995
  <B>i) OVERALL GENERATION CHANGES (affect per token as well as over all generation):</B>
996
 
 
1138
 
1139
  Note #1 :
1140
 
1141
+ You can use these samplers via Sillytavern IF you connect Silly Tavern to the API of an app that supports them (e.g. Koboldcpp / Text Generation Webui).
1142
 
1143
  Other Notes:
1144
 
 
1167
  a] Affects per token generation:
1168
 
1169
  - top_a
1170
+ - epsilon_cutoff - see note #4
1171
+ - eta_cutoff - see note #4
1172
  - no_repeat_ngram_size - see note #1.
1173
 
1174
  b] Affects generation including phrase, sentence, paragraph and entire generation:
 
1282
 
1283
  You can access this in the KoboldCPP app, under "context" -> "tokens" on the main page of the app after start up.
1284
 
1285
+ You can also access this in SillyTavern if you use KoboldCPP as your API-connected app.
1286
+
1287
  This sampler allows banning words and phrases DURING generation, forcing the model to "make another choice".
1288
 
1289
  This is a game changer in custom real time control of the model.
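As a sketch, banning words/phrases via an API call might look like the payload below. The "banned_tokens" field name is loosely modeled on KoboldCPP's generate API - treat the exact field name and the example phrases as assumptions and check your app's API documentation:

```python
import json

# Hypothetical KoboldCPP-style generate payload with phrase banning.
# The "banned_tokens" field name is an assumption - verify against your app's API docs.
payload = {
    "prompt": "Continue the scene: ",
    "max_length": 200,
    "banned_tokens": ["shivers down her spine", "barely above a whisper"],  # phrases to ban mid-generation
}
body = json.dumps(payload)  # POST this to the app's generate endpoint
```

When a banned phrase would be generated, the model is forced to "make another choice", as described above.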