---
license: apache-2.0
tags:
- parameters guide
- samplers guide
- model generation
- role play settings
- optimal model setting
- gibberish fixes
- coherence
- instruction following
- quality generation
- chat settings
- quality settings
- llamacpp server
- llamacpp
- lmstudio
- sillytavern
- koboldcpp
- backyard
- ollama
- text generation webui
- ggufs
- exl2
- full precision
- quants
- imatrix
- neo imatrix
---
<h3>Maximizing Model Performance for All Quant Types and Full Precision using Samplers, Advanced Samplers and Parameters Guide</h3>
This document includes detailed information, references, and notes for general parameters, samplers and
advanced samplers to get the most out of your model's abilities, including notes / settings for the most popular AI/LLM apps in use (LLAMACPP, KoboldCPP, Text-Generation-WebUI, LMStudio, SillyTavern, Ollama and others).
These settings / suggestions can be applied to all models including GGUF, EXL2, GPTQ, HQQ, AWQ and full source/precision.
It also includes critical settings for Class 3 and Class 4 models at this repo - DavidAU - to enhance and control generation
for their specific use case(s) as well as outside use case(s), including role play, chat and other use case(s).
The settings discussed in this document can also fix a number of model issues (<B>any model, any repo</B>) such as:
- "Gibberish"
- Generation length (including out of control generation)
- Chat quality / Multi-Turn convos.
- Multi-turn / COT / and other multi prompt/answer generation
- Letter, word, phrase, paragraph repeats
- Coherence
- Instruction following
- Creativeness, or lack thereof, or too much of it ("purple prose").
- Low quant (ie q2k, iq1s, iq2s) issues.
- General output quality.
- Role play related issues.
Likewise, ALL the settings (parameters, samplers and advanced samplers) below can also improve model generation and/or the general overall "smoothness" / "quality" of model operation:
- all parameters and samplers available via LLAMACPP (and most apps that run / use LLAMACPP - including Lmstudio, Ollama, Sillytavern and others.)
- all parameters (including some not in Llamacpp), samplers and advanced samplers ("DRY", "Quadratic", "Mirostat") in oobabooga/text-generation-webui, including the llamacpp_HF loader (allowing a lot more samplers)
- all parameters (including some not in Llamacpp), samplers and advanced samplers ("DRY", "Quadratic", "Mirostat") in SillyTavern / KoboldCPP (including Anti-slop filters)
Even if you are not using my models, you may find this document <u>useful for any model (any quant / full source / any repo) available online.</u>
If you are currently using model(s) - from my repo and/or others - that are difficult to "wrangle" then you can apply "Class 3" or "Class 4" settings to them.
This document will be updated over time too and is subject to change without notice.
Please use the "community tab" for suggestions / edits / improvements.
IMPORTANT:
Every parameter, sampler and advanced sampler here affects per token generation and overall generation quality.
This effect is cumulative especially with long output generation and/or multi-turn (chat, role play, COT).
Likewise, because of how modern AIs/LLMs operate, the quality of the previously generated tokens affects the next tokens generated too.
The result is higher quality operation overall - stronger prose, better answers, and a higher quality adventure.
---
<h2>TESTING / Generation Example PARAMETERS AND SAMPLERS</h2>
---
Primary Testing Parameters I use, including use for output generation examples at my repo:
<B>Ranged Parameters:</B>
temperature: 0 to 5 ("temp")
repetition_penalty : 1.02 to 1.15 ("rep pen")
<B>Set parameters:</B>
top_k:40
min_p:0.05
top_p: 0.95
repeat-last-n: 64 (also called: "repetition_penalty_range" / "rp range" )
I do not set any other settings, parameters or have samplers activated when generating examples.
Everything else is "zeroed" / "disabled".
These parameters/settings are considered both safe and default and in most cases available to all users in all AI/LLM apps.
Note for Class 3/Class 4 models (discussed below) "repeat-last-n" is a CRITICAL setting.
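For reference, below is a minimal sketch of sending these exact testing parameters to a local llama.cpp server ("llama-server"). The endpoint and field names follow the llama-server README at the time of writing; the URL, prompt and port are placeholders - verify the names against your llama.cpp version / app of choice.

```python
# Minimal sketch: send the "Primary Testing Parameters" above to a local llama-server.
# Assumes llama-server is already running (e.g. "llama-server -m model.gguf --port 8080").
# Field names follow the llama.cpp server README; verify against your version.
import requests

payload = {
    "prompt": "Write the opening scene of a storm at sea.",  # placeholder prompt
    "n_predict": 512,         # max tokens to generate
    "temperature": 1.0,       # ranged 0 to 5 during testing
    "repeat_penalty": 1.05,   # ranged 1.02 to 1.15 during testing
    "repeat_last_n": 64,      # "rep pen range" - critical for Class 3/4 models
    "top_k": 40,
    "min_p": 0.05,
    "top_p": 0.95,
}

resp = requests.post("http://localhost:8080/completion", json=payload, timeout=600)
print(resp.json()["content"])
```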
---
<h2>SOURCE FILES for my Models / APPS to Run LLMs / AIs:</h2>
---
Source files / Source models of my models are located here (also upper right menu on this page):
[ https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be ]
You will need the config files to use "llamacpp_HF" loader ("text-generation-webui") [ https://github.com/oobabooga/text-generation-webui ]
You can also use the full source in "text-generation-webui" too.
As an alternative you can use GGUFs directly in "KOBOLDCPP" / "SillyTavern" without the "config files" and still use almost all the parameters, samplers and advanced samplers.
<B>Parameters, Samplers and Advanced Samplers</B>
In sections 1a, 1b, and 1c below are all the LLAMACPP parameters and samplers.
I have added notes below each one for adjustment / enhancement(s) for specific use cases.
TEXT-GENERATION-WEBUI
Section 2 covers additional samplers, which become available when using the "llamacpp_HF" loader in https://github.com/oobabooga/text-generation-webui
AND/OR https://github.com/LostRuins/koboldcpp ("KOBOLDCPP").
The "llamacpp_HF" (for "text-generation-webui") only requires the GGUF you want to use plus a few config files from "source repo" of the model.
(this process is automated with this program, just enter the repo(s) urls -> it will fetch everything for you)
This allows access to very advanced samplers in addition to all the parameters / samplers here.
KOBOLDCPP:
Note that https://github.com/LostRuins/koboldcpp also allows access to all LLAMACPP parameters/samplers as well as additional advanced samplers.
You can use almost all parameters, samplers and advanced samplers using "KOBOLDCPP" without the need to get the source config files (the "llamacpp_HF" step).
Note: This program has one of the newest samplers called "Anti-slop" which allows phrase/word banning at the generation level.
SILLYTAVERN:
Note that https://github.com/SillyTavern/SillyTavern also allows access to all LLAMACPP parameters/samplers as well as additional advanced samplers.
You can use almost all parameters, samplers and advanced samplers using "SILLYTAVERN" without the need to get the source config files (the "llamacpp_HF" step).
For CLASS 3 and CLASS 4 models the most important setting is "SMOOTHING FACTOR" (Quadratic Smoothing) ; information is located on this page:
https://docs.sillytavern.app/usage/common-settings/
Critical Note:
Silly Tavern allows you to "connect" (via API) to different AI programs/apps like Koboldcpp, Llamacpp (server), Text Generation Webui, Lmstudio, Ollama ... etc etc.
You "load" a model in one of these, then connect Silly Tavern to the App via API. This way you can use any model, and Sillytavern becomes the interface between
the AI model and you directly. Sillytavern opens an interface in your browser.
In Sillytavern you can then adjust parameters, samplers and advanced samplers ; there are also PRESET parameter/sampler profiles, and you can save your favorites too.
Currently, at time of this writing, connecting Silly Tavern via KoboldCPP or Text Generation Webui will provide the most samplers/parameters.
However for some, connecting to Lmstudio, LlamaCPP, or Ollama may be preferred.
NOTE:
It appears that Silly Tavern also supports "DRY" and "XTC" ; but it is not yet in the documentation at the time of writing.
You may also want to check out how to connect SillyTavern to local AI "apps" running on your pc here:
https://docs.sillytavern.app/usage/api-connections/
OTHER PROGRAMS:
Other programs like https://www.LMStudio.ai allow access to most of the STANDARD samplers, whereas in others (llamacpp only here) you may need to add settings to the JSON file(s) for a model and/or template preset.
In most cases all llama_cpp parameters/samplers are available when using API / headless / server mode in "text-generation-webui", "koboldcpp", "Sillytavern", "Ollama", and "LMStudio" (as well as other apps too).
You can also use llama_cpp directly too. (IE: llama-server.exe) ; see :
https://github.com/ggerganov/llama.cpp
(scroll down on the main page for more apps/programs to use GGUFs too that connect to / use the LLAMA-CPP package.)
Special note:
It appears "DRY" / "XTC" samplers has been added to LLAMACPP and SILLYTAVERN.
It is available (Llamacpp) via "server.exe / llama-server.exe". Likely this sampler will also become available "downstream" in applications that use LLAMACPP in due time.
[ https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md ]
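If your llama.cpp build includes these samplers, they can also be set per request through the server API. This is a hedged sketch only - the field names (dry_multiplier, dry_base, dry_allowed_length, xtc_probability, xtc_threshold) follow the llama-server README at the time of writing, and availability depends on your build.

```python
# Sketch: enabling DRY and XTC per request against llama-server, assuming a build
# that includes these samplers. Field names per the server README; verify locally.
import requests

payload = {
    "prompt": "Continue the story:",  # placeholder
    "n_predict": 400,
    "temperature": 1.0,
    "repeat_last_n": 64,
    # DRY ("Don't Repeat Yourself") - see Section 2 below for suggested values
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "dry_allowed_length": 2,
    # XTC - probabilistically removes top tokens to reduce "obvious" word choices
    "xtc_probability": 0.5,
    "xtc_threshold": 0.1,
}

print(requests.post("http://localhost:8080/completion", json=payload, timeout=600).json()["content"])
```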
Operating Systems:
Most AI/LLM apps operate on Windows, Mac, and Linux.
Mobile devices (and O/S) are in many cases also supported.
---
<h2>DETAILED NOTES ON PARAMETERS, SAMPLERS and ADVANCED SAMPLERS:</h2>
---
Most AI / LLM apps allow saving a "profile" of parameters and samplers - your "favorite" settings.
Text Generation Web Ui, Koboldcpp, Silly Tavern all have this feature and also "presets" (parameters/samplers set already) too.
Other AI/LLM apps also have this feature to varying degrees too.
DETAILS on PARAMETERS / SAMPLERS:
For additional details on these sampler settings (including advanced ones) you may also want to check out:
https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab
(NOTE: Not all of these "options" are available for GGUFS, including when you use "llamacpp_HF" loader in "text-generation-webui" )
Additional Links (on parameters, samplers and advanced samplers):
DRY
- https://github.com/oobabooga/text-generation-webui/pull/5677
- https://www.reddit.com/r/KoboldAI/comments/1e49vpt/dry_sampler_questionsthat_im_sure_most_of_us_are/
- https://www.reddit.com/r/KoboldAI/comments/1eo4r6q/dry_settings_questions/
Samplers:
https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e632498f73e
Creative Writing :
https://www.reddit.com/r/LocalLLaMA/comments/1c36ieb/comparing_sampling_techniques_for_creative/
General Parameters:
https://arxiv.org/html/2408.13586v1
Benchmarking-and-Guiding-Adaptive-Sampling-Decoding
https://github.com/ZhouYuxuanYX/Benchmarking-and-Guiding-Adaptive-Sampling-Decoding-for-LLMs
LLAMACPP-SERVER EXE:
https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
I have also added notes in the sections below.
OTHER:
Depending on the AI/LLM "apps" you are using, additional reference material for parameters / samplers may also exist.
---
<h2>Class 1, 2, 3 and 4 model critical notes:</h2>
---
Some of the models at my repo are custom designed / limited use case models. For some of these models, specific settings and/or samplers (including advanced) are recommended for best operation.
As a result I have classified the models as class 1, class 2, class 3 and class 4.
Each model is "classed" on the model card itself for each model.
Generally all models (mine and other repos) fall under class 1 or class 2 and can be used with just about any sampler(s) / parameter(s) and advanced sampler(s).
Class 3 requires a little more adjustment because these models run closer to the ragged edge of stability. The settings for these will help control them better, especially
for chat / role play and/or other use case(s). Generally speaking, this helps them behave better overall.
Class 4 are balanced on the very edge of stability. These models are generally highly creative, for very narrow use case(s), and closer to "human prose" than other models and/or
operate in ways no other model(s) operate, offering unique generational abilities. With these models, advanced samplers are used to bring "these bad boys" in line, which is especially important for chat and/or role play type use cases AND/OR use case(s) these models were not designed for.
For reference here are some Class 3/4 models:
[ https://huggingface.co/DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-GGUF ]
(note Grand Horror Series contain class 2,3 and 4 models)
[ https://huggingface.co/DavidAU/L3-DARKEST-PLANET-16.5B-GGUF ]
(note Dark Planet Series contains Class 1, 2 and Class 3/4 models)
[ https://huggingface.co/DavidAU/MN-DARKEST-UNIVERSE-29B-GGUF ]
(this model has exceptional prose abilities in all areas)
[ https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-23.5B-GGUF ]
(note the Grand Gutenberg Madness/Darkness (12B) models are class 1 models, but compressed versions of the 23.5B)
Although Class 3 and Class 4 models will work when used within their specific use case(s) with the standard parameters and settings on the model card, I recognize that users want either a smoother experience
and/or want to use these models outside their intended use case(s), and that is in part why I created this document.
The goal here is to use parameters to raise/lower the power of the model and samplers to "prune" (and/or in some cases enhance) operation.
With that being said, generation "examples" (at my repo) are created using the "Primary Testing Parameters" (top of this document) settings regardless of the "class" of the model and no advanced settings, parameters, or samplers.
However, for ANY model regardless of "class" or if it is at my repo, you can now take performance to the next level with the information contained in this document.
Side note:
There are no "Class 5" models published... yet.
---
<h2>QUANTS:</h2>
---
Please note that smaller quant(s) IE: Q2K, IQ1s, IQ2s and some IQ3s (especially those of models 8B parameters or less) may require additional adjustment(s). For these quants
you may need to increase the "penalty" sampler(s) and/or advanced sampler(s) to compensate for the compression damage of the model.
For models of 20B parameters and higher, generally this is not a major concern as the parameters can make up for compression damage at lower quant levels (IE Q2K+, but at least Q3 ; IQ2+, but at least IQ3+).
IQ1s: Generally IQ1_S rarely works for models less than 30B parameters. IQ1_M is however almost twice as stable/usable relative to IQ1_S.
Generally it is recommended to run the highest quant(s) you can on your machine ; but at least Q4KM/IQ4XS as a minimum for models 20B and lower.
The smaller the size of model, the greater the contrast between the smallest quant and largest quant in terms of operation, quality, nuance and general overall function.
There is an exception to this, see "Neo Imatrix" below and "all quants" (CPU only operation).
IMATRIX:
Imatrix quants generally improve all quants, and also allow you to use smaller quants (less memory, more context space) and retain quality of operation.
IE: Instead of using a q4KM, you might be able to run an IQ3_M and get close to Q4KM's quality, but at a higher token per second speed and have more VRAM for context.
<B>Recommended Quants - ALL:</B>
This covers both Imatrix and regular quants.
Imatrix can be applied to any quant - "Q" or "IQ" - however, IQ1s to IQ3_S REQUIRE an imatrix dataset / imatrixing process before quanting.
This chart shows the order in terms of "BPW" for each quant (mapped below with relative "strength" to one another), with "IQ1_S" having the least and "Q8_0" the most (F16 is full precision):
<small>
<PRE>
IQ1_S | IQ1_M
IQ2_XXS | IQ2_XS | Q2_K_S | IQ2_S | Q2_K | IQ2_M
IQ3_XXS | Q3_K_S | IQ3_XS | IQ3_S | IQ3_M | Q3_K_M | Q3_K_L
Q4_K_S | IQ4_XS | IQ4_NL | Q4_K_M
Q5_K_S | Q5_K_M
Q6_K
Q8_0
F16
</pre>
</small>
More BPW means better quality, but higher VRAM requirements (and larger file size) and lower tokens per second.
The larger the model in terms of parameters, the lower you can go in quant size with fewer quality losses.
Note that "quality losses" refers to both instruction following and output quality.
Quality differences between quants are larger at the lower levels than between the higher quants.
The Imatrix process has NO effect on Q8 or F16 quants.
F16 is full precision, just in GGUF format.
CPU ONLY CONSIDERATIONS:
This section DOES NOT apply to most "Macs" because of the difference in O/S Memory, Vram and motherboard VS other frameworks.
Running quants on CPU will be a lot slower than running them on a video card(s).
In this special case however it may be preferred to run AS SMALL a quant as possible for token per second generation reasons.
On a top, high end (and relatively new) CPU, expect token per second speeds to be 1/4 (or less) of a standard middle-of-the-road video card.
Older machines/cpus will be a lot slower - but models will STILL run on these as long as you have enough ram.
Here are some rough comparisons:
On my video card (Nvidia 16GB 4060TI) I get 160-190 tokens per second with 1B LLama 3.2 Instruct, CPU speeds are 50-60 token per second.
On my much older machine (8 years old)(2 core), token per second speed (same 1B model) is in the 10ish token per second (CPU).
Roughly, 8B-12B models are the limit for CPU-only operation (in terms of "usable" tokens/second) - at the moment.
This is changing as new cpus come out, designed for AI usage.
ARM QUANTS:
These are new quants that are specifically for computers/devices that can run "ARM" quants. If you try to run these on a "non arm" machine/device, the token per second will be VERY SLOW.
<B>NEO Imatrix Quants / Neo Imatrix X Quants</B>
NEO Imatrix quants are specialized and specifically "themed" datasets used to slightly alter the weights in a model. All Imatrix datasets do this to some degree or another, however NEO Imatrix datasets
are content / theme specific and have been calibrated to have maximum effect on a model (relative to standard Imatrix datasets). Calibration was made possible after testing 50+ standard Imatrix datasets,
and carefully modifying them and testing the resulting changes to determine the exact format and content which has the maximum effect on a model via the Imatrix process.
Please keep in mind that the Imatrix process (at its strongest) only "tints" a model and/or slightly changes its bias(es).
Here are some Imatrix Neo Models:
[ https://huggingface.co/DavidAU/Command-R-01-Ultra-NEO-DARK-HORROR-V1-V2-35B-IMATRIX-GGUF ]
[ https://huggingface.co/DavidAU/Command-R-01-200xq-Ultra-NEO-V1-35B-IMATRIX-GGUF ]
[ https://huggingface.co/DavidAU/Command-R-01-200xq-Ultra-NEO-V1-35B-IMATRIX-GGUF ] (this is an X-Quant)
[ https://huggingface.co/DavidAU/Llama-3.2-1B-Instruct-NEO-SI-FI-GGUF ]
[ https://huggingface.co/DavidAU/Llama-3.2-1B-Instruct-NEO-WEE-HORROR-GGUF ]
[ https://huggingface.co/DavidAU/L3-8B-Stheno-v3.2-Ultra-NEO-V1-IMATRIX-GGUF ]
Suggestions for Imatrix NEO quants:
- The LOWER the quant the STRONGER the Imatrix effect is, and therefore the stronger the "tint" so to speak
- Due to the unique nature of this project, quants IQ1s to IQ4s are recommended for maximum effect with IQ4_XS the most balanced in terms of power and bits.
- Secondaries are Q2s-Q4s. Imatrix effect is still strong in these quants.
- Effects diminish quickly from Q5s and up.
- For Q8/F16 there is no change (as the Imatrix process does not affect these quants), and they are therefore not included.
---
<h2> Quick Reference Table </h2>
---
Compiled by: "EnragedAntelope"
https://huggingface.co/EnragedAntelope
https://github.com/EnragedAntelope
This section will get you started - especially with class 3 and 4 models - and the detail section will cover settings / control in more depth below.
Please see sections below this for advanced usage, more details, settings, notes etc etc.
<small>
# LLM Parameters Reference Table
| Parameter | Description |
|----------- |-------------|
| **Primary Parameters** |
| temperature | Controls randomness of outputs (0 = deterministic, higher = more random). Range: 0-5 |
| top-p | Selects tokens with probabilities adding up to this number. Higher = more random results. Default: 0.9 |
| min-p | Discards tokens with probability smaller than this value × probability of most likely token. Default: 0.1 |
| top-k | Selects only top K most likely tokens. Higher = more possible results. Default: 40 |
| **Penalty Samplers** |
| repeat-last-n | Number of tokens to consider for penalties. Critical for preventing repetition. Default: 64 (Class 3/4 - but see notes) |
| repeat-penalty | Penalizes repeated token sequences. Range: 1.0-1.15. Default: 1.0 |
| presence-penalty | Penalizes token presence in previous text. Range: 0-0.2 for Class 3, 0.1-0.35 for Class 4 |
| frequency-penalty | Penalizes token frequency in previous text. Range: 0-0.25 for Class 3, 0.4-0.8 for Class 4 |
| penalize-nl | Penalizes newline tokens. Generally unused. Default: false |
| **Secondary Samplers** |
| mirostat | Controls perplexity during sampling. Modes: 0 (off), 1, or 2 |
| mirostat-lr | Mirostat learning rate. Default: 0.1 |
| mirostat-ent | Mirostat target entropy. Default: 5.0 |
| dynatemp-range | Range for dynamic temperature adjustment. Default: 0.0 |
| dynatemp-exp | Exponent for dynamic temperature scaling. Default: 1.0 |
| tfs | Tail free sampling - removes low-probability tokens. Default: 1.0 |
| typical | Selects tokens more likely than random given prior text. Default: 1.0 |
| xtc-probability | Probability of token removal. Range: 0-1 |
| xtc-threshold | Threshold for considering token removal. Default: 0.1 |
| **Advanced Samplers** |
| dry_multiplier | Controls DRY (Don't Repeat Yourself) intensity. Range: 0.8-1.12+ |
| dry_allowed_length | Allowed length for repeated sequences in DRY. Default: 2 |
| dry_base | Base value for DRY calculations. Range: 1.15-1.75+ for Class 4 |
| smoothing_factor | Quadratic sampling intensity. Range: 1-3 for Class 3, 3-5+ for Class 4 |
| smoothing_curve | Quadratic sampling curve. Range: 1 for Class 3, 1.5-2 for Class 4 |
## Notes
- For Class 3 and 4 models, using both DRY and Quadratic sampling is recommended
- Lower quants (Q2K, IQ1s, IQ2s) may require stronger settings due to compression damage
- Parameters interact with each other, so test changes one at a time
- Always test with temperature at 0 first to establish a baseline
</small>
---
<h2>ADVANCED: HOW TO TEST EACH PARAMETER(s), SAMPLER(s) and ADVANCED SAMPLER(s)</h2>
---
1 - Set temp to 0 (zero) and set your basic parameters, and use a prompt to get a "default" generation. A creative prompt will work better here.
2 - If you want to test basic parameter changes, test ONE at a time, then compare output (answer quality, word choice, sentence size/construction, general output qualities) to your "default" generation.
3 - Then start testing TWO parameters at a time, and comparing again. Keep in mind parameters (all) interact with each other.
4 - Samplers -> Reset your basic parameters, (temp still at zero) and test each one of these, one at a time. Then adjust settings, test again.
5 - Once you have an "idea" of how each affects your "test prompt", now test at "temp" (not zero). It may take five to ten generations to get a rough idea.
Yes, testing is a lot of work - but once you get all the parameter(s) and/or sampler(s) dialed in - it is worth it.
IMPORTANT: Use a "fresh chat" PER TEST (you will contaminate the results otherwise). Never use the same chat for multiple tests -> exception: Regens.
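A simple way to automate steps 1 to 5 is a small script that first captures the temp=0 "default" generation, then runs several generations "at temp" with one changed setting, each as a fresh request (the equivalent of a fresh chat). This is only a sketch assuming a local llama-server; adapt the endpoint / field names to whichever app/API you actually use.

```python
# Sketch of the testing procedure above: one temp=0 baseline, then several runs
# "at temp" with a single changed setting, each as a fresh request (fresh chat).
# Assumes a local llama-server; endpoint/field names per the llama.cpp server README.
import requests

URL = "http://localhost:8080/completion"
PROMPT = "Describe an abandoned lighthouse at midnight."  # placeholder creative prompt
BASE = {"prompt": PROMPT, "n_predict": 400, "top_k": 40, "top_p": 0.95, "min_p": 0.05}

def generate(overrides):
    return requests.post(URL, json={**BASE, **overrides}, timeout=600).json()["content"]

# Step 1: deterministic baseline
print("=== BASELINE (temp=0) ===\n", generate({"temperature": 0.0}))

# Steps 2-5: change ONE setting at a time, then compare 5-10 generations "at temp"
for run in range(5):
    print(f"=== RUN {run + 1} ===\n", generate({"temperature": 1.2, "repeat_penalty": 1.05}))
```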
Keep in mind that parameters, samplers and advanced samplers can affect the model on a per token generation basis AND/OR on a multi-token / phrase / sentence / paragraph
and even complete generation basis.
Everything is cumulative here regardless if the parameter/sampler affects per token or multi-token basis because of how models "look back" to see what was generated in some cases.
And of course... each model will be different too.
All that being said, it is a good idea to have specific generation quality "goals" in mind.
Likewise, at my repo, I post example generations so you can get an idea (but not complete picture) of a model's generation abilities.
The best way to control generation is STILL with your prompt(s) - including pre-prompts/system role. The latest gen models (and archs) have very strong
instruction following, so many times better (or simply added!) instructions in your prompts can make a world of difference.
Not sure if the model understands your prompt(s)?
Ask it ->
"Check my prompt below and tell me how to make it clearer?" (prompt after this line)
"For my prompt below, explain the steps you wound take to execute it" (prompt after this line)
This will help the model fine tune your prompt so IT understands it.
However, sometimes parameters and/or samplers are required to better "wrangle" the model, get it to perform to its maximum potential, and/or fine-tune it to your use case(s).
---
<h2>Section 1a : PRIMARY PARAMETERS - ALL APPS:</h2>
---
These parameters will have SIGNIFICANT effect on prose, generation, length and content; with temp being the most powerful.
Keep in mind the biggest parameter / random "unknown" is your prompt.
A word change, rephrasing, punctuation, even a comma or semi-colon can drastically alter the output, even at min temp settings. CAPS also affect generation.
Likewise the size, and complexity of your prompt impacts generation too ; especially clarity and direction.
Special note:
Pre-prompts / system role are not discussed here. Many of the model repo cards (at my repo) have an optional pre-prompt you can use to aid generation (and can impact instruction following too).
Some of my newer models repo cards use a limited form of this called a "prose control" (discussed and shown by example).
Roughly a pre-prompt / system role is embedded during each prompt and can act as a guide and/or set of directives for processing the prompt and/or containing generation instructions.
A prose control is a simplified version of this, which precedes the main prompt(s) - but the idea / effect is relatively the same (a pre-prompt/system role does have a slightly higher priority however).
I strongly suggest you research these online, as they are a powerful addition to your generation toolbox.
They are especially potent with newer model archs due to newer model types having stronger instruction following abilities AND increased context too.
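As a quick illustration, here is a hedged sketch of a pre-prompt / system role sent through an OpenAI-compatible chat endpoint (llama-server, LM Studio, text-generation-webui and KoboldCPP all expose one). The system message wording is only an example of a generation directive, not a recommended pre-prompt from any specific model card.

```python
# Sketch: a pre-prompt / system role via an OpenAI-compatible chat endpoint.
# The system text below is only an illustrative example of a generation directive.
import requests

payload = {
    "messages": [
        {"role": "system",
         "content": "You are a vivid fiction writer. Use present tense, strong sensory "
                    "detail, and keep paragraphs under 100 words."},
        {"role": "user",
         "content": "Write the first scene of a mystery set in a rain-soaked city."},
    ],
    "temperature": 1.0,
    "max_tokens": 500,
}

r = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=600)
print(r.json()["choices"][0]["message"]["content"])
```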
---
<B>PRIMARY PARAMETERS:</B>
---
<B>temp / temperature</B>
temperature (default: 0.8)
Primary factor to control the randomness of outputs. 0 = deterministic (only the most likely token is used). Higher value = more randomness.
Range 0 to 5. Increment at .1 per change.
Too much temp can affect instruction following in some cases and sometimes not enough = boring generation.
Newer model archs (L3,L3.1,L3.2, Mistral Nemo, Gemma2 etc) many times NEED more temp (1+) to get their best generations.
<B>top-p</B>
top-p sampling (default: 0.9, 1.0 = disabled)
If not set to 1, select tokens with probabilities adding up to less than this number. Higher value = higher range of possible random results.
Dropping this can simplify word choices, but it works in conjunction with "top-k".
I use a default of: .95 ;
<B>min-p</B>
min-p sampling (default: 0.1, 0.0 = disabled)
Tokens with probability smaller than (min_p) * (probability of the most likely token) are discarded.
I use default: .05 ;
Careful adjustment of this parameter can result in more "wordy" or "less wordy" generation but this works in conjunction with "top-k".
<B>top-k</B>
top-k sampling (default: 40, 0 = disabled)
Similar to top_p, but select instead only the top_k most likely tokens. Higher value = higher range of possible random results.
Bring this up to 80-120 for a lot more word choice, and below 40 for simpler word choices.
As this parameter operates in conjunction with "top-p" and "min-p" all three should be carefully adjusted one at a time.
<B>NOTE - "CORE" Testing with "TEMP":</B>
For an interesting test, set "temp" to 0 ; this will give you the SAME generation for a given prompt each time.
Then adjust a word, phrase, sentence etc in your prompt, and generate again to see the differences.
(you should use a "fresh" chat for each generation)
Keep in mind this will show model operation at its LEAST powerful/creative level and should NOT be used to determine if the model works for your use case(s).
Then test your prompt(s) "at temp" to see the model in action. (5-10 generations recommended)
You can also use "temp=0" to test different quants of the same model to see generation differences. (roughly minor "BIAS" changes which reflect math changes due to compress/mixtures differences between quants).
Another option is testing different models (at temp=0 AND of the same quant) to see how each handles your prompt(s).
Then test "at temp" with your prompt(s) to see the MODELS in action. (5-10 generations recommended)
---
<h2>Section 1b : PENALTY SAMPLERS - ALL APPS:</h2>
---
These samplers "trim" or "prune" output in real time.
The longer the generation, the stronger the overall effect, but that all depends on the "repeat-last-n" setting.
For creative use cases, these samplers can alter prose generation in interesting ways.
Penalty parameters affect both per token and part of OR entire generation (depending on settings / output length).
CLASS 4: For these models it is important to activate / set all samplers as noted for maximum quality and control.
<B>PRIMARY:</B>
<B>repeat-last-n</B>
last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
("repetition_penalty_range" in oobabooga/text-generation-webui , "rp_range" in kobold)
THIS IS CRITICAL.
Set too high, you can get all kinds of issues (repeated words, sentences, paragraphs or "gibberish"), especially with class 3 or 4 models.
Likewise if you change this parameter it will drastically alter the output.
This setting also works in conjunction with all other "rep pens" below.
This parameter is the "RANGE" of tokens looked at for the samplers directly below.
<B>SECONDARIES:</B>
<B>repeat-penalty</B>
penalize repeat sequence of tokens (default: 1.0, 1.0 = disabled)
(commonly called "rep pen")
Generally this is set from 1.0 to 1.15 ; the smallest increments are best IE: 1.01... 1.02 or even 1.001... 1.002.
This affects the creativity of the model overall, not just how words are penalized.
<B>presence-penalty</B>
repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
Generally leave this at zero IF repeat-last-n is 512-1024 or less. You may want to use this for higher repeat-last-n settings.
CLASS 3: 0.05 to .2 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
CLASS 4: 0.1 to 0.35 may assist generation BUT SET "repeat-last-n" to 64.
<B>frequency-penalty</B>
repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
Generally leave this at zero IF repeat-last-n is 512 or less. You may want to use this for higher repeat-last-n settings.
CLASS 3: 0.25 may assist generation BUT SET "repeat-last-n" to 512 or less. Better is 128 or 64.
CLASS 4: 0.4 to 0.8 may assist generation BUT SET "repeat-last-n" to 64.
<B>penalize-nl </B>
penalize newline tokens (default: false)
Generally this is not used.
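Putting the penalty samplers together, here is a hedged example of starting points built from the Class 3 / Class 4 ranges above. Treat these as illustrative values to test one change at a time, not fixed recommendations.

```python
# Sketch: penalty-sampler starting points built from the ranges noted above.
# Illustrative values only - test one change at a time against a temp=0 baseline.

class3_penalties = {
    "repeat_last_n": 64,        # the CRITICAL "range" setting ("rep pen range")
    "repeat_penalty": 1.05,     # 1.0 - 1.15, adjust in the smallest increments
    "presence_penalty": 0.1,    # 0.05 - 0.2 for Class 3
    "frequency_penalty": 0.25,  # up to 0.25 for Class 3
}

class4_penalties = {
    "repeat_last_n": 64,
    "repeat_penalty": 1.08,
    "presence_penalty": 0.25,   # 0.1 - 0.35 for Class 4
    "frequency_penalty": 0.6,   # 0.4 - 0.8 for Class 4
}
```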
---
<h2>Section 1c : SECONDARY SAMPLERS / FILTERS - ALL APPS:</h2>
---
In some AI/LLM apps, these may only be available via JSON file modification and/or API.
For "text-gen-webui" and "Koboldcpp" these are directly accessible (and via Sillytavern IF you use either of these APPS to connect Silly Tavern to their API).
<B>i) OVERALL GENERATION CHANGES (affect per token as well as over all generation):</B>
<B>mirostat</B>
Use Mirostat sampling. "Top K", "Nucleus", "Tail Free" (TFS) and "Locally Typical" (TYPICAL) samplers are ignored if used. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
"mirostat-lr"
Mirostat learning rate, parameter eta (default: 0.1) " mirostat_tau "
mirostat_tau: 5-8 is a good value.
"mirostat-ent"
Mirostat target entropy, parameter tau (default: 5.0) " mirostat_eta "
mirostat_eta: 0.1 is a good value.
Activates the Mirostat sampling technique. It aims to control perplexity during sampling. See the paper. ( https://arxiv.org/abs/2007.14966 )
This is the big one ; activating this will help with creative generation. It can also help with stability. Also note which
samplers are disabled/ignored here, and that "mirostat_eta" is a learning rate.
This is both a sampler (and pruner) and enhancement all in one.
It also has two modes of generation "1" and "2" - test both with 5-10 generations of the same prompt. Make adjustments, and repeat.
CLASS 3: for these models it is suggested to use this to assist with generation (minimum settings).
CLASS 4: for these models it is highly recommended, with Mirostat 1 or 2 + mirostat_tau @ 6 to 8 and mirostat_eta at .1 to .5
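As a hedged sketch, the CLASS 4 suggestion above translates to something like the request below on the llama-server API (field names per the server README at the time of writing); remember that Top K, Nucleus (top-p), TFS and Typical are ignored while Mirostat is active.

```python
# Sketch: Mirostat 2 with the Class 4 suggestions above, via llama-server.
# Top K / top-p / TFS / Typical are ignored while Mirostat is active.
import requests

payload = {
    "prompt": "Continue the scene:",  # placeholder
    "n_predict": 400,
    "temperature": 1.0,
    "mirostat": 2,          # 0 = off, 1 = Mirostat, 2 = Mirostat 2.0
    "mirostat_tau": 6.0,    # target entropy - 6 to 8 suggested for Class 4
    "mirostat_eta": 0.1,    # learning rate - .1 to .5 suggested for Class 4
    "repeat_last_n": 64,
}

print(requests.post("http://localhost:8080/completion", json=payload, timeout=600).json()["content"])
```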
<b>Dynamic Temperature</b>
"dynatemp-range "
dynamic temperature range (default: 0.0, 0.0 = disabled)
"dynatemp-exp"
dynamic temperature exponent (default: 1.0)
In: oobabooga/text-generation-webui (has on/off, and high / low) :
Activates Dynamic Temperature. This modifies temperature to range between "dynatemp_low" (minimum) and "dynatemp_high" (maximum), with an entropy-based scaling. The steepness of the curve is controlled by "dynatemp_exponent".
This allows the model to CHANGE temp during generation. This can greatly affect creativity, dialog, and other contrasts.
For Koboldcpp a converter is available and in oobabooga/text-generation-webui you just enter low/high/exp.
CLASS 4 only: Suggested this is on, with a high/low of .8 to 1.8 (note the range here of "1" between high and low); with the exponent at 1 (however values below or above this also work)
To set manually (IE: Api, lmstudio, Llamacpp, etc) using "range" and "exp" ; this is a bit more tricky: (example is to set range from .8 to 1.8)
1 - Set the "temp" to 1.3 (the regular temp parameter)
2 - Set the "range" to .500 (this gives you ".8" to "1.8" with "1.3" as the "base")
3 - Set exp to 1 (or as you want).
This is both an enhancement and in some ways fixes issues in a model when too little temp (or too much/too much of the same) affects generation.
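The manual conversion in steps 1-3 is simply: base temp = the middle of the desired window, range = half the spread. A tiny sketch (the dynatemp_range / dynatemp_exponent names mirror the llama.cpp parameters above):

```python
# Sketch: convert a desired low/high dynamic-temperature window into the
# base "temp" + "dynatemp-range" form used when setting this manually.
# Example from above: low=0.8, high=1.8  ->  temp=1.3, range=0.5

def dynatemp_settings(low: float, high: float, exponent: float = 1.0) -> dict:
    temp = (low + high) / 2            # base temperature sits in the middle of the window
    dynatemp_range = (high - low) / 2  # temp swings +/- this amount during generation
    return {"temperature": temp, "dynatemp_range": dynatemp_range, "dynatemp_exponent": exponent}

print(dynatemp_settings(0.8, 1.8))
# {'temperature': 1.3, 'dynatemp_range': 0.5, 'dynatemp_exponent': 1.0}
```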
<B> ii) PER TOKEN CHANGES:</B>
<B>tfs</B>
Tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
Tries to detect a tail of low-probability tokens in the distribution and removes those tokens. The closer to 0, the more discarded tokens.
( https://www.trentonbricken.com/Tail-Free-Sampling/ )
<B>typical</B>
Locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
<B> XTC</B>
"xtc-probability"
xtc probability (default: 0.0, 0.0 = disabled)
Probability that the removal will actually happen. 0 disables the sampler. 1 makes it always happen.
"xtc-threshold"
xtc threshold (default: 0.1, 1.0 = disabled)
If 2 or more tokens have probability above this threshold, consider removing all but the last one.
XTC is a new sampler, that adds an interesting twist in generation.
Suggest you experiment with this one, with other advanced samplers disabled, to see its effects.
<B>-l, --logit-bias TOKEN_ID(+/-)BIAS</B>
modifies the likelihood of token appearing in the completion,
i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello', or `--logit-bias 15043-1` to decrease likelihood of token ' Hello'
This may or may not be available. This requires a bit more work.
Note: +- range is 0 to 100.
IN "oobabooga/text-generation-webui" there is "TOKEN BANNING":
This is a very powerful pruning method; which can drastically alter output generation.
I suggest you get some "bad outputs" ; get the "tokens" (actual number for the "word" / part word) then use this.
Careful testing is required, as this can have unclear side effects.
---
<h2>SECTION 2: ADVANCED SAMPLERS - "text-generation-webui" / "KOBOLDCPP" / "SillyTavern" (see note 1 below): </h2>
<B>Additional Parameters / Samplers, including "DRY", "QUADRATIC" and "ANTI-SLOP".</B>
---
Note #1 :
You can use these samplers via Sillytavern IF you use either of these APPS (Koboldcpp/Text Generation Webui) to connect Silly Tavern to their API.
Other Notes:
Hopefully ALL these samplers / controls will be added to LLAMACPP and made available to all users via AI/LLM apps soon.
"DRY" sampler has been added to Llamacpp as of the time of this writing (and available via SERVER/LLAMA-SERVER.EXE) and MAY appear in other "downstream" apps that use Llamacpp.
INFORMATION ON THESE SAMPLERS:
For more info on what they do / how they affect generation see:
https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab
(also see the section above "Additional Links" for more info on the parameters/samplers)
ADVANCED SAMPLERS - PART 1:
Keep in mind these parameters/samplers become available (for GGUFs) in "oobabooga/text-generation-webui" when you use the llamacpp_HF loader.
Most of these are also available in KOBOLDCPP too (via settings -> samplers) after start up (no "llamacpp_HF loader" step required).
I am not going to touch on all of the samplers / parameters, just the main ones at the moment.
However, you should also check / test operation of (these are in Text Generation WebUI, and may be available via API / In Sillytavern (when connected to Text Generation Webui)):
a] Affects per token generation:
- top_a
- epsilon_cutoff - see note 4
- eta_cutoff - see note 4
- no_repeat_ngram_size - see note #1.
b] Affects generation including phrase, sentence, paragraph and entire generation:
- no_repeat_ngram_size - see note #1.
- encoder_repetition_penalty "Hallucinations filter" - see note #2.
- guidance_scale (with "Negative prompt" ) => this is like a pre-prompt/system role prompt - see note #3.
- Disabling the BOS TOKEN can make the replies more creative.
- Custom stopping strings
Note 1:
"no_repeat_ngram_size" appears in both because it can impact per token OR per phrase depending on settings. This can also drastically affect sentence,
paragraph and general flow of the output.
Note 2:
This parameter, if set to LESS than 1, causes the model to "jump" around a lot more, whereas above 1 it causes the model to focus more on the immediate surroundings.
If the model is crafting a "scene", a setting of less than 1 causes the model to jump around the room, outside, etc. ; a setting above 1 focuses the model more on
the moment, the immediate surroundings, the POV character and details in the setting.
Note 3:
This is a powerful method to send instructions / directives to the model on how to process your prompt(s) each time. See [ https://arxiv.org/pdf/2306.17806 ]
Note 4:
These control selection of tokens, in some case providing more relevant and/or more options. See [ https://arxiv.org/pdf/2210.15191 ]
<B>MAIN ADVANCED SAMPLERS PART 2 (affects per token AND overall generation): </B>
What I will touch on here are special settings for CLASS 3 and CLASS 4 models (for the first TWO samplers).
For CLASS 3 you can use one, two or both.
For CLASS 4 using BOTH are strongly recommended, or at minimum "QUADRATIC SAMPLING".
These samplers (along with "penalty" settings) work in conjunction to "wrangle" the model / control it and get it to settle down, important for Class 3 but critical for Class 4 models.
For other classes of models, these advanced samplers can enhance operation across the board.
For Class 3 and Class 4 the goal is to use the LOWEST settings that keep the model in line, rather than "over prune" it.
You may therefore want to experiment with dropping the settings (SLOWLY) for Class 3/4 models from those suggested below.
<B>DRY:</B>
Dry ("Don't Repeat Yourself") affects repetition (and repeat "penalty") at the word, phrase, sentence and even paragraph level. Read about "DRY" above, in the "Additional Links" links section above.
Class 3:
dry_multiplier: .8
dry_allowed_length: 2
dry_base: 1
Class 4:
dry_multiplier: .8 to 1.12+
dry_allowed_length: 2 (or less)
dry_base: 1.15 to 1.75+
Dial the "dry_muliplier" up or down to "reign in" or "release the madness" so to speak from the core model.
For Class 4 models this is used to control some of the model's bad habit(s).
For more information on "DRY":
https://github.com/oobabooga/text-generation-webui/pull/5677
https://www.reddit.com/r/KoboldAI/comments/1e49vpt/dry_sampler_questionsthat_im_sure_most_of_us_are/
https://www.reddit.com/r/KoboldAI/comments/1eo4r6q/dry_settings_questions/
<B>QUADRATIC SAMPLING: AKA "Smoothing"</B>
This sampler alters the "score" of ALL TOKENS at the time of generation and as a result affects the entire generation of the model. See "Additional Links" links section above for more information.
Class 3:
smoothing_factor: 1 to 3
smoothing_curve: 1
Class 4:
smoothing_factor: 3 to 5 (or higher)
smoothing_curve: 1.5 to 2.
Dial the "smoothing factor" up or down to "reign in" or "release the madness" so to speak.
In Class 3 models, this has the effect of modifying the prose closer to "normal" with as much or little (or a lot!) touch of "madness" from the root model.
In Class 4 models, this has the effect of modifying the prose closer to "normal" with as much or little (or a lot!) touch of "madness" from the root model AND wrangling in some of the core model's bad habits.
For more information on Quadratic Sampling:
https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e632498f73e
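To summarize the two advanced samplers, here is a hedged sketch of starting presets using the parameter names from text-generation-webui / SillyTavern (note that smoothing_factor / smoothing_curve are not core llama.cpp parameters). The numbers mirror the suggestions above; for Class 3/4, lower them slowly as described rather than over-pruning.

```python
# Sketch: DRY + Quadratic ("smoothing") starting presets built from the values above.
# Parameter names follow text-generation-webui / SillyTavern; illustrative only.

class3_advanced = {
    "dry_multiplier": 0.8,
    "dry_allowed_length": 2,
    "dry_base": 1.0,
    "smoothing_factor": 2.0,   # 1 to 3 for Class 3
    "smoothing_curve": 1.0,
}

class4_advanced = {
    "dry_multiplier": 1.05,    # .8 to 1.12+
    "dry_allowed_length": 2,   # 2 (or less)
    "dry_base": 1.5,           # 1.15 to 1.75+
    "smoothing_factor": 4.0,   # 3 to 5 (or higher)
    "smoothing_curve": 1.75,   # 1.5 to 2
}
```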
<B>ANTI-SLOP - Koboldcpp only</B>
Hopefully this powerful sampler will soon appear in all LLM/AI apps.
You can access this in the KoboldCPP app, under "context" -> "tokens" on the main page of the app after start up.
This sampler allows banning words and phrases DURING generation, forcing the model to "make another choice".
This is a game changer in custom real time control of the model.
For more information on ANTI SLOP project (owner runs EQBench):
https://github.com/sam-paech/antislop-sampler
FINAL NOTES:
Keep in mind that these settings/samplers work in conjunction with "penalties" ; which is especially important
for operation of CLASS 4 models for chat / role play and/or "smoother operation".
For Class 3 models, "QUADRATIC" will have a slightly stronger effect than "DRY" relatively speaking.
If you use the Mirostat sampler, keep in mind this will interact with these two advanced samplers too.
And...
Smaller quants may require STRONGER settings (all classes of models) due to compression damage, especially for Q2K, and IQ1/IQ2s.
This is also influenced by the parameter size of the model in relation to the quant size.
IE: an 8B model at Q2K will be far more unstable relative to a 20B model at Q2K, and as a result will require stronger settings.