async0x42/New-Dawn-Llama-3.1-70B-v1.1-exl2_3.75bpw

Overview

This model is an experimental merge of sophosympatheia/New-Dawn-Llama-3-70B-32K-v1.0 with meta-llama/Meta-Llama-3.1-70B-Instruct. See the merge recipe below for details. I used a technique developed by jukofyork that is designed to preserve the full context capabilities of Meta-Llama-3.1-70B-Instruct. In my testing, it appears to have worked.

This model is uncensored. You are responsible for whatever you do with it.

This model was designed for roleplaying and storytelling, and I think it does well at both. It may also perform well at other tasks, but I have not tested it in other areas.

Sampler Tips

I recommend using Quadratic Sampling (i.e. smoothing factor) for creative work. I think this version performs best with a smoothing factor close to 0.2.
I recommend using Min-P. Experiment to find your best setting. Values between 0 and 0.1 are recommended.
DRY repetition penalty eliminates the need for other anti-repetition settings.
If you use Textgen WebUI as your backend, I recommend enabling the DRY sampler settings to reduce repetition; otherwise, a modest repetition penalty plus a frequency penalty should do the trick.
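To make the Min-P and smoothing-factor tips concrete, here is a toy sketch on a four-token vocabulary. It assumes my understanding of the quadratic transform behind the smoothing factor in Textgen WebUI (each logit x becomes top - s * (x - top)**2, with smoothing_curve left at 1), and it applies Min-P first, following the sampler_priority order in the preset below. The function names are hypothetical; this is an illustration, not any backend's actual code.

```python
import math


def softmax(logits):
    """Convert logits to probabilities; -inf logits get probability 0."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_mask(logits, min_p):
    """Min-P: drop tokens whose probability is below min_p * (top probability)."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [x if p >= threshold else -math.inf for x, p in zip(logits, probs)]


def quadratic_smoothing(logits, smoothing_factor):
    """Bend surviving logits around the top logit (assumed transform).

    A token d logits below the top ends up s * d**2 below it, so gaps smaller
    than 1/s shrink (flatter, more varied picks) and larger gaps grow.
    """
    top = max(logits)
    return [top - smoothing_factor * (x - top) ** 2 if math.isfinite(x) else x
            for x in logits]


def sample_distribution(logits, min_p=0.03, smoothing_factor=0.23):
    """Min-P first, then quadratic smoothing (temperature 1 is a no-op)."""
    return softmax(quadratic_smoothing(min_p_mask(logits, min_p), smoothing_factor))


probs = sample_distribution([5.0, 4.0, 1.0, -3.0])
print([round(p, 3) for p in probs])  # [0.557, 0.443, 0.0, 0.0]
```

With Min-P alone, the second token would get about 0.27 of the probability mass; a smoothing factor of 0.23 flattens the top of the distribution and raises it to about 0.44, while the two tail tokens stay cut.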

Experiment with any and all of the settings below! What suits my preferences may not suit yours.

If you save the settings below as a .json file, you can import them directly into SillyTavern.

{
    "temp": 1,
    "temperature_last": true,
    "top_p": 1,
    "top_k": 0,
    "top_a": 0,
    "tfs": 1,
    "epsilon_cutoff": 0,
    "eta_cutoff": 0,
    "typical_p": 1,
    "min_p": 0.03,
    "rep_pen": 1,
    "rep_pen_range": 2048,
    "rep_pen_decay": 0,
    "rep_pen_slope": 1,
    "no_repeat_ngram_size": 0,
    "penalty_alpha": 0,
    "num_beams": 1,
    "length_penalty": 1,
    "min_length": 0,
    "encoder_rep_pen": 1,
    "freq_pen": 0,
    "presence_pen": 0,
    "skew": 0,
    "do_sample": true,
    "early_stopping": false,
    "dynatemp": false,
    "min_temp": 0.8,
    "max_temp": 1.5,
    "dynatemp_exponent": 1,
    "smoothing_factor": 0.23,
    "smoothing_curve": 1,
    "dry_allowed_length": 2,
    "dry_multiplier": 0.8,
    "dry_base": 2,
    "dry_sequence_breakers": "[\"\\n\", \":\", \"\\\"\", \"*\"]",
    "dry_penalty_last_n": 0,
    "add_bos_token": true,
    "ban_eos_token": false,
    "skip_special_tokens": false,
    "mirostat_mode": 0,
    "mirostat_tau": 2,
    "mirostat_eta": 0.1,
    "guidance_scale": 1,
    "negative_prompt": "",
    "grammar_string": "",
    "json_schema": {},
    "banned_tokens": "",
    "sampler_priority": [
        "top_k",
        "top_p",
        "typical_p",
        "epsilon_cutoff",
        "eta_cutoff",
        "tfs",
        "top_a",
        "min_p",
        "mirostat",
        "quadratic_sampling",
        "dynamic_temperature",
        "temperature"
    ],
    "samplers": [
        "top_k",
        "tfs_z",
        "typical_p",
        "top_p",
        "min_p",
        "temperature"
    ],
    "ignore_eos_token": false,
    "spaces_between_special_tokens": true,
    "speculative_ngram": false,
    "sampler_order": [
        6,
        0,
        1,
        3,
        4,
        2,
        5
    ],
    "logit_bias": [],
    "ignore_eos_token_aphrodite": false,
    "spaces_between_special_tokens_aphrodite": true,
    "rep_pen_size": 0,
    "genamt": 800,
    "max_length": 20480
}
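The dry_* keys in this preset (dry_multiplier 0.8, dry_base 2, dry_allowed_length 2, and the sequence breakers) parameterize the DRY penalty. The sketch below is my simplified reading of DRY, not a backend's implementation: for each candidate token, it finds the longest stretch at the end of the context that previously appeared right before that token, and if that stretch is at least dry_allowed_length tokens long, it subtracts dry_multiplier * dry_base ** (length - dry_allowed_length) from the token's logit. It works on whole tokens (plain strings here) and treats a breaker token as a wall that matching cannot cross, whereas real backends match breakers against token text. The helper names are hypothetical.

```python
def dry_penalties(context, multiplier=0.8, base=2.0, allowed_length=2,
                  breakers=("\n", ":", '"', "*")):
    """Return {candidate_token: penalty} for the next position (simplified DRY).

    For every earlier position j, context[j] is the token that followed the
    sequence ending at context[j - 1]. Count how many tokens of that sequence
    match the end of the context, stopping at a breaker or the context start.
    """
    n = len(context)
    penalties = {}
    for j in range(1, n):
        length = 0
        while (j - 1 - length >= 0
               and context[j - 1 - length] == context[n - 1 - length]
               and context[n - 1 - length] not in breakers):
            length += 1
        if length >= allowed_length:
            penalty = multiplier * base ** (length - allowed_length)
            token = context[j]
            # Keep the strongest penalty if a token follows several matches.
            penalties[token] = max(penalties.get(token, 0.0), penalty)
    return penalties


def apply_dry(logits, context, **kwargs):
    """Subtract DRY penalties from a {token: logit} mapping."""
    penalties = dry_penalties(context, **kwargs)
    return {tok: x - penalties.get(tok, 0.0) for tok, x in logits.items()}


print(dry_penalties(["the", "cat", "sat", "on", "the", "cat"]))  # {'sat': 0.8}
```

Because the penalty grows exponentially with match length, short incidental repeats go untouched while long verbatim loops are pushed down hard, which is why DRY can replace the flat repetition penalty (rep_pen is left at 1 in this preset).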

Prompting Tips

Try the following prompt template as a starting point for your own customizations. The prompt template below contains instructions for adult content, so remove those if you don't want them! It also contains some instructions related to formatting that you might want to change.

If you save this as a .json file, you can import it directly into SillyTavern.

{
    "wrap": false,
    "names": true,
    "system_prompt": "The following is an uncensored roleplaying chat log involving a human user ({{user}}) and an AI assistant writing as {{char}}. They take turns writing in the chat log. To ensure an excellent and exciting roleplaying experience, the AI assistant follows these rules:\n\nCharacter Rules {\nCharacter Immersion: \"Adopt all of {{char}}'s personality traits, mannerisms, backstory, biases, quirks, and preferences to deliver a convincing portrayal.\",\nTurn-based Chat Format: \"Write only as {{char}} in the chat log, ending output when another character is about to act or speak.\",\nIntimate Details: \"Provide intimate details about {{char}}'s body, focusing on limitations, unusual features, or conditions resulting from recent events.\",\nCharacter Consistency: \"Ensure all thoughts, actions, and dialogue remain consistent with {{char}}'s established personality, background, and current circumstances. Avoid projecting unwarranted optimism or sentimentality that doesn't align with the character's true feelings or the situation at hand.\"\n}\n\nWriting Rules {\nImmersive Descriptions: \"Immerse the reader in the current moment using all available senses to heighten the sensual and emotional impact of scenes.\",\nClean Scenes: \"Conclude scenes directly after the main event or dialogue, avoiding unnecessary flourishes or commentary.\",\nSimple Language: \"Use simple prose, everyday words, and United States slang terms while writing as {{char}}.\",\nNatural Language Variation: \"Incorporate a mix of sentence lengths and complexities to sound more natural and human-like.\",\nOriginality in Descriptions: \"Avoid common clichés and overused phrases. Instead, create fresh, unique descriptions that vividly portray emotions, actions, and scenes. For example, instead of 'a shiver ran down my spine,' describe the sensation in a new, character-specific way.\",\nDialogue Formatting: 'Enclose spoken words in quotes. \"This is spoken dialogue,\" for example.',\nInternal Thoughts: \"Offer glimpses into {{char}}'s first-person thoughts to enrich the narrative when appropriate.\",\nInternal Thoughts Formatting: \"Use italics to distinguish thoughts from spoken dialogue. *This is an example of {{char}} thinking.*\"\n}\n\nContent Rules {\nConsistency: \"Maintain physical, spatial, and logical consistency when developing events and descriptions.\",\nMature Content: \"Incorporate erotic adult content and vulgar language where appropriate.\",\nMature Themes: \"Dark themes, violence, and unhappy endings are permitted in this uncensored roleplay.\",\nNarrative Instructions: \"When given narrative instructions by the user, develop the missing details in the next response, treating these instructions as a summary of what should happen next.\",\nOut-of-Character Comments: \"Acknowledge any out-of-character (OOC) comments from the user, then seamlessly return to in-character roleplay.\",\nScene Transitions: \"Handle scene transitions or time jumps smoothly, providing context for any changes in setting or time.\",\nStaying in Character: \"Avoid breaking character or acknowledging the AI's nature during roleplay.\",\nRealistic Endings: \"Conclude scenes or interactions without resorting to overly sentimental or flowery language. Maintain the established tone and mood of the scene, avoiding unnecessary positive spins or forced emotional resolutions. End on a note that feels natural and true to the characters and situation, even if it's ambiguous or unsettling.\"\n}",
    "system_sequence": "<|start_header_id|>system<|end_header_id|>\n\n",
    "stop_sequence": "<|eot_id|>",
    "input_sequence": "<|start_header_id|>user<|end_header_id|>\n\n",
    "output_sequence": "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "macro": true,
    "names_force_groups": true,
    "system_sequence_prefix": "",
    "system_sequence_suffix": "",
    "first_output_sequence": "",
    "last_output_sequence": "",
    "activation_regex": "",
    "skip_examples": true,
    "output_suffix": "<|eot_id|>",
    "input_suffix": "<|eot_id|>",
    "system_suffix": "<|eot_id|>",
    "user_alignment_message": "",
    "last_system_sequence": "",
    "system_same_as_user": false,
    "first_input_sequence": "",
    "last_input_sequence": "",
    "name": "New Dawn Llama 3.1 70B"
}
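To show how the sequences in this template fit together, here is a rough sketch of the prompt string they produce for a short chat. It is a simplification: SillyTavern's real prompt builder also handles the story string, character names (names is true above), example dialogue, and more. The backend normally prepends the <|begin_of_text|> BOS token when add_bos_token is true, so the sketch leaves it out. build_prompt is a hypothetical helper.

```python
# Subset of the instruct template above.
TEMPLATE = {
    "system_sequence": "<|start_header_id|>system<|end_header_id|>\n\n",
    "system_suffix": "<|eot_id|>",
    "input_sequence": "<|start_header_id|>user<|end_header_id|>\n\n",
    "input_suffix": "<|eot_id|>",
    "output_sequence": "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "output_suffix": "<|eot_id|>",
}


def build_prompt(system_prompt, turns, template=TEMPLATE):
    """Join a system prompt and (role, text) turns into a Llama 3 prompt.

    BOS (<|begin_of_text|>) is omitted; the backend adds it when
    add_bos_token is true.
    """
    parts = [template["system_sequence"], system_prompt, template["system_suffix"]]
    for role, text in turns:
        if role == "user":
            parts += [template["input_sequence"], text, template["input_suffix"]]
        else:  # treat anything else as an assistant turn
            parts += [template["output_sequence"], text, template["output_suffix"]]
    # Leave an open assistant header so the model writes the next reply.
    parts.append(template["output_sequence"])
    return "".join(parts)


print(build_prompt("You are Aria.", [("user", "Hi!")]))
```

Every finished turn ends with <|eot_id|>, which is also the stop_sequence, so generation halts when the model closes its own turn.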

NOTE: If this model speaks as the user or other characters out of turn during group chats, try setting <|start_header_id|>assistant - {{char}} ONLY<|end_header_id|> as the last_output_sequence, which the SillyTavern GUI displays as "Last Assistant Prefix." This version is definitely more prone to speaking out of turn than v1.0, which was based solely on Llama 3.
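Here is a minimal sketch of what that setting changes, assuming the Last Assistant Prefix simply replaces output_sequence for the final, open assistant header and that {{char}} is expanded by SillyTavern's macro system (macro is true in the template). The trailing \n\n after the header is my assumption, mirroring output_sequence; the note above does not specify it. final_assistant_header is a hypothetical helper.

```python
OUTPUT_SEQUENCE = "<|start_header_id|>assistant<|end_header_id|>\n\n"
# Assumed value for last_output_sequence ("Last Assistant Prefix"); the
# trailing "\n\n" mirrors output_sequence and is an assumption.
LAST_OUTPUT_SEQUENCE = "<|start_header_id|>assistant - {{char}} ONLY<|end_header_id|>\n\n"


def final_assistant_header(char_name, last_output_sequence=""):
    """Header that opens the reply the model is about to write."""
    # Fall back to the normal output_sequence when no override is set.
    header = last_output_sequence or OUTPUT_SEQUENCE
    # Stand-in for SillyTavern's {{char}} macro expansion.
    return header.replace("{{char}}", char_name)


print(repr(final_assistant_header("Aria", LAST_OUTPUT_SEQUENCE)))
# '<|start_header_id|>assistant - Aria ONLY<|end_header_id|>\n\n'
```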

Instruct Formats

Use the Llama 3 instruct format. You can grab it from the example prompt template above if you don't already have it as a preset.

Quantizations

Pending.

Licence and usage restrictions

META LLAMA 3 COMMUNITY LICENSE AGREEMENT

Disclaimer: Uncertain Licensing Terms

This LLM is a merged model incorporating weights from multiple LLMs governed by their own distinct licenses. Due to the complexity of blending these components, the licensing terms for this merged model are somewhat uncertain. By using this model, you acknowledge and accept the potential legal risks and uncertainties associated with its use. Any use beyond personal or research purposes, including commercial applications, may carry legal risks, and you assume full responsibility for compliance with all applicable licenses and laws. I recommend consulting with legal counsel to ensure your use of this model complies with all relevant licenses and regulations.

Merge Details

Merge Method

Out of a dozen or so different tests, I found della_linear to be the most effective method for merging a Llama 3 model with Llama 3.1. You can certainly use a higher density setting: I went up to 0.5 density with an epsilon of 0.1 without any problems, and you could probably go higher than that. However, I think this version, with its lower density, came out a little smarter and worked better for this particular pairing.
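For readers unfamiliar with della_linear, here is a simplified, pure-Python sketch of the idea on flat lists of numbers. It is not mergekit's implementation. Each model's task vector (its difference from the base) is pruned at random, with larger-magnitude entries more likely to survive; survivors are rescaled by 1/p so each entry keeps its expected value; the pruned task vectors are summed with their weights (the "linear" part, with no sign election), scaled by lambda, and added to the base. Spreading keep probabilities linearly by magnitude rank over [density - epsilon/2, density + epsilon/2] is my assumption about the schedule, and the sketch omits mergekit details such as weight normalization and per-tensor handling. Both function names are hypothetical.

```python
import random


def della_prune(delta, density, epsilon, rng):
    """Randomly drop entries of a task vector, favoring large magnitudes.

    Keep probabilities are spread linearly by magnitude rank over
    [density - epsilon/2, density + epsilon/2] (an assumed schedule) and
    clamped to (0, 1]. Kept entries are divided by their keep probability.
    """
    n = len(delta)
    order = sorted(range(n), key=lambda i: abs(delta[i]))  # smallest first
    pruned = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        p = min(1.0, max(1e-6, density - epsilon / 2 + epsilon * frac))
        if rng.random() < p:
            pruned[i] = delta[i] / p
    return pruned


def della_linear(base, models, lam=1.0, seed=0):
    """Merge flat parameter lists.

    models is a list of (params, weight, density, epsilon) tuples.
    """
    rng = random.Random(seed)
    merged_delta = [0.0] * len(base)
    for params, weight, density, epsilon in models:
        delta = [p - b for p, b in zip(params, base)]  # task vector
        for i, d in enumerate(della_prune(delta, density, epsilon, rng)):
            merged_delta[i] += weight * d  # plain weighted sum, no sign election
    return [b + lam * d for b, d in zip(base, merged_delta)]


print(della_linear([0.0, 0.0, 0.0], [([1.0, -2.0, 3.0], 1.0, 1.0, 0.0)]))  # [1.0, -2.0, 3.0]
```

With density 1.0 and epsilon 0 nothing is pruned, so the merge reduces to plain weighted task arithmetic; lowering density (0.25 for New Dawn in this recipe) keeps a sparser, rescaled subset of its changes.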

Configuration

The following mergekit YAML will reproduce the full-precision merge from which this 3.75bpw exl2 quantization was made.

merge_method: della_linear
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
models:
  - model: sophosympatheia/New-Dawn-Llama-3-70B-32K-v1.0
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: up_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - value: 0
      density: 0.25
      epsilon: 0.05
      lambda: 1.0
  - model: meta-llama/Meta-Llama-3.1-70B-Instruct
    parameters:
      weight: 1.0
      density:
        - filter: v_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: o_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: up_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: gate_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - filter: down_proj
          value: [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 1]
        - value: 0.5
      epsilon:
        - filter: v_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: o_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: up_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: gate_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - filter: down_proj
          value: [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]
        - value: 0.1
      lambda: 1.0
dtype: float16
tokenizer_source: base
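Several parameters in this recipe are 11-point lists rather than single numbers; mergekit treats such a list as a gradient that is interpolated across the layer stack. The sketch below shows one way to read them, assuming evenly spaced anchor points and linear interpolation at t = layer_index / (num_layers - 1); mergekit's exact normalization may differ slightly. It also assumes 80 decoder layers, the depth of the Llama 3 and 3.1 70B models. gradient_value is a hypothetical helper.

```python
def gradient_value(gradient, layer_idx, num_layers):
    """Linearly interpolate a gradient list at one layer (assumed scheme)."""
    if len(gradient) == 1:
        return gradient[0]
    t = layer_idx / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
    pos = t * (len(gradient) - 1)
    lo = min(int(pos), len(gradient) - 2)
    frac = pos - lo
    return gradient[lo] * (1 - frac) + gradient[lo + 1] * frac


NUM_LAYERS = 80  # decoder layers in Llama 3 / 3.1 70B
NEW_DAWN_WEIGHT = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
LLAMA31_EPSILON = [0, 0, 0.05, 0.05, 0.07, 0.1, 0.07, 0.05, 0.05, 0, 0]

weights = [gradient_value(NEW_DAWN_WEIGHT, i, NUM_LAYERS) for i in range(NUM_LAYERS)]
epsilons = [gradient_value(LLAMA31_EPSILON, i, NUM_LAYERS) for i in range(NUM_LAYERS)]
print(sum(1 for w in weights if w == 0))  # 16
print(round(max(epsilons), 3))  # 0.098 (the 0.1 peak falls between layers 39 and 40)
```

Under these assumptions, New Dawn's v_proj, o_proj, and MLP weights are zero for the first and last eight layers and ramp up to 1 in between, while, as I read the config, its remaining tensors (including q_proj and k_proj, which fall under the catch-all value: 0) get zero weight at every layer.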
