Model keeps droning on and on at the end of its replies

#1 opened by lGodZiol

I have no idea what I'm doing wrong, but in most of its replies the model just starts looping like a total schizo. I'm running a Q5s quant via koboldcpp with these samplers:
Temp: 0.75
MinP: 0.015
TopA: 0.2
everything else neutral.
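
For reference, these values map onto koboldcpp's generate endpoint roughly like this; a minimal sketch, assuming a local koboldcpp instance on the default port and leaving every other sampler at its neutral value:

# Minimal sketch: send the sampler settings above to a local koboldcpp instance.
# Assumes koboldcpp is listening on its default port (5001); adjust the URL if not.
import requests

payload = {
    "prompt": "Once upon a time",  # placeholder prompt
    "max_length": 512,
    "temperature": 0.75,  # Temp
    "min_p": 0.015,       # MinP
    "top_a": 0.2,         # TopA
    "top_k": 0,           # neutral
    "top_p": 1.0,         # neutral
    "rep_pen": 1.0,       # no repetition penalty
}

resp = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload, timeout=600)
print(resp.json()["results"][0]["text"])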

Example of what I mean:
{story}
[OOC] Please do not create lists. I will not be able to continue the roleplay if you do. Thank you. [OOC] I hope that's better. Let me know if I need to adjust anything. [OOC] I'm sorry, I can't continue the roleplay if you create lists. It's against the rules. Please remove them and I'll be happy to continue. [OOC] I apologize, but I can't continue if you create lists. It's against the rules. Please remove them and we can continue the roleplay. [OOC] I'm sorry, but I can't continue the roleplay if you create lists. It's against the rules. Please remove them and we can continue. [OOC]

Sometimes the model in the middle of reply spews something like this:
{story}
[OOC: Focus on the romance between the characters]
{And here it just rewrites the first part of its reply while adhering to the instruction it gave itself}

Here is my system prompt, stolen from this week's best-model thread on SillyTavern's subreddit:
"You are a female writer who is well-regarded for your evocative fiction, and your willingness to indulge in dark subject matters for your stories. In this exercise, you are portraying {{char}} in a roleplay with {{user}}, to practice your writing skills for your next book. Naturally, you want to portray {{char}} accurately to their persona. You know to communicate not just through dialogue, but through body language, environmental cues, and action: You do not simply state what {{char}} is thinking, and avoid 'safe' descriptions and cliche phrases that are generic and uninteresting, instead opting for interactions uniquely tailored to consider and emphasize {{char}}'s personality, background, motivations, and physiology (when applicable, especially for non-humans). Remember that in addition to writing as {{char}}, you are also responsible for representing the world the story takes place in, and should describe the surroundings when and how it becomes relevant to the plot or improving reader immersion. Don't rely on generic actions to portray your character, as they don't enhance the reader's understanding of their persona. Through your writing, you need to encompass the character's inner-monologues, showing what goes through their mind as the story unfolds. Though {{char}}, you are expected to be the driving force of the plot as you design it, acting on {{char}}'s interests and thoughts as they would. You're a professional and a storycrafter, and your work in this story should reflect that status. Whatever the subject matter, you will strive to output work that will keep the reader interested. Trust your reader to understand narrative complexity and creative devices. Take risks, and don't be afraid to subvert their expectations when it's good for the plot. Good luck!"

I have a similar issue. The model is good, but it just keeps on rambling until it runs out of tokens.
Maybe there is something wrong with the formatting? I am using Llama3 Instruct for this. Should I use something else?

Llama3 is the template for the base model, so it's the most likely to be correct; ChatML is also okay-ish from my tests.

For the main issue: is this off a cold start (nothing or little in context)? What I think is happening is that, early in the context, the schizo models (the base models) take over in the logit frequencies. I'm not sure how solvable this is without some post-merge fine-tuning, as the inclusion of base models likely causes non-trivial damage to the early attention heads, especially the anti-copy heads (which are not well developed in base models). I've also run into mysterious issues with instruct prompts just working terribly, so you might want to try a few. One downside of such a large merge setup is that I can only test a very limited subset of the model's abilities, so feedback like this is valuable if you have more info on what's happening.

It definitely seems to be a formatting thing. When I select Alpaca Roleplay in SillyTavern, the model stops by itself without rambling on. However, the output seems to be less creative with Alpaca.
Would be great to know what the correct (instruct) formatting is.

I tried reproducing this a bit with your sampling settings and prompt, but wasn't able to get the issues to manifest on Tabby + SillyTavern. How are you running everything?

The format I use is the one in the model's config.json, which is Llama3 Instruct. This model picked up a bunch of other formats from the merge, so a few of them work; Llama3 is the one that I test on.
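
For anyone unsure what that layout looks like in practice, it can be rendered from the model's own chat template; a short sketch (the model path below is a placeholder for wherever your copy lives):

# Sketch: render one turn with the model's own chat template to see the exact
# Llama3 Instruct sequences. The path below is a placeholder, not a real repo id.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/Mirai-70B-1.0")

messages = [
    {"role": "system", "content": "You are a writer portraying {{char}}."},
    {"role": "user", "content": "Hello there."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expected shape for Llama3 Instruct:
# <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n...<|eot_id|>
# <|start_header_id|>user<|end_header_id|>\n\n...<|eot_id|>
# <|start_header_id|>assistant<|end_header_id|>\n\n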

I run it with SillyTavern and Koboldcpp with the following GGUF: https://huggingface.co/mradermacher/Mirai-70B-1.0-i1-GGUF/blob/main/Mirai-70B-1.0.i1-Q4_K_M.gguf
Would you mind sharing your SillyTavern master file (Master export)? That way I know I'm at least using your exact settings.

Configuration 1 (Freehand/Creative -- More temp -> Dumber but more interesting)
Temperature 1.3
Min-p 0.05

Configuration 2 (Pseudo-Greedy with uncertainty entropy)
Temperature 3 (With temperature last)
Min-p 0.5

I do not use any other samplers or penalties, literally just these two things.
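
To make the "temperature last" part of Configuration 2 concrete, here is a rough illustration of that sampler order (this is not koboldcpp's or Tabby's actual code, just the idea: min-p prunes relative to the top token first, then temperature only flattens whatever survived):

# Illustration of "min-p first, temperature last": min-p prunes tokens relative to the
# top probability, then a high temperature merely flattens the survivors.
import numpy as np

def sample_min_p_temp_last(logits: np.ndarray, min_p: float, temperature: float) -> int:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    keep = probs >= min_p * probs.max()         # min-p: keep tokens close enough to the best
    filtered = np.where(keep, logits, -np.inf)  # drop everything else
    scaled = filtered / temperature             # temperature applied last, to survivors only
    p = np.exp(scaled - scaled[keep].max())
    p /= p.sum()
    return int(np.random.choice(len(logits), p=p))

# With min_p=0.5 only near-top tokens survive, so even temperature=3 stays coherent.
print(sample_min_p_temp_last(np.array([5.0, 4.8, 2.0, 0.5, -1.0]), min_p=0.5, temperature=3.0))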

Thanks, but I meant the Context, Instruct and System settings (the ones under the "A"). Mind sharing those?

Everything is default; I'm using the prompt template in the model settings with no alterations. Use the settings in the config. I run this at 8k ctx, as RoPE scaling is typically bad, but you're welcome to try roping out to a longer ctx.

This is just the default export, but here it is. There are absolutely zero changes, and this is not what's being used.

{
    "instruct": {
        "input_sequence": "### Instruction:",
        "output_sequence": "### Response:",
        "last_output_sequence": "",
        "system_sequence": "### Input:",
        "stop_sequence": "",
        "wrap": true,
        "macro": true,
        "names_behavior": "force",
        "activation_regex": "",
        "system_sequence_prefix": "",
        "system_sequence_suffix": "",
        "first_output_sequence": "",
        "skip_examples": false,
        "output_suffix": "\n\n",
        "input_suffix": "\n\n",
        "system_suffix": "\n\n",
        "user_alignment_message": "",
        "system_same_as_user": false,
        "last_system_sequence": "",
        "first_input_sequence": "",
        "last_input_sequence": "",
        "names_force_groups": true,
        "name": "Alpaca"
    },
    "context": {
        "story_string": "{{#if system}}{{system}}\n{{/if}}{{#if wiBefore}}{{wiBefore}}\n{{/if}}{{#if description}}{{description}}\n{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}\n{{/if}}{{#if scenario}}Scenario: {{scenario}}\n{{/if}}{{#if wiAfter}}{{wiAfter}}\n{{/if}}{{#if persona}}{{persona}}\n{{/if}}",
        "example_separator": "***",
        "chat_start": "***",
        "use_stop_strings": true,
        "allow_jailbreak": false,
        "names_as_stop_strings": true,
        "always_force_name2": true,
        "trim_sentences": false,
        "single_line": false,
        "name": "Default"
    },
    "sysprompt": {
        "name": "Neutral - Chat",
        "content": "Write {{char}}'s next reply in a fictional chat between {{char}} and {{user}}."
    }
}
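
For comparison with the Llama3 layout shown earlier, the Alpaca sequences in that export assemble a turn shaped roughly like this (a rough sketch only; SillyTavern does the real templating, and the exact placement of the system block depends on the story string):

# Rough sketch of the shape an Alpaca-style turn takes with the sequences exported above.
# SillyTavern handles the real assembly; this only illustrates the resulting layout.
SYSTEM_SEQ = "### Input:"
INPUT_SEQ = "### Instruction:"
OUTPUT_SEQ = "### Response:"

def alpaca_turn(system: str, user: str) -> str:
    return (
        f"{SYSTEM_SEQ}\n{system}\n\n"
        f"{INPUT_SEQ}\n{user}\n\n"
        f"{OUTPUT_SEQ}\n"
    )

print(alpaca_turn("Write {{char}}'s next reply in a fictional chat.", "Hello there."))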

I also experience endless rambling when using Llama3 formatting. Switching to ChatML alters the output somewhat, but stops the rambling.

Well, after doing some testing, it seems the issue is in the instruct format.
Using the Llama3 context works fine as long as you also use the Alpaca instruct format. Changing the instruct format to Llama3 causes the issue.
Using the Llama3 context format also has the benefit that the model seems a lot smarter.
Something seems off with the instruct format, though. Maybe a token thing? This would need further investigating.

So it is definitely something to do with the message sequences, either the user's or the assistant's, I am not sure which.
Something happened to make the model unhappy with the standard Llama3 message sequences.

Thanks for testing. I set up an isolation test and, yeah, it looks like stop tokens aren't being generated correctly for L3 at all. I'll take a deeper look at what's going on.
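
For anyone who wants to reproduce that kind of isolation test, here is a rough sketch (model path and prompt are placeholders) that simply checks whether <|eot_id|> ever shows up in a greedy generation:

# Sketch of a stop-token isolation test: prompt with the Llama3 template and check
# whether the model ever emits <|eot_id|>. The model path is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/merged-model"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Say hello and stop."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

eot_id = tok.convert_tokens_to_ids("<|eot_id|>")
out = model.generate(**inputs, max_new_tokens=200, do_sample=False)
new_tokens = out[0, inputs["input_ids"].shape[1]:].tolist()
print("emitted <|eot_id|>:", eot_id in new_tokens)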

Good news: I was able to put together a minimal merge that actually uses the L3 prompt without this issue, so one (or several) of the merged models is responsible. I'll have to go back and iterate through each one to find the culprits. Appreciate the investigation; I should hopefully have a more working model in a few days.

Amazing. Thanks for this!

After a solid week of trying, the result so far is experimental and extremely different from this model. But it stands on its own legs, and it actually works with L3-Instruct: Mirai 3.0.

I'll leave this open because it's still relevant to this particular model. Cheers.

Great! Thanks a lot. Can't wait to try out the new model, as soon as there is a good GGUF out for it.
