MarinaraSpaghetti/NemoRemix-12B

MarinaraSpaghetti

Owner Aug 7, 2024

Feedback appreciated, thank you!

MarinaraSpaghetti pinned discussion Aug 7, 2024

traveltube

Aug 7, 2024

•

edited Aug 7, 2024

Just want to ask if chatml or mistral format is preferred? Thanks!
Edit: Never mind, I see it's on the updated model card!

MarinaraSpaghetti

Owner Aug 7, 2024

@traveltube ChatML is recommended! Have fun!

traveltube

Aug 7, 2024

•

edited Aug 7, 2024

I haven't tested it too much yet, but from my experience with it using chatml so far it seems that while it seemed pretty intelligent at first, my initial impressions are that unfortunately it tends to repeat the initial prompt / background a bit without significant extrapolation or interpretation, has a few GPTisms, and couldn't effectively handle a somewhat complex scenario at around 3-4k context x3 tries that magnus v2 managed on first try with the same settings. Could the model do better with mistral format possibly? But anyway, I'd like to hear others' experiences with this as well!

MarinaraSpaghetti

Owner Aug 7, 2024

•

edited Aug 7, 2024

@traveltube That is very strange behavior, I am currently running this model on 64k context, and it’s handling the sudden switch from the first-person POV to the third person one (switch between a specific character and Narrator card) excellently without any hiccups.

Even on fresh test chats which are around 2-3k at start, given my long World Info, Character Cards, etc., the model is working fine. Unless I did an oopsie and uploaded one of my older merges instead of the newest one, but the info here on HF seems correct, hm.
What are your settings, parameters, etc.? Are you running the recommended ones or something different?
You can also try this alternative version, because it has one less model added to the mix, and it also worked fine from my tests and let me know if it works better:
https://huggingface.co/MarinaraSpaghetti/NemoRemix-TESTING/blob/main/NemoRemix-test-12B-q8_0.gguf

traveltube

Aug 7, 2024

•

edited Aug 7, 2024

https://cdn-uploads.huggingface.co/production/uploads/636b49b5a45a3177194b8a38/0GF1qcvXElF_QoGLYMOUH.png
I use these settings for nearly everything now as it's worked well for most models. I'll try that one later, thanks!

Edit: there's 3 models in there, which one would you recommend trying?

MarinaraSpaghetti

Owner Aug 7, 2024

•

edited Aug 7, 2024

@traveltube Try the NemoRemix-test-12B-q8_0.gguf model, it's the one.
Now, I hope to not be too brutal, but my dude, these settings you just sent me are absolutely atrocious, lmao.

Firstly, you DO NOT want to mix Dynamic Temperature with Smoothing Factor, the author was very blunt about not pairing them together. Secondly, DO NOT pair Repetition Penalty with DRY, this also breaks the models! Use only one of these two, preferably just DRY. Also, do not use Presence Penalty at all — it is actively working against your Repetition Penalty by encouraging words that have already been used in the context to re-appear. Thirdly, you are using WAY too many tail-cutting samplers. Top K, together with Top P and Min P? You are basically killing all the creativity, making it ultra deterministic.

Generally speaking, the fewer samplers you use the better. Right now all I am using is Temperature, DRY, and Min P/Top A. That's it. Just to show you how much your settings are harming your model, here's a neat website that illustrates how they work on the token probability: https://artefact2.github.io/llm-sampling/index.xhtml. I highly recommend using this website — it helps so much with getting better understatement of how these work!

You can use my settings, including instruct format and parameters, they're all here: https://huggingface.co/MarinaraSpaghetti/SillyTavern-Settings/tree/main.

SillyTavern itself also has a cool explanation on how different samplers work and are recommended to be used: https://docs.sillytavern.app/usage/common-settings/.

Hope this helps!

traveltube

Aug 8, 2024

•

edited Aug 8, 2024

Well, I got rid of the dynatemp and basically all of the other penalties besides DRY as you suggested. Honestly it's kinda hard to know how these things are supposed to work together as others have said different things and on reddit someone reported that the penalties worked synergistically for them despite not being intended to do so, but in the end it seems like as you said, getting rid of all of that definitely made the model respond a whole lot better! I'll see how it is in a few days but so far it seems significantly smarter than before and I'm liking it a lot better now :)

Edit: added back in some rep penalty, seems to actually help significantly with any repetition in formatting!

MarinaraSpaghetti

Owner Aug 8, 2024

@traveltube Hey, I don't blame you, the samplers are super complicated, and I cannot count how many hours I have sunk to grasp them. :) You can use Repetition Penalty together with DRY, but then you have to have them both super low, otherwise they will brick the model. But I recommend to not use them together overall; DRY is better in every aspect anyway. And glad to read the model is working better now! Hope it will serve you well!

Firepin

Aug 8, 2024

Hi MarinaraSpaghetti,
you wrote we can get the settings from here:
https://huggingface.co/MarinaraSpaghetti/SillyTavern-Settings/tree/main
Could you somehow on your modelcard or on the above repository in a readme.txt or something clarify:

Difference between the Basic and customized folders.
Please name the jsons correctly not just Mistral-Custom.json and Mistral-Custom(1).json.
What i mean not adding only an (1).
I suppose if i am not mistaken that these two files are for the Instruct AND Context Setting but if they are named like this you don't know which is which.
Other Modelcreators call them Mistral-Custom-Instruct.json and for example Mistral-Custom-Context.json.
Perhaps those two files are different versions of one of the above? (context or instruct)?
Please add the third Preset "Text Completion" preset (where the samplers are).
Other Modelcreators added those 3 settings which helped much because as you see above many have suboptimal or botched settings.
They get frustrated with the responses from the models you have worked so hard for and give wrong recommendations to others because of their bad experience (cause by false settings).
You can't get good feedback for your model if you don't make it 100% clear and idiotproof which files should be used for ST settings. (^_^).
Best to write it or add it to the model repo and model card probably even, if you start to have many different settings instead of a central repo you are using now.
THANKS FOR YOUR GREAT MODEL AND YOUR EFFORTS!!!

MarinaraSpaghetti

Owner Aug 8, 2024

•

edited Aug 8, 2024

Hey @Firepin ! Thanks for the feedback.

These are just the settings/parameters I am using and recommending; the model will work too if someone uses different prompts or samplers, etc.; as long as they follow the ChatML format mentioned in the card. I haven't fine-tuned any models yet to have them work in a very specific setting.

The difference is, well, that one version is customized and the other not, lol. It's just the Customized folder stores the prompts I am personally using for my use case, while Basic holds the ones without my custom prompt, so others can adjust it to their need. I will add this information to the card there.
Ah, I had mine named the same, since they have the 'Bind to Context' flag checked. I assumed one could differentiate them easily by simply checking their contents or by… trying to import them. Instruct will not import into Story String and vice versa. But I'll change the names too, to make it more obvious.
Not sure what you mean by that. I am running my ST in 'Text Completion' mode all the time, and exported whatever I could. If you mean just the parameters, then I have them in the third folder called Parameters, plus the recommended ones are also mentioned in the model's card, right below the recommended Instruct format.

I don't want to be too overbearing with my inputs, since my use case may differ from someone else's. For example, I love when my characters write while utilizing humor and gripping prose, so I run higher Temperatures. Meanwhile, someone else might prefer simple RP with short dialogue-action format and may need more deterministic setup. In that selected scenario, they will find my model unsatisfactory while running my exact recommendations. But that's exactly why I have created this thread — so I can advise and help others get the most out of this model, plus learn what to improve on in the future, when doing other merges. Hope this clears up the things a bit, and thank you for your kind words! Cheers! :)