[Urgent Feedback]
.....
This version is more stable with different sampler values.
I gave it a first try and it's indeed much better. The suggested sampler settings for the creative preset still make the model run away, but maybe that's also because my system prompt asks for elaborate replies.
I found these sampler settings work quite well:
I think increasing min_p to 0.1 could get rid of all the broken replies.
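In case it helps anyone reproduce this, here is a minimal sketch of how such settings could be sent to an OpenAI-compatible completions endpoint (the kind text-generation-webui exposes with --api). Only min_p = 0.1 comes from the discussion above; the endpoint URL, prompt, and the remaining parameter values are placeholder assumptions, not my exact settings.

import requests

# Assumed local endpoint: text-generation-webui's OpenAI-compatible API usually
# listens on port 5000; adjust to your own setup or hosted provider.
API_URL = "http://127.0.0.1:5000/v1/completions"

payload = {
    "prompt": "Write a short scene in a tavern.",  # placeholder prompt
    "max_tokens": 512,
    # Values below are illustrative, except min_p = 0.1, which is the value
    # suggested above to cut down on broken replies.
    "temperature": 1.0,
    "min_p": 0.1,
    "top_p": 1.0,
    "repetition_penalty": 1.05,
    "seed": -1,  # -1 = random seed each run, matching the varying seeds in the logs below
}

response = requests.post(API_URL, json=payload, timeout=120)
print(response.json()["choices"][0]["text"])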
Output generated in 13.82 seconds (22.44 tokens/s, 310 tokens, context 999, seed 1696117932)
Output generated in 9.80 seconds (39.57 tokens/s, 388 tokens, context 1924, seed 1209792919)
Output generated in 11.78 seconds (43.47 tokens/s, 512 tokens, context 1931, seed 1966727220)
Output generated in 7.24 seconds (42.94 tokens/s, 311 tokens, context 1930, seed 1829985442)
Output generated in 6.44 seconds (39.63 tokens/s, 255 tokens, context 1509, seed 2078299907)
Output generated in 9.10 seconds (43.18 tokens/s, 393 tokens, context 1801, seed 16137133)
Output generated in 7.74 seconds (42.91 tokens/s, 332 tokens, context 1801, seed 1709554556)
Output generated in 34.96 seconds (43.93 tokens/s, 1536 tokens, context 2159, seed 1807509361)
Output generated in 11.59 seconds (44.17 tokens/s, 512 tokens, context 2159, seed 175036327)
Output generated in 15.58 seconds (42.69 tokens/s, 665 tokens, context 2453, seed 2134986945)
Output generated in 10.68 seconds (38.68 tokens/s, 413 tokens, context 3031, seed 1299966398)
Output generated in 13.21 seconds (40.36 tokens/s, 533 tokens, context 3102, seed 1655647270)
Output generated in 36.63 seconds (41.93 tokens/s, 1536 tokens, context 3622, seed 566803189)
Output generated in 11.38 seconds (37.78 tokens/s, 430 tokens, context 4120, seed 1181704660)
Output generated in 38.27 seconds (40.13 tokens/s, 1536 tokens, context 4573, seed 195224025)
Output generated in 17.49 seconds (37.90 tokens/s, 663 tokens, context 5108, seed 592816127)
Output generated in 16.49 seconds (35.41 tokens/s, 584 tokens, context 6219, seed 1504294262)
Output generated in 16.16 seconds (34.59 tokens/s, 559 tokens, context 6471, seed 433793304)
Output generated in 15.17 seconds (32.89 tokens/s, 499 tokens, context 7084, seed 841581385)
Output generated in 13.34 seconds (30.73 tokens/s, 410 tokens, context 7581, seed 643386474)
Output generated in 41.94 seconds (36.62 tokens/s, 1536 tokens, context 8013, seed 1794401070)
Output generated in 20.31 seconds (32.14 tokens/s, 653 tokens, context 8868, seed 1987104393)
Output generated in 26.29 seconds (33.40 tokens/s, 878 tokens, context 9325, seed 1908203673)
Output generated in 26.39 seconds (33.27 tokens/s, 878 tokens, context 9531, seed 1665999223)
Output generated in 17.26 seconds (37.96 tokens/s, 655 tokens, context 9531, seed 1840168358)
Output generated in 18.47 seconds (30.38 tokens/s, 561 tokens, context 10047, seed 1204298660)
Output generated in 21.23 seconds (30.43 tokens/s, 646 tokens, context 10668, seed 1806801985)
Output generated in 25.34 seconds (30.78 tokens/s, 780 tokens, context 11349, seed 62196682)
All the generations that ended up at 1536 tokens were botched, at least in the tail end / second half. All the others were really good, in fact much better than I anticipated from the last model. I also haven't seen any degradation over 8k context, which is huge for an L3 model.
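If anyone wants to sift through console output like the above, here's a rough sketch of a parser that flags the runs which hit the token cap (1536 here, which I'm assuming is the max_new_tokens limit) and prints throughput against context length. The log format and filename are just what I used; adjust to your own setup.

import re

LOG_LINE = re.compile(
    r"Output generated in (?P<secs>[\d.]+) seconds "
    r"\((?P<tps>[\d.]+) tokens/s, (?P<tokens>\d+) tokens, "
    r"context (?P<context>\d+), seed (?P<seed>\d+)\)"
)

# Assumed cap: 1536 matches the max_new_tokens in my runs; change if yours differs.
TOKEN_CAP = 1536

def summarize(log_text: str) -> None:
    rows = [m.groupdict() for m in LOG_LINE.finditer(log_text)]
    capped = [r for r in rows if int(r["tokens"]) >= TOKEN_CAP]
    print(f"{len(rows)} generations, {len(capped)} hit the {TOKEN_CAP}-token cap")
    for r in rows:
        flag = "  <-- hit cap (likely ran away)" if int(r["tokens"]) >= TOKEN_CAP else ""
        print(f"context {r['context']:>5}: {r['tps']:>6} tokens/s, {r['tokens']:>4} tokens{flag}")

if __name__ == "__main__":
    # "console.log" is a hypothetical filename for a saved copy of the console output.
    with open("console.log", encoding="utf-8") as f:
        summarize(f.read())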
Really nice model so far, well done. The quality increase between this and the last is amazing. I'll continue to experiment with it and get back when I have more insights. :)
Glad it's now working for you.
I use J.AI through Featherless, and after a few great replies and interactions the bot abruptly refuses to talk, replying with just "Ah", a lone ", or a straight blank. I played with the temperature (0.7-1.2) to no avail.
Hmm, what prompts and context template are you using?