Nautilus is not L3.3
Anubis 70B works better
Anything Llama 3.1 / 3.3 derived worked fairly well, and I had some interesting results after testing the Nautilus merges. I do a grid search on merge combinations: anything blatantly bad gets discarded automatically, and the rest I rank manually by inspecting the outputs side by side (Anubis was part of the merges explored). For the record, Nautilus was very impressive. In my experience L3.3 feels more like a sidegrade than an upgrade over Nemotron: L3.3 has more prompt control but is dumber.
Also worth mentioning: I tend to bias heavily towards smarts in my testing (which is 50% traditional creative writing, 50% RP style). Whilst bias and style are factors, I'm primarily looking for the model to write a narrator / characters that make "clever" scenario choices, and hallucinations are heavily penalized.
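For anyone curious what that workflow looks like in rough form, here is a minimal sketch, not my actual tooling. The function names, the model placeholders, the blend weights and the repetition-based "blatantly bad" check are all assumptions made up for illustration; `generate` stands in for whatever builds a merge and samples from it.

```python
from itertools import combinations, product


def build_grid(models, weights):
    """Cross every unordered pair of models with every blend weight."""
    return [
        {"a": a, "b": b, "weight_a": w}
        for (a, b), w in product(combinations(models, 2), weights)
    ]


def is_blatantly_bad(sample):
    """Hypothetical automatic filter: near-empty or highly repetitive text."""
    words = sample.split()
    if len(words) < 5:
        return True
    # Share of distinct words; a very low value suggests a repetition loop.
    return len(set(words)) / len(words) < 0.3


def shortlist(grid, generate):
    """Keep only candidates whose sample passes the automatic filter.

    `generate` is a caller-supplied function (assumed here) that takes a
    candidate dict and returns a text sample from that merge.
    """
    kept = []
    for candidate in grid:
        sample = generate(candidate)
        if not is_blatantly_bad(sample):
            kept.append((candidate, sample))
    return kept
```

Whatever survives `shortlist` is what gets read side by side and ranked by hand; the automatic step only exists to cut down how much has to be read.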
Honestly, this was an experiment to see what the R1 distill influence was like on good community finetunes.
Oh I see. Thought it was a mistake