sometimesanotion posted an update 3 days ago
**Update** Either I had some wrong numbers plugged into the comparator when estimating benchmark numbers, or the benchmark changed. Virtuoso Small v2 at a 41.07 average is still very impressive, especially for writing draft copy for business purposes, while Lamarck remains a chatty generalist-reasoning model.

I've felt confident that 14B Qwen finetunes and merges could break the 42.0 average, and Arcee **came close** with https://huggingface.co/arcee-ai/Virtuoso-Small-2. Congratulations to @arcee-ai!
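For anyone following along: these averages look like Open LLM Leaderboard v2 scores, i.e. the mean of six normalized benchmarks. A minimal sketch of that arithmetic, with illustrative placeholder numbers rather than anyone's real results:

```python
# Sketch of the leaderboard average discussed above, assuming it is the
# Open LLM Leaderboard v2 mean over six normalized benchmark scores.
# The numbers below are illustrative placeholders, not real results.
scores = {
    "IFEval": 78.0,
    "BBH": 50.0,
    "MATH Lvl 5": 30.0,
    "GPQA": 12.0,
    "MuSR": 13.0,
    "MMLU-PRO": 45.0,
}

average = sum(scores.values()) / len(scores)
print(f"Leaderboard average: {average:.2f}")  # 38.00 with these placeholders
```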

Just two months ago, it was easy to think that 14B had plateaued, that you could have high IFEval or high MuSR/MATH/GPQA at 14B, but not both. That barrier is completely shattered. I see a pathway to even better, and Virtuoso Small 2 is a big part of why. Very impressive work. This community would expect no less from Arcee.

Just look at this graph! Keep in mind, my merges here build on the first Virtuoso Small, and *-DS merges build on DeepSeek R1. There are some impressive merges in the pipe!

Congratulations as well! When I first saw the evaluation results for Virtuoso-Small-2, I quickly abandoned the release of "miscii-14b-0130". Although BBH and IFEval were once strengths of the miscii series, I admit that, within my limited personal technical capabilities, I was beaten by @arcee-ai ;)


Any model of yours made for a purpose beyond benchmarks has a reason unto itself. Your tempesthenno-ppo-ckpt40 does neat things. I've also found surprise pops in benchmarks for merges when two models with similar scores arrive at them in different and complementary ways.

Not gonna lie, my merge strategy for Lamarck v0.8 was built around the expectation of combining 3-4 models with different strengths, and the combination of IFEval, BBH, MATH, and CoT in Virtuoso-Small-v2 is forcing me to look hard at that.
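To make that concrete: a minimal sketch of the kind of multi-model merge I mean, using MergeKit's Python entry points as shown in its README. The donor models, weights, and densities here are hypothetical placeholders, not my actual Lamarck recipe:

```python
# Hypothetical multi-model merge sketch: several Qwen2.5-14B finetunes with
# complementary strengths combined via DARE-TIES on a shared base model.
# Donor models, weights, and densities are placeholders, not a real recipe.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YAML = """
merge_method: dare_ties
base_model: Qwen/Qwen2.5-14B
models:
  - model: arcee-ai/Virtuoso-Small-2
    parameters:
      weight: 0.4    # hypothetical: IFEval/BBH/MATH strengths
      density: 0.6
  - model: your-org/reasoning-14b-finetune       # hypothetical CoT donor
    parameters:
      weight: 0.35
      density: 0.5
  - model: your-org/instruction-14b-finetune     # hypothetical IFEval donor
    parameters:
      weight: 0.25
      density: 0.5
dtype: bfloat16
"""

merge_config = MergeConfiguration.model_validate(yaml.safe_load(CONFIG_YAML))

run_merge(
    merge_config,
    out_path="./merged-14b",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
    ),
)
```

The same YAML can also be run with the `mergekit-yaml` command-line tool if that's more convenient.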

Thank you for sharing this! We haven't publicized v2 yet as we're building out our model engine to support all these different models - but there's more to come here. Along with some updates to mergekit!


My high-benchmarking merges have included Virtuoso v1 at nearly every stage, and I am now creating a new generation that switches in v2 where apt.

Feedback from finetuners suggests my minimal compute and Arcee's MergeKit have given them a shortcut to great results. Smart merging really is energy efficient. Thank you for helping us push the limits!