this one's probably next for gpt-oss-20b, hopefully that will be an interesting comparison vs Qwen3-14B :)
next Experimental Modality is in the dataset generation stage, excited about bringing that to everyone!
thank you so much <3
yeah the particular combo that is gpt-oss-20b (larger experts + fewer experts + already trained at MXFP4, so no easy gains from just making it smaller with quantization instead) seems well suited for this vs the Qwen3-30B-A3B type of MoE. definitely encourage this type of experimentation in general :)
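to put a rough number on the "no easy gains from quantization" point, here's a minimal sketch of the storage arithmetic. it assumes the OCP Microscaling layout for MXFP4 (4-bit elements, one 8-bit shared scale per block of 32) and, as a simplification, that every weight is stored that way. the parameter count is a hypothetical round number for illustration, not a spec for any model mentioned here.

```python
# Back-of-the-envelope weight-memory arithmetic (illustrative only).
# Assumption: MXFP4 as in the OCP Microscaling format, i.e. 4-bit elements
# with one 8-bit shared scale per block of 32 elements.
MXFP4_BLOCK_SIZE = 32
MXFP4_ELEMENT_BITS = 4
MXFP4_SCALE_BITS = 8


def bits_per_weight_mxfp4() -> float:
    """Effective bits per weight once the shared scale is amortized."""
    return MXFP4_ELEMENT_BITS + MXFP4_SCALE_BITS / MXFP4_BLOCK_SIZE


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB for n_params weights at the given precision."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    n_params = 20e9  # hypothetical parameter count, chosen only for illustration
    bf16 = weight_gib(n_params, 16)
    mxfp4 = weight_gib(n_params, bits_per_weight_mxfp4())
    print(f"bf16:  {bf16:.1f} GiB")
    print(f"mxfp4: {mxfp4:.1f} GiB")
    print(f"ratio: {bf16 / mxfp4:.2f}x")
```

a model that already sits at about 4.25 bits per weight has little room left for the usual 16-bit to 4-bit shrink, so other ways of reducing size become more interesting.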
we'll be expanding Qwen sizes in both directions :) thanks for your review!