# Razorback 12B v0.2 ExLlamaV2 8.0bpw Quant
**UnslopNemo with Vision!**
A more robust attempt at merging TheDrummer's UnslopNemo v3 into Pixtral 12B.
It has been really stable in my testing so far, but needs more testing to see which samplers it does and doesn't like.
It seems to be the best of both worlds: less slop, more engaging content, and decent intelligence / visual understanding.
## Merging Approach
First, I loaded up Pixtral 12B Base and Mistral Nemo Base to compare their parameter differences. Looking at the L2 norm / relative difference values, I was able to isolate which parts of Pixtral 12B deviate significantly from Mistral Nemo. While the language model architecture is the same between the two, a lot of vision understanding has been trained into Pixtral's language model, and it can break very easily.
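The comparison script isn't included here, but the idea maps to something like the sketch below (the state-dict handling and the exact metric are my assumptions, not the author's code):

```python
import torch

def relative_l2_diffs(pixtral_sd: dict, nemo_sd: dict) -> dict:
    """Per-parameter relative difference ||a - b||_2 / ||b||_2 over shared tensor names.

    pixtral_sd / nemo_sd are the language-model state_dicts of Pixtral 12B Base
    and Mistral Nemo Base; names and shapes line up because the architectures match.
    """
    diffs = {}
    for name, a in pixtral_sd.items():
        b = nemo_sd.get(name)
        if b is None or a.shape != b.shape:
            continue  # skip vision-tower / mismatched tensors
        a32, b32 = a.float(), b.float()
        diffs[name] = (torch.linalg.vector_norm(a32 - b32)
                       / (torch.linalg.vector_norm(b32) + 1e-8)).item()
    return diffs

# Usage: diffs = relative_l2_diffs(pixtral_base.state_dict(), nemo_base.state_dict())
# Large values flag parameters that Pixtral's vision training moved far from Nemo.
```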
Then I calculated merging weights for each parameter using an exponential falloff: the smaller the difference, the higher the weight. That way, parameters Pixtral left essentially unchanged from Nemo can take on more of the donor model, while vision-critical parameters stay close to Pixtral.
I applied this recipe to Pixtral Instruct (Pixtral-12B-2409) and TheDrummer's UnslopNemo-12B-v3. The goal was to infuse as much Drummer goodness as possible without breaking vision input, and it looks like it's worked!
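Roughly, the blending step could look like this sketch (the falloff constant `k` and the interpolation form are illustrative assumptions; the actual recipe may differ):

```python
import math
import torch

def merge_with_falloff(pixtral_sd: dict, unslop_sd: dict,
                       diffs: dict, k: float = 10.0) -> dict:
    """Blend UnslopNemo into Pixtral Instruct, parameter by parameter.

    diffs comes from the base-model comparison above. The merge weight decays
    exponentially with the relative difference, so vision-sensitive parameters
    (large diff) stay close to Pixtral, while near-identical parameters take on
    more of UnslopNemo. k is an illustrative falloff constant.
    """
    merged = {}
    for name, p in pixtral_sd.items():
        u = unslop_sd.get(name)
        if u is None or name not in diffs:
            merged[name] = p.clone()  # keep Pixtral-only tensors (e.g. vision) untouched
            continue
        w = math.exp(-k * diffs[name])  # smaller diff -> weight closer to 1
        merged[name] = ((1.0 - w) * p.float() + w * u.float()).to(p.dtype)
    return merged
```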
## Usage
Needs more testing to identify the best sampling params, but so far ~0.7 temperature + 0.03 min-p has been rock solid.
Use the included chat template (Mistral). No ChatML support yet.
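If you're running this quant through the ExLlamaV2 Python API, those values translate to something like the minimal sketch below (a starting point only; pass the settings object to whichever generator you use):

```python
from exllamav2.generator import ExLlamaV2Sampler

# Starting point from this card: ~0.7 temperature with 0.03 min-p.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.min_p = 0.03
# Other samplers left at their defaults until more testing narrows things down.
```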
## Credits
- Mistral for mistralai/Pixtral-12B-2409
- Unsloth for unsloth/Pixtral-12B-2409 transformers conversion
- TheDrummer for TheDrummer/UnslopNemo-12B-v3
## Available Sizes
| Repo | Bits | Head Bits | Size |
|---|---|---|---|
| nintwentydo/Razorback-12B-v0.2-exl2-4.0bpw | 4.0 | 6.0 | 8.19 GB |
| nintwentydo/Razorback-12B-v0.2-exl2-5.0bpw | 5.0 | 6.0 | 9.54 GB |
| nintwentydo/Razorback-12B-v0.2-exl2-6.0bpw | 6.0 | 8.0 | 11.1 GB |
| nintwentydo/Razorback-12B-v0.2-exl2-8.0bpw | 8.0 | 8.0 | 13.7 GB |
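To pull one of these quants locally with the Hugging Face Hub client (swap the repo ID for whichever size fits your VRAM):

```python
from huggingface_hub import snapshot_download

# Download the 8.0bpw quant into a local folder for your ExLlamaV2 loader of choice.
snapshot_download(
    repo_id="nintwentydo/Razorback-12B-v0.2-exl2-8.0bpw",
    local_dir="Razorback-12B-v0.2-exl2-8.0bpw",
)
```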