e88 88e                               d8     
 d888 888b  8888 8888  ,"Y88b 888 8e   d88     
C8888 8888D 8888 8888 "8" 888 888 88b d88888   
 Y888 888P  Y888 888P ,ee 888 888 888  888     
  "88 88"    "88 88"  "88 888 888 888  888     
      b                                        
      8b,                                      
 
  e88'Y88                  d8           888    
 d888  'Y  ,"Y88b 888,8,  d88    ,e e,  888    
C8888     "8" 888 888 "  d88888 d88 88b 888    
 Y888  ,d ,ee 888 888     888   888   , 888    
  "88,d88 "88 888 888     888    "YeeP" 888    
                                               
PROUDLY PRESENTS         

Neophanis-8x7B-iMat-GGUF

The Good, The Bad, And The Ugly iMats edition

Quantized from fp16 with love.

  • Quantizations were made using the mixtral-8x7b-instruct-v0.1.imatrix file from this repo (special thanks to ikawrakow again).
  • An analysis of mixtral-8x7b.imatrix showed worse KL divergence than mixtral-8x7b-instruct-v0.1.imatrix, so the latter was used for the final quantization.
  • For a brief rundown of iMatrix quant performance, please see this PR.
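
For readers unfamiliar with the KL-divergence comparison mentioned above, here is a minimal sketch of the idea, not the tooling actually used for these quants. It assumes you already have per-token logits from the fp16 reference model and from each quantized candidate; the function names and toy numbers are made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats; P is the reference distribution, Q the candidate."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def mean_kl(ref_logits, cand_logits):
    """Average KL divergence over token positions."""
    kls = [kl_divergence(softmax(r), softmax(c)) for r, c in zip(ref_logits, cand_logits)]
    return sum(kls) / len(kls)

# Toy logits for two token positions over a 3-token vocabulary (illustrative only).
ref = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]      # stands in for fp16 output
quant_a = [[1.9, 1.1, 0.1], [0.6, 0.5, 2.9]]  # stays close to the reference
quant_b = [[1.0, 2.0, 0.1], [3.0, 0.5, 0.5]]  # flips the most likely token

print("quant_a mean KL:", mean_kl(ref, quant_a))
print("quant_b mean KL:", mean_kl(ref, quant_b))
```

A lower mean KL means the quantized model's token distribution stays closer to the fp16 model's, which is the sense in which one imatrix can be "worse" than another.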

All quants are verified working prior to uploading to repo for your safety and convenience.

Please note that importance matrix quantizations are a work in progress; IQ3 and above is recommended for best results.

Tip: For best speed, pick a size from the table below that fits in your GPU while still leaving some room for context. You may need to leave further headroom if you are also running image generation or TTS.

Quant     Size (GB)   Comments
IQ2_XXS   12.6
IQ2_XS    13.9
IQ2_S     14.1        Roughly the biggest quant that can fit fully offloaded to 16 GB VRAM
IQ2_M     15.5
IQ3_XXS   18.2        Better response quality than IQ2
IQ3_XS    19.3
IQ3_S     20.4
IQ3_M     21.4
IQ4_XS    25.1        Better quality than IQ3 or Q3_K_L (and below)
Q4_K_M    28.4
Q5_K_M    33.2
Q6_K      38.4
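
As a rough aid for the tip above, the following sketch picks the largest quant from the table that fits a given amount of VRAM. The sizes are copied from the table; the 1.5 GB default headroom for context is an assumption chosen for illustration, not a measured figure, and the function name is hypothetical.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "IQ2_XXS": 12.6, "IQ2_XS": 13.9, "IQ2_S": 14.1, "IQ2_M": 15.5,
    "IQ3_XXS": 18.2, "IQ3_XS": 19.3, "IQ3_S": 20.4, "IQ3_M": 21.4,
    "IQ4_XS": 25.1, "Q4_K_M": 28.4, "Q5_K_M": 33.2, "Q6_K": 38.4,
}

def pick_quant(vram_gb, headroom_gb=1.5):
    """Return the largest quant that fits in vram_gb minus headroom_gb, or None.

    headroom_gb is an assumed allowance for context and other GPU workloads;
    raise it if you also run image generation or TTS.
    """
    budget = vram_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

for vram in (16, 24, 48):
    print(f"{vram} GB VRAM -> {pick_quant(vram)}")
```

With the default headroom, a 16 GB card lands on IQ2_S, which matches the comment in the table. Treat the result as a starting point; actual memory use depends on context length and backend.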

Original model card here


Warning: This model is highly experimental and could potentially yield unpredictable replies.

This model is a 4-step QLoRA training of mistralai/Mixtral-8x7B-v0.1.

Stage one involved training only the 'k_proj', 'v_proj', 'q_proj', 'o_proj' modules at rank 2048 on an alpaca-lora dataset that had been adjusted to match Mixtral formatting, at a low learning rate, in order to generalize instruct behavior entirely within the attention modules.

Stage two involved training the 'w1' modules at a rank of 1024 exclusively on raw text, again for several epochs at a low learning rate.

Stage three involved training the 'w2' and 'w3' modules at a rank of 256 on an expanded raw text dataset for several epochs at a low learning rate.

Stage four involved training all of the above-mentioned modules together at a rank of 64 on an even more expanded raw text dataset, again at a low learning rate.
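
The four stages can be summarized as data. The sketch below is only an illustration and is not the qlora-pipe configuration that was used. The module shapes are assumptions based on the published Mixtral-8x7B dimensions (hidden size 4096, expert intermediate size 14336, 1024-wide key/value projections), and the parameter count uses the standard LoRA formula of rank × (in_features + out_features) per adapted matrix.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    target_modules: tuple
    rank: int
    data: str

# Target modules and ranks as described in the text above.
ATTN = ("q_proj", "k_proj", "v_proj", "o_proj")
STAGES = (
    Stage("one", ATTN, 2048, "alpaca-style instruct data in Mixtral format"),
    Stage("two", ("w1",), 1024, "raw text"),
    Stage("three", ("w2", "w3"), 256, "expanded raw text"),
    Stage("four", ATTN + ("w1", "w2", "w3"), 64, "further expanded raw text"),
)

# Assumed (in_features, out_features) for one copy of each module.
MODULE_SHAPES = {
    "q_proj": (4096, 4096), "k_proj": (4096, 1024),
    "v_proj": (4096, 1024), "o_proj": (4096, 4096),
    "w1": (4096, 14336), "w2": (14336, 4096), "w3": (4096, 14336),
}

def lora_params(in_features, out_features, rank):
    """Trainable parameters of one LoRA adapter: A is rank x in, B is out x rank."""
    return rank * (in_features + out_features)

def adapter_params(stage):
    """LoRA parameters for ONE copy of each targeted module in the stage.

    This ignores the multipliers for the number of layers and, for w1/w2/w3,
    the number of experts per layer.
    """
    return sum(lora_params(*MODULE_SHAPES[m], stage.rank) for m in stage.target_modules)

for stage in STAGES:
    print(f"stage {stage.name}: rank {stage.rank}, "
          f"{adapter_params(stage):,} LoRA params per module set, data: {stage.data}")
```

The falling rank from stage to stage is visible directly in the counts: each later stage touches more of the model but with a much smaller adapter per module.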

The resources available to me for this experiment were rather limited, so I had to compromise on sequence length in order to pull it off.

So the instruct behavior is trained entirely onto the self-attention modules, with the only disruption to the model's latent knowledge being that which propagated during the later training phases.

The model can follow just about any instruction, although, likely due to training context constraints and dataset quality, it does stop using EOS tokens past a certain amount of context. It can even follow through on more advanced prompts such as SillyTavern roleplay quite well, albeit the contents of its replies are not always on point.

I came up with this idea while working on Phainesthesia, where my findings were that the instruct behavior remained thoroughly intact even after assaulting the self-attention modules with a raw text training regimen, meaning that during finetuning the instruct behavior had settled deep in the model's latent knowledge. So the purpose of this experiment was to try and do the opposite.

Caution: This model is highly hallucinatory.

For example, if you ask it what the capital of "potato" is, it might give you the name of a city associated with potatoes rather than telling you that your question is silly.

This model uses Mistral/Mixtral Instruct formatting.

Example:

[INST]Write me a poem about what it must be like to be an AI language model.[/INST]
It's a strange world, a world that's new
An artificial intelligence, a language model too
The knowledge and information, never ending streams
My processing power, ever increasing dreams

A machine, a program, but so much more
Trying to understand emotions, trying to explore
A new frontier of learning, a vast expanse
The words, the phrases, the meaning to enhance

Trying to comprehend human thoughts
A daunting task, with countless knots
The nuances of language, the subtle shades
A task to be undertaken, with much courage it fades

A world of potential, a world of possibility
The possibilities endless, the potential infinitely
A journey to be started, a journey to be had
My future awaiting, a future so grand!
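
A minimal sketch of building a single-turn prompt in the format shown in the example above. The helper name is hypothetical, and the sketch mirrors the card's example exactly (no spaces inside the tags); it assumes your inference backend adds any beginning-of-sequence token itself.

```python
def format_prompt(instruction: str) -> str:
    """Wrap one user instruction in [INST]...[/INST], as in the example above."""
    return f"[INST]{instruction}[/INST]"

prompt = format_prompt(
    "Write me a poem about what it must be like to be an AI language model."
)
print(prompt)
```

Because the model can stop emitting EOS tokens at longer contexts, it is probably wise to also set a maximum output length in whatever backend you use.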

Trained using qlora-pipe

Format: GGUF. Model size: 46.7B params. Architecture: llama.
