Any chance for a 4.8 bpw?

#1
by Adzeiros - opened

Hey, any chance you could make a 4.8 bpw version? I'd like something with the same bits per weight as the Q4_K_M GGUF, which I believe is roughly 4.8 bpw. I can run the Q4_K_M GGUF on my 48 GB of VRAM with 32k context if I use the 4-bit cache, and I'm fairly sure EXL2's context caching is more efficient than GGUF's, plus it's faster in terms of tokens per second.
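For anyone wanting to sanity-check whether a 4.8 bpw EXL2 quant would fit in 48 GB with a 4-bit cache, here's a rough back-of-the-envelope sketch. It assumes a hypothetical Llama-style 70B architecture (80 layers, 8 KV heads, head dim 128); those numbers are illustrative assumptions, not the actual values for this model.

```python
# Rough VRAM estimate for a 4.8 bpw quant with a 4-bit KV cache.
# Architecture figures below are assumptions for illustration only.

def weight_gib(params_b: float, bpw: float) -> float:
    """Approximate weight size in GiB for a given bits-per-weight."""
    return params_b * 1e9 * bpw / 8 / 1024**3

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: float) -> float:
    """K and V caches across all layers for one sequence."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1024**3

weights = weight_gib(params_b=70, bpw=4.8)              # ~39 GiB of weights
cache_q4 = kv_cache_gib(layers=80, kv_heads=8, head_dim=128,
                        context=32768, bytes_per_elem=0.5)  # ~2.5 GiB at 4-bit
print(f"weights ~{weights:.1f} GiB, 4-bit KV cache ~{cache_q4:.1f} GiB")
```

Under those assumptions the total lands around 42 GiB before activations and framework overhead, which is consistent with a Q4_K_M-sized model fitting in 48 GB at 32k context.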

MikeRoz changed discussion status to closed
