Any chance for a 4.8 bpw?
#1 opened by Adzeiros
Hey, curious if you could make a 4.8 bpw version? I want something with the same bits per weight as the Q4_K_M GGUF (which I think is roughly 4.8 bpw). I can run the Q4_K_M GGUF on my 48 GB of VRAM with 32k context if I use the 4-bit cache, and I'm pretty sure EXL2 has better context caching than GGUF and is faster in terms of tokens per second.
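For reference, here's a minimal sketch of how a quantized (4-bit) KV cache is typically enabled when loading an EXL2 model with the exllamav2 library. The model path and sequence length are placeholders for illustration, not values from this repo:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer

# Hypothetical local path to an EXL2 quant; adjust to your own download location.
config = ExLlamaV2Config("/path/to/exl2-model")
config.max_seq_len = 32768  # 32k context, as in the setup described above

model = ExLlamaV2(config)

# Q4 cache roughly quarters KV-cache VRAM vs. FP16, which is what makes
# long contexts fit alongside a ~4.8 bpw model on 48 GB.
cache = ExLlamaV2Cache_Q4(model, lazy=True)
model.load_autosplit(cache)  # split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
```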
MikeRoz changed discussion status to closed
Tysm :)