Thanks! This takes more VRAM; is a 3.5bpw quant possible?

#1
by async0x42 - opened

Unfortunately, even with a 4090/4080 (40 GB VRAM) I can only fit 13056 ctx at 4bpw with the Q4 cache, compared to 32K context with the 70B models using exllama 0.1.5. I'm downloading it now to try making the quant myself, but I've still been hitting the same "no log exists" issue as in my other attempts with other models.
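
For reference, this is roughly how I'm loading it; a minimal sketch with exllamav2's Python API, where the model path and context length are placeholders rather than my exact setup:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Tokenizer, ExLlamaV2Cache_Q4

# Placeholder path to the 4bpw exl2 quant
config = ExLlamaV2Config("/models/model-4.0bpw-exl2")
config.max_seq_len = 13056  # largest context that fit for me at 4bpw

model = ExLlamaV2(config)

# Q4 cache stores K/V at ~4 bits instead of FP16, cutting cache VRAM roughly 4x;
# allocate lazily, then autosplit the weights + cache across both GPUs
cache = ExLlamaV2Cache_Q4(model, max_seq_len=config.max_seq_len, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
```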

Edit: Actually, if you're able to include the measurement JSON in the repo, then that would work too!
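
With that measurement file, making the 3.5bpw quant should only need the compile pass of exllamav2's convert.py; something along these lines (all paths are placeholders, and I haven't verified this exact invocation on this model):

```python
import subprocess

# Run from an exllamav2 checkout; every path below is a placeholder.
subprocess.run([
    "python", "convert.py",
    "-i", "/models/source-fp16",          # original fp16 weights
    "-o", "/tmp/exl2-work",               # scratch directory for the conversion
    "-cf", "/models/target-3.5bpw-exl2",  # where the finished 3.5bpw quant is written
    "-b", "3.5",                          # target bits per weight
    "-m", "/models/measurement.json",     # reuse the published measurement, skipping the measuring pass
], check=True)
```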
