Thanks! This takes more VRAM; is a 3.5bpw quant possible?

#1
by async0x42 - opened

Unfortunately, even with a 4090/4080 (40 GB VRAM) I can only fit 13056 ctx at 4bpw with the Q4 cache, compared to 32K context with the 70B models using exllama 0.1.5. I'm downloading it now to try making the quant myself, but I've still been hitting the same "no log exists" issue as in my other attempts with other models.
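
For reference, this is roughly how I'm loading it; a minimal sketch with exllamav2's Python API, where the model path and context length are placeholders rather than my exact setup:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Tokenizer, ExLlamaV2Cache_Q4

# Placeholder path to the 4bpw exl2 quant
config = ExLlamaV2Config("/models/model-4.0bpw-exl2")
config.max_seq_len = 13056  # largest context that fit for me at 4bpw

model = ExLlamaV2(config)

# Q4 cache stores K/V at ~4 bits instead of FP16, cutting cache VRAM roughly 4x;
# allocate lazily, then autosplit the weights + cache across both GPUs
cache = ExLlamaV2Cache_Q4(model, max_seq_len=config.max_seq_len, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
```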

Edit: Actually, if you're able to include the measurement JSON in the repo, then that would work too!
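
With that measurement file, making the 3.5bpw quant should only need the compile pass of exllamav2's convert.py; something along these lines (all paths are placeholders, and I haven't verified this exact invocation on this model):

```python
import subprocess

# Run from an exllamav2 checkout; every path below is a placeholder.
subprocess.run([
    "python", "convert.py",
    "-i", "/models/source-fp16",          # original fp16 weights
    "-o", "/tmp/exl2-work",               # scratch directory for the conversion
    "-cf", "/models/target-3.5bpw-exl2",  # where the finished 3.5bpw quant is written
    "-b", "3.5",                          # target bits per weight
    "-m", "/models/measurement.json",     # reuse the published measurement, skipping the measuring pass
], check=True)
```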
