Thanks for quantizing this model! Could you further quantize it to 3.0bpw?

#1 · opened by blackcat1402

Hi, thanks for your quick response on this model. To fit it into 32 GB of VRAM, it would be kind of you to quantize a 3.0bpw version in ExLlamaV2 format. Thanks in advance!

Hey @blackcat1402, yes I can. I’ll start the job; it’ll take about an hour or two.
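For reference, here is a minimal sketch of what kicking off that job looks like with the stock ExLlamaV2 `convert.py` script. The directory paths are placeholders, not the actual ones used for this model:

```python
# Sketch of launching an ExLlamaV2 quantization job via its convert.py script.
# -b sets the target bits per weight; paths below are placeholders.
import subprocess

subprocess.run([
    "python", "convert.py",
    "-i", "/models/model-fp16",      # input: original fp16 model directory (placeholder)
    "-o", "/tmp/exl2-work",          # working directory for intermediate files
    "-cf", "/models/model-3.0bpw",   # output directory for the compiled 3.0bpw quant
    "-b", "3.0",                     # target bits per weight
], check=True)
```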

Would a 2.4 or 2.2 bpw quant fit on a 24 GB card? I’d love to try this.

No rush!

@DTechNation - I wouldn't recommend a quant this low; the quality will be severely degraded.

Understood. I had mixed results with 2.3bpw LoneStriker quants earlier this year. I need more VRAM for sure.
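As a rough back-of-the-envelope check on why 24 GB is tight at these bit rates: weight memory is roughly parameter count × bpw / 8 bytes, plus KV cache and runtime overhead. The sketch below assumes a 70B-class model with typical Llama-70B layer/head dimensions; none of these numbers come from the thread itself:

```python
# Rough VRAM estimate for an EXL2 quant: quantized weights + KV cache + overhead.
# The 70B parameter count and layer/head geometry are illustrative assumptions.

def vram_gb(n_params_b: float, bpw: float, ctx: int = 8192,
            n_layers: int = 80, n_kv_heads: int = 8, head_dim: int = 128,
            cache_bits: int = 16) -> float:
    weights = n_params_b * 1e9 * bpw / 8                               # quantized weights, bytes
    kv = 2 * n_layers * n_kv_heads * head_dim * ctx * cache_bits / 8   # K and V cache, bytes
    overhead = 1.5e9                                                   # CUDA context, buffers (rough)
    return (weights + kv + overhead) / 1e9

for bpw in (2.2, 2.4, 3.0):
    print(f"{bpw} bpw ≈ {vram_gb(70, bpw):.1f} GB")
```

On those assumptions, 2.4 bpw on a 70B model already needs ~21 GB for weights alone, so a 24 GB card leaves almost nothing for the KV cache, which matches the recommendation above.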

@DTechNation if you’d like, I have an Open WebUI endpoint I run for some friends; it runs the model at 7.0bpw with 90k context.

I could give you access for a week to experiment.

Chat.bigstorm.ai to sign up.

Just let me know!
