Thanks for quantizing this model! Could you further quantize it to 3.0bpw?

#1 · opened by blackcat1402

Hi, thanks for your quick response on this model. To fit it into 32 GB of VRAM, it would be kind of you to quantize a 3.0bpw version in ExLlamaV2 format. Thanks in advance!

Hey @blackcat1402, yes I can. I’ll start the job; it’ll take about an hour or two.
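For reference, here is a minimal sketch of what kicking off that job looks like with the stock ExLlamaV2 `convert.py` script. The directory paths are placeholders, not the actual ones used for this model:

```python
# Sketch of launching an ExLlamaV2 quantization job via its convert.py script.
# -b sets the target bits per weight; paths below are placeholders.
import subprocess

subprocess.run([
    "python", "convert.py",
    "-i", "/models/model-fp16",      # input: original fp16 model directory (placeholder)
    "-o", "/tmp/exl2-work",          # working directory for intermediate files
    "-cf", "/models/model-3.0bpw",   # output directory for the compiled 3.0bpw quant
    "-b", "3.0",                     # target bits per weight
], check=True)
```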

Would a 2.4 or 2.2 bpw quant fit on a 24 GB card? I’d love to try this.

No rush!

@DTechNation - I wouldn't recommend a quant this low; the quality will be severely degraded.

Understood. I had mixed results with 2.3bpw LoneStriker quants earlier this year. I need more VRAM for sure.
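As a rough back-of-the-envelope check on why 24 GB is tight at these bit rates: weight memory is roughly parameter count × bpw / 8 bytes, plus KV cache and runtime overhead. The sketch below assumes a 70B-class model with typical Llama-70B layer/head dimensions; none of these numbers come from the thread itself:

```python
# Rough VRAM estimate for an EXL2 quant: quantized weights + KV cache + overhead.
# The 70B parameter count and layer/head geometry are illustrative assumptions.

def vram_gb(n_params_b: float, bpw: float, ctx: int = 8192,
            n_layers: int = 80, n_kv_heads: int = 8, head_dim: int = 128,
            cache_bits: int = 16) -> float:
    weights = n_params_b * 1e9 * bpw / 8                               # quantized weights, bytes
    kv = 2 * n_layers * n_kv_heads * head_dim * ctx * cache_bits / 8   # K and V cache, bytes
    overhead = 1.5e9                                                   # CUDA context, buffers (rough)
    return (weights + kv + overhead) / 1e9

for bpw in (2.2, 2.4, 3.0):
    print(f"{bpw} bpw ≈ {vram_gb(70, bpw):.1f} GB")
```

On those assumptions, 2.4 bpw on a 70B model already needs ~21 GB for weights alone, so a 24 GB card leaves almost nothing for the KV cache, which matches the recommendation above.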

@DTechNation if you’d like, I have an Open WebUI endpoint I run for some friends; it runs the model at 7.0bpw with 90k context.

I could give you access for a week to experiment.

Chat.bigstorm.ai to sign up.

Just let me know!
