This uses bitsandbytes 8-bit quantization, which is currently broken in vLLM.
Any chance of releasing it with GPTQ, or is a vLLM fix incoming?