This is a 2-bit quantization of Qwen/Qwen1.5-72B-Chat using QuIP#.

Random samples from RedPajama and Skypile (for Chinese) are used as calibration data.

## Model loading

Please follow the instructions in QuIP-for-all for usage.

As an alternative, you can use the Aphrodite engine or my vLLM branch for faster inference. If you have problems installing fast-hadamard-transform from pip, you can also install it from source.
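If the pip wheel fails to build, a source install along these lines usually works; the repository URL below assumes the upstream Dao-AILab project (adjust it if you are using a fork), and a CUDA toolchain matching your PyTorch build is required:

```shell
# Build and install fast-hadamard-transform from source
# (assumes the upstream Dao-AILab repository; needs nvcc available)
git clone https://github.com/Dao-AILab/fast-hadamard-transform
cd fast-hadamard-transform
pip install .
```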
