NOTE: You will need a recent build of llama.cpp to run these quants (i.e. at least commit 494c870).

2024-03-19: Uploaded new quants, with the importance matrix retrained on wiki.train.raw for ~100K tokens.
2024-03-07: Refreshed quants using the latest llama.cpp build, as things seem to have stabilized a bit now.

GGUF importance matrix (imatrix) quants for https://huggingface.co/Qwen/Qwen1.5-72B-Chat

Layers: 80
Context: 32768
Template:
<|im_start|>system
{instructions}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
{response}
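
As a minimal sketch, the template above can be filled in with a few lines of Python. The helper name `build_prompt` is hypothetical and not part of llama.cpp or this repository; it assumes the assistant turn is left open so that the model's generated text takes the place of `{response}`.

```python
def build_prompt(instructions: str, prompt: str) -> str:
    """Fill the chat template, leaving the assistant turn open for generation."""
    return (
        f"<|im_start|>system\n{instructions}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    # Example values are placeholders, chosen only for illustration.
    print(build_prompt("You are a helpful assistant.", "What is GGUF?"))
```

The resulting string can be passed as the raw prompt to whichever inference front end you use, as long as it does not apply a chat template of its own on top.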
Format: GGUF
Model size: 72.3B params
Architecture: qwen2
Quantizations: 2-bit, 3-bit, 4-bit

