---
license: unknown
---
## Llamacpp Quantizations of THUDM/glm-4-9b-chat

Quantized with <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> at commit 7d0e23d72ef4540d0d4409cb63ae682c17d53926. This commit includes release b3333, the first official llama.cpp release that supports GLM-3 and GLM-4.

Original model: https://huggingface.co/THUDM/glm-4-9b-chat


I have tested the GGUF files with a few simple prompts, and they seem to work fine.

## Prompt format

```
[gMASK]<sop><|user|>
{prompt}
<|assistant|>
```
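If you are assembling the prompt yourself, the single-turn format above can be sketched in Python as follows. The function name `build_prompt` is hypothetical, and the exact whitespace (no newline after `<|assistant|>`) is an assumption read off the template as displayed here; check the original chat template if your results look off.

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn format shown above.

    Assumption: no trailing newline after <|assistant|>; the displayed
    template does not make this unambiguous.
    """
    return f"[gMASK]<sop><|user|>\n{user_message}\n<|assistant|>"


if __name__ == "__main__":
    print(build_prompt("Hello, who are you?"))
```

The resulting string is what you would pass as the raw prompt to your inference front end. This sketch covers a single user turn only, with no system prompt and no function calling.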

The model apparently supports function calling as well if you supply a more elaborate system prompt. The original chat template is provided in https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/tokenizer_config.json , but it is more complicated than necessary if you don't need that functionality. (If you don't read Chinese, translate the template into a language you understand and read it before adapting that prompt for your own purposes.)

## Quantizations

Due to resource limitations, only a handful of quantizations are provided. Hopefully they are useful for your purposes.

- MD5 (glm4-9b-chat-IQ3_S.gguf) = d6f4f51c5c4e7d3e8c1d93044fd92b9d
- MD5 (glm4-9b-chat-Q4_K_M.gguf) = 9514ec1112b3e2a47cac52179d796c84
- MD5 (glm4-9b-chat-Q4_K_S.gguf) = 38f48ddf4dc5f6845d070de5d1c3e4c6
- MD5 (glm4-9b-chat-Q5_K_M.gguf) = 99717a90672ea7cf34f0ea23cff47c8a
- MD5 (glm4-9b-chat-Q5_K_S.gguf) = b720b3cb4c5190bd36eac26f385e979b
- MD5 (glm4-9b-chat-Q6_K.gguf) = b8a36cf46408ec558d471c38e55989c1
- MD5 (glm4-9b-chat-Q8_0.gguf) = 1e2aea60e7c9453d560738f6bc06885e

## Legal / License

*"Built with glm-4"*

I just copied the LICENSE file from https://huggingface.co/THUDM/glm-4-9b-chat as required for redistribution.