llama.cpp Quantizations of THUDM/glm-4-9b-chat

Quantized with llama.cpp at commit 7d0e23d72ef4540d0d4409cb63ae682c17d53926. Notably, this commit includes b3333, the first official llama.cpp release that supports GLM-3 and GLM-4.

Original model: https://huggingface.co/THUDM/glm-4-9b-chat

I have tested the GGUF files with a few simple prompts, and they seem to work fine.

Prompt format

[gMASK]<sop><|user|>
{prompt}
<|assistant|>

The model apparently also supports function calling if you supply a more elaborate system prompt. The original chat template is in https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/tokenizer_config.json , but it is more complicated than you need if you don't want that functionality. (If you don't read Chinese, translate it into a language you understand and read it before adopting that prompt for your purposes.)
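For the simple single-turn case, the prompt format above can be filled in with a few lines of Python. This is only a sketch: the function name `build_prompt` is made up for illustration, and it assumes the layout is exactly as shown above (newlines around the prompt, nothing after `<|assistant|>`). It does not cover system prompts or function calling.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the simple GLM-4 chat format.

    Assumption: the layout mirrors the "Prompt format" section exactly,
    i.e. "[gMASK]<sop><|user|>", a newline, the prompt, a newline,
    then "<|assistant|>" with nothing after it.
    """
    return f"[gMASK]<sop><|user|>\n{user_message}\n<|assistant|>"


if __name__ == "__main__":
    # Show the raw string that would be passed to the model as the prompt.
    print(build_prompt("What is the capital of France?"))
```

The resulting string can then be passed as the raw prompt to whatever llama.cpp front end you use.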

Quantizations

Due to resource limitations, only a handful of quantizations are provided. Hopefully they are useful for your purposes.

  • MD5 (glm4-9b-chat-IQ3_S.gguf) = d6f4f51c5c4e7d3e8c1d93044fd92b9d
  • MD5 (glm4-9b-chat-Q4_K_M.gguf) = 9514ec1112b3e2a47cac52179d796c84
  • MD5 (glm4-9b-chat-Q4_K_S.gguf) = 38f48ddf4dc5f6845d070de5d1c3e4c6
  • MD5 (glm4-9b-chat-Q5_K_M.gguf) = 99717a90672ea7cf34f0ea23cff47c8a
  • MD5 (glm4-9b-chat-Q5_K_S.gguf) = b720b3cb4c5190bd36eac26f385e979b
  • MD5 (glm4-9b-chat-Q6_K.gguf) = b8a36cf46408ec558d471c38e55989c1
  • MD5 (glm4-9b-chat-Q8_0.gguf) = 1e2aea60e7c9453d560738f6bc06885e
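To check a download against the list above, something like the following sketch may help. It uses only the Python standard library; the helper names (`parse_md5_line`, `md5_of_file`, `verify`) are made up for illustration, and it assumes the `.gguf` file sits in the current directory under the same name as in the list.

```python
import hashlib
import re
from pathlib import Path

# Matches lines such as "MD5 (glm4-9b-chat-Q8_0.gguf) = 1e2a...",
# with or without a leading bullet.
MD5_LINE = re.compile(r"MD5 \((.+)\) = ([0-9a-fA-F]{32})")


def parse_md5_line(line: str) -> tuple[str, str]:
    """Extract (filename, expected_digest) from one checksum line."""
    match = MD5_LINE.search(line)
    if match is None:
        raise ValueError(f"not an MD5 checksum line: {line!r}")
    return match.group(1), match.group(2).lower()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file in chunks, so large GGUF files
    are never loaded into memory all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(line: str, directory: Path = Path(".")) -> bool:
    """Return True if the file named in `line` matches its listed MD5."""
    filename, expected = parse_md5_line(line)
    return md5_of_file(directory / filename) == expected


if __name__ == "__main__":
    listed = "MD5 (glm4-9b-chat-Q8_0.gguf) = 1e2aea60e7c9453d560738f6bc06885e"
    name, _ = parse_md5_line(listed)
    if Path(name).exists():
        print(name, "OK" if verify(listed) else "MISMATCH")
    else:
        print(name, "not found in the current directory")
```

A mismatch usually means an incomplete or corrupted download, so re-downloading the file is the first thing to try.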

Legal / License

"Built with glm-4"

I just copied the LICENSE file from https://huggingface.co/THUDM/glm-4-9b-chat as required for redistribution.

GGUF model size: 9.4B params. Architecture: chatglm.
