---
license: unknown
---

## Llamacpp Quantizations of THUDM/glm-4-9b-chat

Using llama.cpp commit hash 7d0e23d72ef4540d0d4409cb63ae682c17d53926 for quantization. Notably, this includes b3333, the first official llama.cpp release that supports GLM-3 and GLM-4.

Original model: https://huggingface.co/THUDM/glm-4-9b-chat

I have tested the GGUF files with some simple prompts and they seem to work fine.

## Prompt format

```
[gMASK]<|user|>
{prompt}
<|assistant|>
```

Apparently the model also supports function calling if you supply a more elaborate system prompt. The original chat template is provided in https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/tokenizer_config.json, but it is more complicated than necessary if you don't need that functionality. (If you don't read Chinese, you are advised to translate the template into a language you understand and read it before adopting that prompt for your own purposes.)

## Quantizations

Due to resource limitations, only a select handful of quantizations are provided. Hopefully they are useful for your purposes.

- MD5 (glm4-9b-chat-IQ3_S.gguf) = d6f4f51c5c4e7d3e8c1d93044fd92b9d
- MD5 (glm4-9b-chat-Q4_K_M.gguf) = 9514ec1112b3e2a47cac52179d796c84
- MD5 (glm4-9b-chat-Q4_K_S.gguf) = 38f48ddf4dc5f6845d070de5d1c3e4c6
- MD5 (glm4-9b-chat-Q5_K_M.gguf) = 99717a90672ea7cf34f0ea23cff47c8a
- MD5 (glm4-9b-chat-Q5_K_S.gguf) = b720b3cb4c5190bd36eac26f385e979b
- MD5 (glm4-9b-chat-Q6_K.gguf) = b8a36cf46408ec558d471c38e55989c1
- MD5 (glm4-9b-chat-Q8_0.gguf) = 1e2aea60e7c9453d560738f6bc06885e

## Legal / License

*"Built with glm-4"*

I copied the LICENSE file from https://huggingface.co/THUDM/glm-4-9b-chat, as required for redistribution.
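
## Verifying downloads

The MD5 values listed under Quantizations can be used to confirm that a download is intact. The following is a minimal sketch of such a check, not an official tool: the helper names `md5_of_file` and `verify` are made up for illustration, and the script assumes the downloaded files keep the exact file names shown in the list.

```python
import hashlib
import sys
from pathlib import Path

# MD5 values copied from the Quantizations list in this README.
EXPECTED_MD5 = {
    "glm4-9b-chat-IQ3_S.gguf": "d6f4f51c5c4e7d3e8c1d93044fd92b9d",
    "glm4-9b-chat-Q4_K_M.gguf": "9514ec1112b3e2a47cac52179d796c84",
    "glm4-9b-chat-Q4_K_S.gguf": "38f48ddf4dc5f6845d070de5d1c3e4c6",
    "glm4-9b-chat-Q5_K_M.gguf": "99717a90672ea7cf34f0ea23cff47c8a",
    "glm4-9b-chat-Q5_K_S.gguf": "b720b3cb4c5190bd36eac26f385e979b",
    "glm4-9b-chat-Q6_K.gguf": "b8a36cf46408ec558d471c38e55989c1",
    "glm4-9b-chat-Q8_0.gguf": "1e2aea60e7c9453d560738f6bc06885e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's MD5 matches the value listed for its file name."""
    name = Path(path).name
    if name not in EXPECTED_MD5:
        raise KeyError(f"no MD5 listed for {name}")
    return md5_of_file(path) == EXPECTED_MD5[name]


if __name__ == "__main__":
    # Hypothetical usage: python verify_md5.py glm4-9b-chat-Q4_K_M.gguf
    for arg in sys.argv[1:]:
        print(arg, "OK" if verify(arg) else "MISMATCH")
```

A mismatch usually means the download was truncated or corrupted, in which case downloading the file again is the simplest fix. MD5 is adequate for detecting accidental corruption but is not a defence against deliberate tampering.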