---
license: unknown
---
|
## Llamacpp Quantizations of THUDM/glm-4-9b-chat
|
|
|
Quantized using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> at commit 7d0e23d72ef4540d0d4409cb63ae682c17d53926. Notably, this includes b3333, the first official llama.cpp release to support GLM-3 and GLM-4.
|
|
|
Original model: https://huggingface.co/THUDM/glm-4-9b-chat
|
|
|
|
|
I have tested the GGUF files with some simple prompts and they seem to work fine.
|
|
|
## Prompt format
|
|
|
```
[gMASK]<sop><|user|>
{prompt}
<|assistant|>
```
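
For a quick smoke test, here is a minimal sketch (not part of the original card) that feeds this format to llama-cpp-python. The file name, context size, and generation settings are assumptions; substitute whichever quantization you downloaded.

```python
from llama_cpp import Llama

# Hypothetical local file name; any of the GGUF files listed below works.
llm = Llama(model_path="glm4-9b-chat-Q4_K_M.gguf", n_ctx=8192)

# Apply the prompt format shown above.
prompt = "[gMASK]<sop><|user|>\nWhat does the Q4_K_M quantization mean?\n<|assistant|>"

# Stop at the next user turn so the model doesn't talk to itself.
result = llm(prompt, max_tokens=256, stop=["<|user|>"])
print(result["choices"][0]["text"])
```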
|
|
|
Apparently the model also supports function calling if you supply a more elaborate system prompt. The original chat template is provided in https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/tokenizer_config.json, but it is more complicated than necessary if you don't want that functionality. (If you don't read Chinese, you're advised to translate the template into a language you understand and review it before adopting it for your purposes.)
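
If you do want the full template, one way to avoid hand-writing it is to let `transformers` render it from the original repository. A minimal sketch, assuming you have `transformers` installed and network access to Hugging Face:

```python
from transformers import AutoTokenizer

# The GLM-4 tokenizer ships custom code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(
    "THUDM/glm-4-9b-chat", trust_remote_code=True
)

# Render without tokenizing so you can inspect the raw prompt string;
# for a plain chat turn it should reduce to the format shown above.
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
```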
|
|
|
## Quantizations
|
|
|
Due to resource limitations, only a select handful of quantizations is provided. Hopefully they are useful for your purposes. MD5 checksums for each file are listed below (a verification sketch follows the list).
|
|
|
- MD5 (glm4-9b-chat-IQ3_S.gguf) = d6f4f51c5c4e7d3e8c1d93044fd92b9d
- MD5 (glm4-9b-chat-Q4_K_M.gguf) = 9514ec1112b3e2a47cac52179d796c84
- MD5 (glm4-9b-chat-Q4_K_S.gguf) = 38f48ddf4dc5f6845d070de5d1c3e4c6
- MD5 (glm4-9b-chat-Q5_K_M.gguf) = 99717a90672ea7cf34f0ea23cff47c8a
- MD5 (glm4-9b-chat-Q5_K_S.gguf) = b720b3cb4c5190bd36eac26f385e979b
- MD5 (glm4-9b-chat-Q6_K.gguf) = b8a36cf46408ec558d471c38e55989c1
- MD5 (glm4-9b-chat-Q8_0.gguf) = 1e2aea60e7c9453d560738f6bc06885e
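
To check a download against the sums above, here is a plain-Python hashing sketch (nothing model-specific; the file name is an example):

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large GGUFs don't fill RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the list above; swap in the file you downloaded.
expected = "9514ec1112b3e2a47cac52179d796c84"
actual = md5sum("glm4-9b-chat-Q4_K_M.gguf")
print("OK" if actual == expected else f"MISMATCH: {actual}")
```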
|
|
|
## Legal / License
|
|
|
*"Built with glm-4"* |
|
|
|
I just copied the LICENSE file from https://huggingface.co/THUDM/glm-4-9b-chat as required for redistribution.
|
|
|
|