---
license: unknown
---
|
## Llamacpp Quantizations of THUDM/glm-4-9b-chat
|
|
|
Quantized using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> at commit 7d0e23d72ef4540d0d4409cb63ae682c17d53926. Notably, this includes b3333, the first official llama.cpp release to support GLM-3 and GLM-4.
|
|
|
Original model: https://huggingface.co/THUDM/glm-4-9b-chat
|
|
|
|
|
I have tested the GGUF files with some simple prompts and they seem to work fine.
|
|
|
## Prompt format
|
|
|
```
[gMASK]<sop><|user|>
{prompt}
<|assistant|>
```
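
For a quick smoke test, here is a minimal sketch (not part of the original card) that feeds this format to llama-cpp-python. The file name, context size, and generation settings are assumptions; substitute whichever quantization you downloaded.

```python
from llama_cpp import Llama

# Hypothetical local file name; any of the GGUF files listed below works.
llm = Llama(model_path="glm4-9b-chat-Q4_K_M.gguf", n_ctx=8192)

# Apply the prompt format shown above.
prompt = "[gMASK]<sop><|user|>\nWhat does the Q4_K_M quantization mean?\n<|assistant|>"

# Stop at the next user turn so the model doesn't talk to itself.
result = llm(prompt, max_tokens=256, stop=["<|user|>"])
print(result["choices"][0]["text"])
```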
|
|
|
Apparently the model also supports function calling if you supply a more elaborate system prompt. The original chat template is provided in https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/tokenizer_config.json, but it is more complicated than necessary if you don't want that functionality. (If you don't read Chinese, you're advised to translate the template into a language you understand and review it before adopting it for your purposes.)
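
If you do want the full template, one way to avoid hand-writing it is to let `transformers` render it from the original repository. A minimal sketch, assuming you have `transformers` installed and network access to Hugging Face:

```python
from transformers import AutoTokenizer

# The GLM-4 tokenizer ships custom code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(
    "THUDM/glm-4-9b-chat", trust_remote_code=True
)

# Render without tokenizing so you can inspect the raw prompt string;
# for a plain chat turn it should reduce to the format shown above.
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
```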
|
|
|
## Quantizations
|
|
|
Due to resource limitations, only a select handful of quantizations is provided. Hopefully they are useful for your purposes. MD5 checksums for each file are listed below (a verification sketch follows the list).
|
|
|
- MD5 (glm4-9b-chat-IQ3_S.gguf) = d6f4f51c5c4e7d3e8c1d93044fd92b9d
- MD5 (glm4-9b-chat-Q4_K_M.gguf) = 9514ec1112b3e2a47cac52179d796c84
- MD5 (glm4-9b-chat-Q4_K_S.gguf) = 38f48ddf4dc5f6845d070de5d1c3e4c6
- MD5 (glm4-9b-chat-Q5_K_M.gguf) = 99717a90672ea7cf34f0ea23cff47c8a
- MD5 (glm4-9b-chat-Q5_K_S.gguf) = b720b3cb4c5190bd36eac26f385e979b
- MD5 (glm4-9b-chat-Q6_K.gguf) = b8a36cf46408ec558d471c38e55989c1
- MD5 (glm4-9b-chat-Q8_0.gguf) = 1e2aea60e7c9453d560738f6bc06885e
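
To check a download against the sums above, here is a plain-Python hashing sketch (nothing model-specific; the file name is an example):

```python
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large GGUFs don't fill RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the list above; swap in the file you downloaded.
expected = "9514ec1112b3e2a47cac52179d796c84"
actual = md5sum("glm4-9b-chat-Q4_K_M.gguf")
print("OK" if actual == expected else f"MISMATCH: {actual}")
```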
|
|
|
## Legal / License
|
|
|
*"Built with glm-4"* |
|
|
|
I just copied the LICENSE file from https://huggingface.co/THUDM/glm-4-9b-chat as required for redistribution.
|
|
|
|