---
tags:
- quantized
- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
- GGUF
- transformers
- safetensors
- mistral
- text-generation
- arxiv:2304.12244
- arxiv:2306.08568
- arxiv:2308.09583
- license:apache-2.0
- autotrain_compatible
- endpoints_compatible
- text-generation-inference
- region:us
- text-generation
model_name: WizardLM-2-8x22B-GGUF
base_model: microsoft/WizardLM-2-8x22B
inference: false
model_creator: microsoft
pipeline_tag: text-generation
quantized_by: MaziyarPanahi
---
# [MaziyarPanahi/WizardLM-2-8x22B-GGUF](https://huggingface.co/MaziyarPanahi/WizardLM-2-8x22B-GGUF)
- Model creator: [microsoft](https://huggingface.co/microsoft)
- Original model: [microsoft/WizardLM-2-8x22B](https://huggingface.co/microsoft/WizardLM-2-8x22B)
## Description
[MaziyarPanahi/WizardLM-2-8x22B-GGUF](https://huggingface.co/MaziyarPanahi/WizardLM-2-8x22B-GGUF) contains GGUF format model files for [microsoft/WizardLM-2-8x22B](https://huggingface.co/microsoft/WizardLM-2-8x22B).
## How to download
Instead of cloning the entire repository, you can download only the quant files you need:
```sh
huggingface-cli download MaziyarPanahi/WizardLM-2-8x22B-GGUF --local-dir . --include '*Q2_K*gguf'
```
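If you prefer Python, the same filtered download works with `snapshot_download` from the `huggingface_hub` library. A minimal sketch; the `allow_patterns` filter mirrors the `--include` flag above:
```python
from huggingface_hub import snapshot_download

# Download only the Q2_K split files, skipping every other quant in the repo.
snapshot_download(
    repo_id="MaziyarPanahi/WizardLM-2-8x22B-GGUF",
    allow_patterns=["*Q2_K*gguf"],
    local_dir=".",
)
```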
## Load sharded model
`llama_load_model_from_file` detects the number of split files and loads the additional tensors from the remaining files, so you only need to point it at the first shard:
```sh
llama.cpp/main -m WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 1024 -e
```
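The same first-shard loading also works from Python through the `llama-cpp-python` bindings, which call `llama_load_model_from_file` under the hood. A minimal sketch, assuming `llama-cpp-python` is installed and all five shards sit in the current directory:
```python
from llama_cpp import Llama

# Point at the first shard; the remaining -0000N-of-00005 files
# are picked up automatically by llama.cpp.
llm = Llama(
    model_path="WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf",
    n_ctx=4096,
)

output = llm(
    "Building a website can be done in 10 simple steps:\nStep 1:",
    max_tokens=1024,
)
print(output["choices"][0]["text"])
```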
## Prompt template
```
{system_prompt}
USER: {prompt}
ASSISTANT: </s>
```
or
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful,
detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>
USER: {prompt} ASSISTANT: </s>......
```
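For completeness, a small helper that assembles the template above. `format_prompt` is a hypothetical name for illustration, not part of any library:
```python
DEFAULT_SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def format_prompt(prompt: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    # Vicuna-style template: the prompt ends at "ASSISTANT:" and the
    # model generates the reply, terminating it with </s>.
    return f"{system_prompt} USER: {prompt} ASSISTANT:"
```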