---
license: gemma
library_name: transformers
tags:
- unsloth
- sft
- pony
- MyLittlePony
- Russian
- Lora
base_model: AlexBefest/WoonaV1.2-9b
language:
- ru
pipeline_tag: text-generation
---
|
|
|
## About |
|
|
|
GGUF imatrix quants of the **[AlexBefest/WoonaV1.2-9b](https://huggingface.co/AlexBefest/WoonaV1.2-9b)** model. All quants except Q6_K and Q8_0 were made with the imatrix quantization method.
|
|
|
 |
|
|
|
|
|
## Prompt template: Gemma (recommended temp: 0.3-0.5)
|
|
|
```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```
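
For example, with the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings the template can be filled in by hand. This is a minimal sketch; the file name (pick any quant from the table below) and the Russian prompt are placeholders:

```python
from llama_cpp import Llama

# Load one of the GGUF quants from this repo.
llm = Llama(model_path="WoonaV1.2-9b-imat-Q4_K_M.gguf", n_ctx=4096)

# Gemma turn format: close the user turn, then open the model turn.
prompt = (
    "<start_of_turn>user\n"
    "Привет, Луна! Расскажи о ночном небе.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Temperature 0.3-0.5 as recommended above; stop on the end-of-turn marker.
out = llm(prompt, max_tokens=256, temperature=0.4, stop=["<end_of_turn>"])
print(out["choices"][0]["text"])
```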
|
|
|
## Provided files |
|
|
|
| Name | Quant method | Bits | Size | Min RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [WoonaV1.2-9b-imat-Q2_K.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q2_K.gguf) | Q2_K [imatrix] | 2 | 3.5 GB | 5.1 GB | small, very high quality loss - not recommended, but usable (probably faster than IQ3_XXS, but worse) |
| [WoonaV1.2-9b-imat-IQ3_XXS.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-IQ3_XXS.gguf) | IQ3_XXS [imatrix] | 3 | 3.5 GB | 5.1 GB | small, high quality loss |
| [WoonaV1.2-9b-imat-IQ3_M.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-IQ3_M.gguf) | IQ3_M [imatrix] | 3 | 4.2 GB | 5.7 GB | small, high quality loss |
| [WoonaV1.2-9b-imat-IQ4_XS.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-IQ4_XS.gguf) | IQ4_XS [imatrix] | 4 | 4.8 GB | 6.3 GB | medium, slightly worse than Q4_K_M |
| [WoonaV1.2-9b-imat-Q4_K_S.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q4_K_S.gguf) | Q4_K_S [imatrix] | 4 | 5.1 GB | 6.7 GB | medium, balanced quality loss |
| [WoonaV1.2-9b-imat-Q4_K_M.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q4_K_M.gguf) | Q4_K_M [imatrix] | 4 | 5.4 GB | 6.9 GB | medium, balanced quality - recommended |
| [WoonaV1.2-9b-imat-Q5_K_S.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q5_K_S.gguf) | Q5_K_S [imatrix] | 5 | 6.0 GB | 7.6 GB | large, low quality loss - recommended |
| [WoonaV1.2-9b-imat-Q5_K_M.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q5_K_M.gguf) | Q5_K_M [imatrix] | 5 | 6.2 GB | 7.8 GB | large, very low quality loss - recommended |
| [WoonaV1.2-9b-Q6_K.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-Q6_K.gguf) | Q6_K [static] | 6 | 7.1 GB | 8.7 GB | very large, near-perfect quality - recommended |
| [WoonaV1.2-9b-Q8_0.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-Q8_0.gguf) | Q8_0 [static] | 8 | 9.2 GB | 10.8 GB | very large, extremely low quality loss |
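
To fetch a single quant without cloning the whole repo, the `huggingface_hub` client can be used (a minimal sketch; substitute any filename from the table above):

```python
from huggingface_hub import hf_hub_download

# Downloads the file into the local Hugging Face cache and returns its path.
path = hf_hub_download(
    repo_id="secretmoon/WoonaV1.2-9b-GGUF-Imatrix",
    filename="WoonaV1.2-9b-imat-Q4_K_M.gguf",
)
print(path)
```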
|
|
|
|
|
## How to Use |
|
|
|
- **[llama.cpp](https://github.com/ggerganov/llama.cpp)**
  The open-source framework for running GGUF models, on top of which the other interfaces below are built.
- **[koboldcpp](https://github.com/LostRuins/koboldcpp)**
  An easy option for inference on Windows. A lightweight open-source fork of llama.cpp with a simple graphical interface and many additional features.
- **[LM Studio](https://lmstudio.ai/)**
  A free proprietary llama.cpp-based application with a graphical interface.
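
Whichever front end you choose, the same settings apply: the Gemma prompt template and a temperature around 0.3-0.5. With llama-cpp-python, the chat API can apply the chat template stored in the GGUF metadata automatically, assuming the quant carries one (a minimal sketch; the file name, `n_gpu_layers` value, and prompt are placeholders):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU when llama-cpp-python is
# built with GPU support; set it to 0 for CPU-only inference.
llm = Llama(
    model_path="WoonaV1.2-9b-imat-Q5_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
)

# create_chat_completion applies the chat template embedded in the GGUF
# (assumed present here), so the Gemma turn markers need not be written by hand.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Привет, Луна!"}],
    temperature=0.4,
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```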