---
license: gemma
library_name: transformers
tags:
- unsloth
- sft
- pony
- MyLittlePony
- Russian
- Lora
base_model: AlexBefest/WoonaV1.2-9b
language:
- ru
pipeline_tag: text-generation
---
|
|
|
## About |
|
|
|
GGUF imatrix quants of the **[AlexBefest/WoonaV1.2-9b](https://huggingface.co/AlexBefest/WoonaV1.2-9b)** model. All quants except Q6_K and Q8_0 were made with the imatrix quantization method.
|
|
|
 |
|
|
|
|
|
## Prompt template: Gemma (recommended temp: 0.3-0.5)
|
|
|
```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```
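
For example, with the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings the template can be filled in by hand. This is a minimal sketch; the file name (pick any quant from the table below) and the Russian prompt are placeholders:

```python
from llama_cpp import Llama

# Load one of the GGUF quants from this repo.
llm = Llama(model_path="WoonaV1.2-9b-imat-Q4_K_M.gguf", n_ctx=4096)

# Gemma turn format: close the user turn, then open the model turn.
prompt = (
    "<start_of_turn>user\n"
    "Привет, Луна! Расскажи о ночном небе.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Temperature 0.3-0.5 as recommended above; stop on the end-of-turn marker.
out = llm(prompt, max_tokens=256, temperature=0.4, stop=["<end_of_turn>"])
print(out["choices"][0]["text"])
```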
|
|
|
## Provided files |
|
|
|
| Name | Quant method | Bits | Size | Min RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [WoonaV1.2-9b-imat-Q2_K.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q2_K.gguf) | Q2_K [imatrix] | 2 | 3.5 GB | 5.1 GB | small, very high quality loss - not recommended, but usable (probably faster than IQ3_XXS, but worse) |
| [WoonaV1.2-9b-imat-IQ3_XXS.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-IQ3_XXS.gguf) | IQ3_XXS [imatrix] | 3 | 3.5 GB | 5.1 GB | small, high quality loss |
| [WoonaV1.2-9b-imat-IQ3_M.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-IQ3_M.gguf) | IQ3_M [imatrix] | 3 | 4.2 GB | 5.7 GB | small, high quality loss |
| [WoonaV1.2-9b-imat-IQ4_XS.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-IQ4_XS.gguf) | IQ4_XS [imatrix] | 4 | 4.8 GB | 6.3 GB | medium, slightly worse than Q4_K_M |
| [WoonaV1.2-9b-imat-Q4_K_S.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q4_K_S.gguf) | Q4_K_S [imatrix] | 4 | 5.1 GB | 6.7 GB | medium, balanced quality loss |
| [WoonaV1.2-9b-imat-Q4_K_M.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q4_K_M.gguf) | Q4_K_M [imatrix] | 4 | 5.4 GB | 6.9 GB | medium, balanced quality - recommended |
| [WoonaV1.2-9b-imat-Q5_K_S.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q5_K_S.gguf) | Q5_K_S [imatrix] | 5 | 6.0 GB | 7.6 GB | large, low quality loss - recommended |
| [WoonaV1.2-9b-imat-Q5_K_M.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-imat-Q5_K_M.gguf) | Q5_K_M [imatrix] | 5 | 6.2 GB | 7.8 GB | large, very low quality loss - recommended |
| [WoonaV1.2-9b-Q6_K.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-Q6_K.gguf) | Q6_K [static] | 6 | 7.1 GB | 8.7 GB | very large, near-perfect quality - recommended |
| [WoonaV1.2-9b-Q8_0.gguf](https://huggingface.co/secretmoon/WoonaV1.2-9b-GGUF-Imatrix/blob/main/WoonaV1.2-9b-Q8_0.gguf) | Q8_0 [static] | 8 | 9.2 GB | 10.8 GB | very large, extremely low quality loss |
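
To fetch a single quant without cloning the whole repo, the `huggingface_hub` client can be used (a minimal sketch; substitute any filename from the table above):

```python
from huggingface_hub import hf_hub_download

# Downloads the file into the local Hugging Face cache and returns its path.
path = hf_hub_download(
    repo_id="secretmoon/WoonaV1.2-9b-GGUF-Imatrix",
    filename="WoonaV1.2-9b-imat-Q4_K_M.gguf",
)
print(path)
```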
|
|
|
|
|
## How to Use |
|
|
|
- **[llama.cpp](https://github.com/ggerganov/llama.cpp)**
  The open-source framework for running GGUF models, on top of which the other interfaces below are built.
- **[koboldcpp](https://github.com/LostRuins/koboldcpp)**
  An easy option for inference on Windows. A lightweight open-source fork of llama.cpp with a simple graphical interface and many additional features.
- **[LM Studio](https://lmstudio.ai/)**
  A free proprietary llama.cpp-based application with a graphical interface.
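
Whichever front end you choose, the same settings apply: the Gemma prompt template and a temperature around 0.3-0.5. With llama-cpp-python, the chat API can apply the chat template stored in the GGUF metadata automatically, assuming the quant carries one (a minimal sketch; the file name, `n_gpu_layers` value, and prompt are placeholders):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU when llama-cpp-python is
# built with GPU support; set it to 0 for CPU-only inference.
llm = Llama(
    model_path="WoonaV1.2-9b-imat-Q5_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
)

# create_chat_completion applies the chat template embedded in the GGUF
# (assumed present here), so the Gemma turn markers need not be written by hand.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Привет, Луна!"}],
    temperature=0.4,
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```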