Llama.cpp Quantizations of Llama-3.1-Herrsimian-8B

Using llama.cpp release b3703 for quantization.
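For reference, quants of this kind are typically produced with llama.cpp's conversion and quantization tools. A minimal sketch, assuming the original safetensors model has been cloned into `Llama-3.1-Herrsimian-8B/` and llama.cpp b3703 is built locally (paths and the chosen quant type are illustrative):

```shell
# Convert the original HF safetensors model to an F16 GGUF
# (convert_hf_to_gguf.py ships in the llama.cpp repository).
python convert_hf_to_gguf.py Llama-3.1-Herrsimian-8B \
  --outtype f16 --outfile Llama-3.1-Herrsimian-8B-F16.gguf

# Quantize the F16 GGUF down to one of the listed quant types.
./llama-quantize Llama-3.1-Herrsimian-8B-F16.gguf \
  Llama-3.1-Herrsimian-8B-Q4_K_M.gguf Q4_K_M
```

The same F16 file is reused as the input for every quant type in the table below.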

Original model: https://huggingface.co/lemonilia/Llama-3.1-Herrsimian-8B

Quant Types:

| Filename | Quant type | File size | Required VRAM at 32k ctx |
| -------- | ---------- | --------- | ------------------------ |
| Llama-3.1-Herrsimian-8B-F16.gguf | F16 | 14.9 GB | 18.6 GB |
| Llama-3.1-Herrsimian-8B-Q8_0.gguf | Q8_0 | 7.95 GB | 14.0 GB |
| Llama-3.1-Herrsimian-8B-Q6_K.gguf | Q6_K | 6.14 GB | 12.2 GB |
| Llama-3.1-Herrsimian-8B-Q5_K_M.gguf | Q5_K_M | 5.33 GB | 11.4 GB |
| Llama-3.1-Herrsimian-8B-Q5_K_S.gguf | Q5_K_S | 5.21 GB | 11.3 GB |
| Llama-3.1-Herrsimian-8B-Q4_K_M.gguf | Q4_K_M | 4.58 GB | 10.6 GB |
| Llama-3.1-Herrsimian-8B-Q4_K_S.gguf | Q4_K_S | 4.37 GB | 10.4 GB |
| Llama-3.1-Herrsimian-8B-Q3_K_L.gguf | Q3_K_L | 4.02 GB | 10.1 GB |
| Llama-3.1-Herrsimian-8B-Q3_K_M.gguf | Q3_K_M | 3.74 GB | 9.7 GB |
| Llama-3.1-Herrsimian-8B-Q3_K_S.gguf | Q3_K_S | 3.41 GB | 9.4 GB |
| Llama-3.1-Herrsimian-8B-Q2_K.gguf | Q2_K | 2.95 GB | 9.2 GB |
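The VRAM column exceeds the file size mainly because of the KV cache. A rough sanity check, assuming the published Llama-3.1-8B architecture (32 layers, 8 KV heads under GQA, head dimension 128) and an unquantized F16 cache, shows where about 4 GiB of that overhead comes from:

```shell
# Rough KV-cache size at 32k context for Llama-3.1-8B.
# Constants are assumptions from the published Llama-3.1-8B config:
# 32 layers, 8 KV heads (GQA), head dim 128, 2 bytes/element for F16;
# the leading 2 accounts for storing both K and V per token per layer.
kv_bytes=$((2 * 32 * 8 * 128 * 32768 * 2))
echo "KV cache at 32k ctx: $((kv_bytes / 1024 / 1024 / 1024)) GiB"
# prints "KV cache at 32k ctx: 4 GiB"
```

The remaining gap between file size plus KV cache and the listed VRAM figures is compute buffers and runtime overhead.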
GGUF · Model size: 8.03B params · Architecture: llama
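A single quant can be fetched from this repo and run directly with llama.cpp. A minimal sketch, assuming `huggingface-cli` is installed and llama.cpp is built locally (the chosen quant and GPU layer count are illustrative):

```shell
# Download just the Q4_K_M file rather than the whole repository.
huggingface-cli download knifeayumu/Llama-3.1-Herrsimian-8B-GGUF \
  Llama-3.1-Herrsimian-8B-Q4_K_M.gguf --local-dir .

# Run it with a 32k context, offloading all layers to the GPU.
./llama-cli -m Llama-3.1-Herrsimian-8B-Q4_K_M.gguf -c 32768 -ngl 99
```

Pick a smaller quant (or reduce `-c`) if the VRAM figures in the table above exceed what your GPU provides.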


Model tree for knifeayumu/Llama-3.1-Herrsimian-8B-GGUF