thirteenbit committed 7abb801 (parent: 9876284): Update README.md

README.md CHANGED
@@ -17,3 +17,17 @@ use with [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible soft
 
 Converted to gguf using llama.cpp [convert_hf_to_gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py)
 and quantized using llama.cpp llama-quantize, llama.cpp version [b3325](https://github.com/ggerganov/llama.cpp/commits/b3325).
+
+
+## Provided files
+
+| Name | Quant method | Bits | Size | VRAM required |
+| ---- | ---- | ---- | ---- | ---- |
+| [model-q3_k_m.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q3_k_m.gguf) | Q3_K_M | 3 | 4.9 GB | 5.7 GB |
+| [model-q4_k_m.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q4_k_m.gguf) | Q4_K_M | 4 | 6.3 GB | 7.1 GB |
+| [model-q5_k_m.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q5_k_m.gguf) | Q5_K_M | 5 | 7.2 GB | 7.9 GB |
+| [model-q6_k.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q6_k.gguf) | Q6_K | 6 | 8.2 GB | 8.9 GB |
+| [model-q8_0.gguf](https://huggingface.co/thirteenbit/madlad400-10b-mt-gguf/blob/main/model-q8_0.gguf) | Q8_0 | 8 | 11 GB | 11.3 GB |
+
+**Note**: the VRAM figures above were observed with all layers offloaded to the GPU, on Linux with an NVIDIA GPU.
+
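
For readers who want to reproduce or adapt files like these, the sketch below illustrates the usual llama.cpp conversion-and-quantization workflow that the README lines above refer to. It is not part of this commit: the llama.cpp checkout path, the checkpoint directory, and the output file names are illustrative assumptions, and it presumes a llama.cpp build around tag b3325 with `convert_hf_to_gguf.py` and a compiled `llama-quantize` binary available.

```python
# Minimal sketch (assumptions, not this repo's actual script) of converting a
# Hugging Face checkpoint to GGUF and producing the quantized variants listed above.
import subprocess

LLAMA_CPP = "./llama.cpp"            # assumed path to a llama.cpp checkout (around b3325)
HF_MODEL_DIR = "./madlad400-10b-mt"  # assumed path to the original HF checkpoint
F16_GGUF = "model-f16.gguf"          # intermediate full-precision GGUF

# 1. Convert the Hugging Face checkpoint to an f16 GGUF file.
subprocess.run(
    ["python", f"{LLAMA_CPP}/convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2. Quantize the f16 GGUF into each of the provided variants.
#    The llama-quantize binary may live elsewhere (e.g. build/bin/) depending on how
#    llama.cpp was built.
for quant in ["Q3_K_M", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]:
    subprocess.run(
        [f"{LLAMA_CPP}/llama-quantize", F16_GGUF, f"model-{quant.lower()}.gguf", quant],
        check=True,
    )
```

The VRAM figures in the table correspond to the all-layers-offloaded case mentioned in the note, i.e. running the llama.cpp tools with a sufficiently large `-ngl` (`--n-gpu-layers`) value so that every layer is placed on the GPU.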