Update README.md
README.md
CHANGED
@@ -47,6 +47,10 @@ ASSISTANT:
 <!-- compatibility_ggml start -->
 ## Compatibility
 
+**Note:** due to this model having a non-standard vocab size of 32,001, k-quants are slightly larger than they are for other models of the same size and type.
+
+For example, a 13B q4_K_M will be around 150MB larger. Inference speed should not be noticeably affected, and quality will be the same or higher.
+
 ### Original llama.cpp quant methods: `q4_0, q4_1, q5_0, q5_1, q8_0`
 
 These are guaranteed to be compatible with any UIs, tools and libraries released since late May. They may be phased out soon, as they are largely superseded by the new k-quant methods.
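The ~150MB figure comes from the vocab size (32,001) not being a multiple of the k-quant super-block size (QK_K = 256), so the large vocab-sized tensors are stored in a bigger fallback format. The sketch below is a back-of-the-envelope estimate, not taken from the commit: the choice of q8_0 as the fallback, which tensors are affected, and the comparison formats are assumptions about how llama.cpp handled non-divisible tensor rows at the time; the bits-per-weight values follow from the GGML block layouts.

```python
# Rough estimate of the extra file size a 32,001-entry vocab adds to a 13B q4_K_M.
# All "fallback" behaviour below is an assumption for illustration, not taken from the commit.

hidden_size = 5120                             # hidden dimension of a 13B Llama model
vocab_size = 32_001                            # 32,000 + 1 extra token; 32,001 % 256 != 0
params_per_tensor = hidden_size * vocab_size   # token-embedding / output matrix, ~164M weights each

# Approximate bits per weight for the GGML formats involved (block overhead included).
bpw_q4_K = 4.5       # 144 bytes per 256-weight super-block
bpw_q6_K = 6.5625    # 210 bytes per 256-weight super-block
bpw_q8_0 = 8.5       # 34 bytes per 32-weight block

# Assumption: because the rows are not a multiple of QK_K = 256, both large
# tensors fall back to q8_0 instead of their usual q4_K / q6_K formats.
extra_bits = params_per_tensor * ((bpw_q8_0 - bpw_q4_K) + (bpw_q8_0 - bpw_q6_K))
print(f"extra file size ≈ {extra_bits / 8 / 1e6:.0f} MB")   # ≈ 122 MB, the same ballpark as the README's ~150MB
```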