# BETA Quality, not sufficiently tested yet.

Seems to work mostly fine in my testing with the fixes from ggerganov's [PR](https://github.com/ggerganov/llama.cpp/pull/6851) applied, although it does seem to output extra `<|end|>` tokens at the end of responses when using llama.cpp's `/v1/chat/completions` endpoint.
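Until that is fixed upstream, a simple client-side workaround is to trim the stray markers yourself. This is only a minimal sketch; `strip_end_tokens` is a hypothetical helper, not part of llama.cpp:

```python
# Hypothetical client-side workaround: strip stray trailing <|end|>
# markers from a completion string before displaying it.
def strip_end_tokens(text: str, marker: str = "<|end|>") -> str:
    """Remove any trailing occurrences of the end-of-turn marker."""
    text = text.rstrip()
    while text.endswith(marker):
        text = text[: -len(marker)].rstrip()
    return text

cleaned = strip_end_tokens("Hello there!<|end|><|end|>")
```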

Additionally, the chat template is not supported by llama.cpp yet, so make sure to invoke it correctly yourself; I made a PR for this [here](https://github.com/ggerganov/llama.cpp/pull/6857).
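Since the template is not applied automatically, you have to assemble the prompt string yourself before sending it to the completion endpoint. The sketch below assumes the Phi-3-style `<|user|>` / `<|assistant|>` turns terminated by `<|end|>` suggested by the tokens above; check the model card for the authoritative template:

```python
# Build a prompt manually, assuming a <|role|> ... <|end|> turn format
# (an assumption -- verify against the model's actual chat template).
def build_prompt(messages: list[dict]) -> str:
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to start its reply
    return "".join(parts)

prompt = build_prompt([{"role": "user", "content": "Hello!"}])
```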

# Quant Infos

- quants done with an importance matrix for improved quantization loss
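For reference, the importance-matrix workflow in llama.cpp looks roughly like the sketch below; the file names are placeholders, and the calibration text used for these quants is not specified here:

```sh
# 1) collect activation statistics over a calibration text file
./imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2) quantize with the importance matrix to reduce quantization loss
./quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```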