Update README.md
README.md (CHANGED)
## Support

Upstream support has been merged, so gguf-quants work out of the box now!

<details><summary>old instructions before PR</summary>

To run inference on this model, you'll need to use Aphrodite, vLLM or EXL2/tabbyAPI, as llama.cpp hasn't yet merged the required pull request to fix the llama3.1 rope_freqs issue with custom head dimensions.
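
For instance, here is a minimal sketch using vLLM's offline `LLM` API; the model ID, prompt, and sampling settings below are placeholders for illustration, not values taken from this repo:

```python
from vllm import LLM, SamplingParams

# Placeholder model ID; point this at the actual model repository or a local path.
llm = LLM(model="your-org/your-model")

# Arbitrary sampling settings for illustration.
params = SamplingParams(temperature=0.7, max_tokens=256)

# Raw-text prompt (no chat template applied here).
outputs = llm.generate(["Hello!"], params)
print(outputs[0].outputs[0].text)
```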

However, you can work around this by quantizing the model yourself to create a functional GGUF file. Note that until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged, the context will be limited to 8k tokens.

To create a working GGUF file, make the following adjustments (a short sketch of the edits follows the list):

1. Remove the `"rope_scaling": {}` entry from `config.json`
2. Change `"max_position_embeddings"` to `8192` in `config.json`

These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.
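
For reference, a minimal Python sketch of those two edits, assuming they are applied to the stock `config.json` in the model directory before conversion:

```python
import json

# Load the model's config.json (assumed to be in the current directory).
with open("config.json") as f:
    cfg = json.load(f)

# 1. Remove the "rope_scaling": {} entry.
cfg.pop("rope_scaling", None)

# 2. Limit the context to 8k tokens.
cfg["max_position_embeddings"] = 8192

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```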

</details><br>

## axolotl config