Update README.md
README.md
CHANGED
@@ -29,7 +29,7 @@ Can I ask a question?<|im_end|>
 
 ## Support
 
-To run inference on this model, you'll need to use Aphrodite or
+To run inference on this model, you'll need to use Aphrodite, vLLM or EXL2/tabbyAPI, as llama.cpp hasn't yet merged the required pull request to fix the llama3.1 rope_freqs issue with custom head dimensions.
 
 However, you can work around this by quantizing the model yourself to create a functional GGUF file. Note that until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged, the context will be limited to 8k tokens.
 