lucyknada committed
Commit 44e2612
Parent: d380b15

Update README.md

Files changed (1): README.md (+7, −1)
README.md CHANGED
## Support

Upstream support has been merged, so GGUF quants work out of the box now!

<details><summary>Old instructions (before the PR was merged)</summary>

To run inference on this model, you'll need to use Aphrodite, vLLM, or EXL2/tabbyAPI, as llama.cpp hasn't yet merged the pull request required to fix the llama3.1 rope_freqs issue with custom head dimensions. A vLLM example is sketched below.
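
For illustration, a minimal vLLM sketch; the model id, prompt, and sampling settings are placeholders, not values taken from this README:

```python
# Minimal vLLM inference sketch. "your-org/your-model" is a placeholder;
# substitute the actual repository id for this model.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/your-model")
params = SamplingParams(temperature=0.8, max_tokens=256)

outputs = llm.generate(["Can I ask a question?"], params)
print(outputs[0].outputs[0].text)
```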
 
However, you can work around this by quantizing the model yourself to create a functional GGUF file. Note that until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged, the context will be limited to 8k tokens.

To create a working GGUF file, make the following adjustments:

1. Remove the `"rope_scaling": {}` entry from `config.json`
2. Change `"max_position_embeddings"` to `8192` in `config.json`

These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.
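
As an illustration, a minimal Python sketch of those two edits; the `./model` path is a placeholder, not something specified in this README:

```python
# Sketch: apply the two config.json adjustments described above.
import json
from pathlib import Path

cfg_path = Path("./model/config.json")  # placeholder path to the downloaded model
cfg = json.loads(cfg_path.read_text())

cfg.pop("rope_scaling", None)          # 1. remove the "rope_scaling": {} entry
cfg["max_position_embeddings"] = 8192  # 2. cap the context at 8k tokens

cfg_path.write_text(json.dumps(cfg, indent=2))
```

After this, converting and quantizing with llama.cpp's usual tooling (its HF-to-GGUF conversion script, then its quantize tool) should produce a working GGUF.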

</details><br>

  ## axolotl config