Error when trying to run the boilerplate llama.cpp code:
```
error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file: failed to load model
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Traceback (most recent call last):
  File "/data/text-generation-webui/models/mixtral/mixtral.py", line 4, in <module>
    llm = Llama(
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 923, in __init__
    self._n_vocab = self.n_vocab()
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 2184, in n_vocab
    return self._model.n_vocab()
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 250, in n_vocab
    assert self.model is not None
AssertionError
```
I'm getting a very similar error in Text Generation WebUI:
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\ui_model_menu.py", line 209, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\models.py", line 88, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\models.py", line 253, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\llamacpp_model.py", line 91, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 923, in init
self._n_vocab = self.n_vocab()
^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 2184, in n_vocab
return self._model.n_vocab()
^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 250, in n_vocab
assert self.model is not None
^^^^^^^^^^^^^^^^^^^^^^
AssertionError
see the updated readme, you need to build from the mixtral branch
> see the updated readme, you need to build from the mixtral branch

Sorry for the naive question, but how would I do that? I am on Linux; how can I replace the llama.cpp that is inside oobabooga with the mixtral branch you linked to?
> see the updated readme, you need to build from the mixtral branch

And how exactly would you do that, sir?
@RandomLegend
@DarkCoverUnleashed
Sorry, I'm not familiar with the structure of oobabooga, but if you have cloned llama.cpp, just cd into it and run `git checkout mixtral` to switch to the right branch, then compile it as before.
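Roughly, that would look something like this (a sketch, assuming an existing llama.cpp clone built with plain `make`; add `LLAMA_CUBLAS=1` if you were building with CUDA before):

```
# inside your existing llama.cpp checkout
cd llama.cpp
git fetch origin        # make sure the mixtral branch is known locally
git checkout mixtral    # switch to the Mixtral support branch
make clean && make      # rebuild as before (e.g. make clean && make LLAMA_CUBLAS=1 for CUDA)
```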
@errata Yeah, I compiled the mixtral branch and am already using it in the terminal. Fascinating model. But I have no idea how to get it running in oobabooga, ollama or gpt4all :-D Well, I'll have to wait until they publish the patches then.
Thanks!
@RandomLegend
Same here, I can use it on the command line with that mixtral branch. LM Studio has it built in now, I think, but they have to merge this pull request for it to land anywhere "stable":
https://github.com/ggerganov/llama.cpp/pull/4406
So all we need is 1 review ;)
@mclassHF2023 Yeah, it seemed like it was further away than one merge... It doesn't change anything as long as oobabooga doesn't merge it too.
Did anyone get it working? I have done this:
```
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
!pip install huggingface-hub
!huggingface-cli download TheBloke/Mixtral-8x7B-v0.1-GGUF mixtral-8x7b-v0.1.Q4_K_M.gguf --local-dir /content/Models --local-dir-use-symlinks False

from llama_cpp import Llama

llm = Llama(
    model_path="/content/Models/mixtral-8x7b-v0.1.Q4_K_M.gguf",  # Download the model file first
    n_ctx=2048,      # The max sequence length to use - note that longer sequence lengths require much more resources
    n_threads=8,     # The number of CPU threads to use, tailor to your system and the resulting performance
    n_gpu_layers=35  # The number of layers to offload to GPU, if you have GPU acceleration available
)
```
Still getting this error:
```
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
AssertionError Traceback (most recent call last)
in <cell line: 1>()
----> 1 llm = Llama(
2 model_path="/content/Models/mixtral-8x7b-v0.1.Q4_K_M.gguf", # Download the model file first
3 n_ctx=2048, # The max sequence length to use - note that longer sequence lengths require much more resources
4 n_threads=8, # The number of CPU threads to use, tailor to your system and the resulting performance
5 n_gpu_layers=35 # The number of layers to offload to GPU, if you have GPU acceleration available
2 frames
/usr/local/lib/python3.10/dist-packages/llama_cpp/_internals.py in n_vocab(self)
65
66 def n_vocab(self) -> int:
---> 67 assert self.model is not None
68 return llama_cpp.llama_n_vocab(self.model)
69
AssertionError:
```
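Most likely the same root cause: the released llama-cpp-python wheel from PyPI presumably still bundles a llama.cpp without Mixtral support, so the model fails to load and `self.model` stays `None`. A rough sketch of building the Python bindings against the mixtral branch instead (assuming llama-cpp-python still vendors llama.cpp as a git submodule under `vendor/llama.cpp`; check the repo layout before relying on this):

```
# Sketch only: build llama-cpp-python from source against the mixtral branch
# of llama.cpp instead of installing the released wheel.
# Assumption: the vendored llama.cpp lives at vendor/llama.cpp (git submodule).
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python
cd llama-cpp-python/vendor/llama.cpp
git fetch origin && git checkout mixtral       # point the vendored llama.cpp at the Mixtral branch
cd ../..
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install .   # same CUDA flag as in the commands above
```

If that builds cleanly, the same `Llama(model_path=...)` call above should get past the `assert self.model is not None` check.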