Error when trying to run the boilerplate llama.cpp code:
```
error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file: failed to load model
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Traceback (most recent call last):
  File "/data/text-generation-webui/models/mixtral/mixtral.py", line 4, in <module>
    llm = Llama(
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 923, in __init__
    self._n_vocab = self.n_vocab()
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 2184, in n_vocab
    return self._model.n_vocab()
  File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 250, in n_vocab
    assert self.model is not None
AssertionError
```
I'm getting a very similar error in Text Generation WebUI:
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\ui_model_menu.py", line 209, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\models.py", line 88, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\models.py", line 253, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\modules\llamacpp_model.py", line 91, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 923, in init
self._n_vocab = self.n_vocab()
^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 2184, in n_vocab
return self._model.n_vocab()
^^^^^^^^^^^^^^^^^^^^^
File "C:\dev\llamaindex_text_generation_webui\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py", line 250, in n_vocab
assert self.model is not None
^^^^^^^^^^^^^^^^^^^^^^
AssertionError
see the updated readme, you need to build from the mixtral branch
> see the updated readme, you need to build from the mixtral branch

Sorry for the naive question, but how would I do that? I am on Linux; how can I replace the llama.cpp that is inside oobabooga with the mixtral branch you linked to?
> see the updated readme, you need to build from the mixtral branch

And how exactly would you do that, sir?
@RandomLegend
@DarkCoverUnleashed
Sorry, I'm not familiar with the structure of oobabooga, but if you have cloned llama.cpp, just cd into it and run `git checkout mixtral` to switch to the right branch, then compile it as before.
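Roughly, that would look something like this (a sketch, assuming an existing llama.cpp clone built with plain `make`; add `LLAMA_CUBLAS=1` if you were building with CUDA before):

```
# inside your existing llama.cpp checkout
cd llama.cpp
git fetch origin        # make sure the mixtral branch is known locally
git checkout mixtral    # switch to the Mixtral support branch
make clean && make      # rebuild as before (e.g. make clean && make LLAMA_CUBLAS=1 for CUDA)
```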
@errata Yeah, I compiled the mixtral branch and am already using it in the terminal. Fascinating model. But I have no idea how to get it running in oobabooga, ollama or gpt4all :-D Well, I'll have to wait until they publish the patches then.
Thanks!
@RandomLegend
Same here, I can use it on the command line with that mixtral branch. LM Studio has it built in now, I think, but they have to merge this pull request for it to land anywhere "stable":
https://github.com/ggerganov/llama.cpp/pull/4406
So all we need is 1 review ;)
@mclassHF2023 Yeah, it seemed like it was further away than one merge... It doesn't change anything as long as oobabooga doesn't merge it too.
Did anyone get it working? I have done this:
```
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
!pip install huggingface-hub
!huggingface-cli download TheBloke/Mixtral-8x7B-v0.1-GGUF mixtral-8x7b-v0.1.Q4_K_M.gguf --local-dir /content/Models --local-dir-use-symlinks False

from llama_cpp import Llama

llm = Llama(
    model_path="/content/Models/mixtral-8x7b-v0.1.Q4_K_M.gguf",  # Download the model file first
    n_ctx=2048,      # The max sequence length to use - note that longer sequence lengths require much more resources
    n_threads=8,     # The number of CPU threads to use, tailor to your system and the resulting performance
    n_gpu_layers=35  # The number of layers to offload to GPU, if you have GPU acceleration available
)
```
Still getting this error:
```
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
AssertionError Traceback (most recent call last)
in <cell line: 1>()
----> 1 llm = Llama(
2 model_path="/content/Models/mixtral-8x7b-v0.1.Q4_K_M.gguf", # Download the model file first
3 n_ctx=2048, # The max sequence length to use - note that longer sequence lengths require much more resources
4 n_threads=8, # The number of CPU threads to use, tailor to your system and the resulting performance
5 n_gpu_layers=35 # The number of layers to offload to GPU, if you have GPU acceleration available
2 frames
/usr/local/lib/python3.10/dist-packages/llama_cpp/_internals.py in n_vocab(self)
65
66 def n_vocab(self) -> int:
---> 67 assert self.model is not None
68 return llama_cpp.llama_n_vocab(self.model)
69
AssertionError:
```
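Most likely the same root cause: the released llama-cpp-python wheel from PyPI presumably still bundles a llama.cpp without Mixtral support, so the model fails to load and `self.model` stays `None`. A rough sketch of building the Python bindings against the mixtral branch instead (assuming llama-cpp-python still vendors llama.cpp as a git submodule under `vendor/llama.cpp`; check the repo layout before relying on this):

```
# Sketch only: build llama-cpp-python from source against the mixtral branch
# of llama.cpp instead of installing the released wheel.
# Assumption: the vendored llama.cpp lives at vendor/llama.cpp (git submodule).
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python
cd llama-cpp-python/vendor/llama.cpp
git fetch origin && git checkout mixtral       # point the vendored llama.cpp at the Mixtral branch
cd ../..
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install .   # same CUDA flag as in the commands above
```

If that builds cleanly, the same `Llama(model_path=...)` call above should get past the `assert self.model is not None` check.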