Help needed to load model
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
n_gpu_layers = 40 # Change this value based on your model and your GPU VRAM pool.
n_batch = 256 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
# Loading the model (LlamaCpp is LangChain's wrapper; callback_manager assumed defined earlier)
llm = LlamaCpp(
model_path=model_path,
max_tokens=256,
n_gpu_layers=n_gpu_layers,
n_batch=n_batch,
callback_manager=callback_manager,
n_ctx=1024,
verbose=False,
)
ValidationError: 1 validation error for LlamaCpp
__root__
Could not load Llama model from path: /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/47d28ef5de4f3de523c421f325a2e4e039035bab/llama-2-13b-chat.ggmlv3.q5_1.bin. Received error fileno (type=value_error)
same problem
Same problem :(
llama.cpp and llama-cpp-python only support GGUF (not GGML) after a certain version (0.1.78 is the last llama-cpp-python release that still loads GGML), so try this:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip -qq install --upgrade --force-reinstall llama-cpp-python==0.1.78 --no-cache-dir
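If you're not sure which build actually ended up active in the runtime after all the reinstalls, a quick check:
!pip show llama-cpp-python | grep Version
If that prints anything newer than 0.1.78, the install you're running expects GGUF files.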
I will be making GGUFs for these models tonight, so they're coming very soon
@actionpace I tried
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip -qq install --upgrade --force-reinstall llama-cpp-python==0.1.78 --no-cache-dir
with the same result :(
So we will have to wait for the GGUF versions :)
Have you tried my version in my repo?
Yup @akarshanbiswas, same result.
/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py in __init__(__pydantic_self__, **data)
339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
340 if validation_error:
--> 341 raise validation_error
342 try:
343 object_setattr(__pydantic_self__, '__dict__', values)
ValidationError: 1 validation error for LlamaCpp
__root__
Could not load Llama model from path: /root/.cache/huggingface/hub/models--akarshanbiswas--llama-2-chat-13b-gguf/snapshots/141acdcfecba05f5c0e046ee0339863fc9621004/ggml-llama-2-13b-chat-q4_k_m.gguf. Received error fileno (type=value_error)
I just do
!pip install llama-cpp-python
and then
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
also tried with
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip -qq install --upgrade --force-reinstall llama-cpp-python==0.1.78 --no-cache-dir
from huggingface_hub import hf_hub_download

model_name_or_path = "akarshanbiswas/llama-2-chat-13b-gguf"
model_basename = "ggml-llama-2-13b-chat-q4_k_m.gguf"
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)
n_gpu_layers = 40
n_batch = 256
# Loading the model
llm = LlamaCpp(
model_path=model_path,
max_tokens=256,
n_gpu_layers=n_gpu_layers,
n_batch=n_batch,
callback_manager=callback_manager,
n_ctx=1024,
verbose=False,
)
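One thing worth ruling out is a corrupted or partial download. A small sketch (GGUF files begin with the 4-byte ASCII magic GGUF, so this just reads the file header):
# Check the file header: a valid GGUF model starts with b'GGUF'
with open(model_path, "rb") as f:
    print(f.read(4))  # expect b'GGUF'
If that prints anything else, the file on disk isn't a valid GGUF model.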
Try downloading it using a browser. Save it somewhere and pass that file path to the class.
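Something like this (the path is just an illustration; point it at wherever you saved the file):
# Hypothetical local path to a manually downloaded GGUF file
llm = LlamaCpp(
    model_path="/content/llama-2-13b-chat.Q5_K_M.gguf",
    n_ctx=1024,
)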
Same result on Colab, sorry :(
Try with:
curl -OL https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q5_1.bin
Oh, I see, you need the GGUF version
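So something like this instead (assuming TheBloke's GGUF repo keeps the matching filename):
curl -OL https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q5_K_M.gguf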
I have the same problem and haven't found a solution yet.
Fix for "Could not load Llama model from path":
Download GGUF model from this link:
https://huggingface.co/TheBloke/CodeLlama-13B-Python-GGUF
Code Example:
model_name_or_path = "TheBloke/CodeLlama-13B-Python-GGUF"
model_basename = "codellama-13b-python.Q5_K_M.gguf"
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)
Then change verbose=False to verbose=True, as in the following code (the fileno error seems to come from llama-cpp-python trying to silence llama.cpp's output via sys.stdout.fileno(), which notebook output streams don't support):
llm = LlamaCpp(
model_path=model_path,
max_tokens=256,
n_gpu_layers=n_gpu_layers,
n_batch=n_batch,
callback_manager=callback_manager,
n_ctx=1024,
verbose=True,
)
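For anyone landing here, a minimal end-to-end sketch of the fix above (assuming LangChain's LlamaCpp wrapper and its streaming stdout callback handler; the n_gpu_layers value is just an example to tune to your VRAM):
from huggingface_hub import hf_hub_download
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

# Download the GGUF file from the Hub and get its local cache path
model_path = hf_hub_download(
    repo_id="TheBloke/CodeLlama-13B-Python-GGUF",
    filename="codellama-13b-python.Q5_K_M.gguf",
)

# Stream tokens to stdout as they are generated
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    model_path=model_path,
    max_tokens=256,
    n_gpu_layers=40,  # example value, tune to your GPU's VRAM
    n_batch=256,
    callback_manager=callback_manager,
    n_ctx=1024,
    verbose=True,  # keep True in notebooks; see the fileno note above
)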
Please @TheBloke, is there a GGUF for 7B-Chat yet? I can't seem to find one.
Thank you, @TheBloke
Thank you. This worked for me. Any ideas why this might be the case?