Compatibility with llama-cpp and Ollama

#17
by liashchynskyi - opened

Hi there!

I tried some quantized versions of this model and ran into an issue. I use llama-cpp-python for model inference. When I ask a question, I get an endless stream of random characters as the result (see screenshot). But when I create a local model from the same quantized GGUF using a Modelfile for Ollama inference, everything works fine. So the issue is that Ollama works, while llama-cpp-python produces random output. I noticed the same behavior with a couple of other models, like defog/llama-3-sqlcoder-8b.

Is anyone else here experiencing the same issue?

image.png

llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************
llm_load_vocab:
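
In llama.cpp builds from around this time, this banner appears to be logged when the GGUF file has no `tokenizer.ggml.pre` metadata key (the BPE pre-tokenizer type), which would explain why regenerating the quants is suggested. A minimal sketch for checking a file without loading the model follows. It assumes a little-endian GGUF of version 2 or later; `read_gguf_metadata` and `has_pre_tokenizer` are hypothetical helper names, not part of llama-cpp-python or Ollama.

```python
import struct

# struct format codes for the fixed-size GGUF metadata value types
_SCALAR_FORMATS = {
    0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i",
    6: "f", 7: "?", 10: "Q", 11: "q", 12: "d",
}
_TYPE_STRING = 8
_TYPE_ARRAY = 9


def _read_scalar(f, fmt):
    """Read one little-endian value described by a struct format code."""
    size = struct.calcsize("<" + fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack("<" + fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read_scalar(f, "Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f, value_type):
    if value_type in _SCALAR_FORMATS:
        return _read_scalar(f, _SCALAR_FORMATS[value_type])
    if value_type == _TYPE_STRING:
        return _read_string(f)
    if value_type == _TYPE_ARRAY:
        # Arrays store the element type, then the count, then the elements.
        item_type = _read_scalar(f, "I")
        count = _read_scalar(f, "Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type: {value_type}")


def read_gguf_metadata(path):
    """Return the metadata key/value pairs of a GGUF file as a dict."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read_scalar(f, "I")
        if version < 2:
            # Assumption: version 1 files use 32-bit counts and are not handled.
            raise ValueError(f"unsupported GGUF version: {version}")
        _read_scalar(f, "Q")  # tensor count, not needed here
        kv_count = _read_scalar(f, "Q")
        metadata = {}
        for _ in range(kv_count):
            key = _read_string(f)
            value_type = _read_scalar(f, "I")
            metadata[key] = _read_value(f, value_type)
        return metadata


def has_pre_tokenizer(path):
    """True if the file declares a pre-tokenizer type."""
    return "tokenizer.ggml.pre" in read_gguf_metadata(path)
```

Calling `has_pre_tokenizer("model.gguf")` (the path is a placeholder) on an affected file would be expected to return `False`. Reading the full vocabulary arrays in pure Python can take a few seconds on large models.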

@liashchynskyi Yes! I'm having the same issue with defog/llama-3-sqlcoder-8b. I'm using LangChain with llama-cpp-python - only GGUF models. I'm looking to use GGUF files others have created - I can look into generating my own if that's the only solution.

Output from defog/llama-3-sqlcoder-8b:

image.png

Quant Factory org
edited Jun 7, 2024

@jaycann2 can you try QuantFactory/Meta-Llama-3-8B-Instruct-GGUF-v2?
I'll update the defog quants today if you are facing issues with them.

@munish0838 But why do we get random output? I tried quantizing the original model myself and ran into the same issue.

Thanks @munish0838 - I tried yesterday and got the same result. I'd be interested to see whether you can reproduce the issue on your end with the GGUF version. If not, I might be able to learn what's going on from your code.

Quant Factory org

@jaycann2 I updated the quants yesterday in this repo and the defog-sql-llama repo; both are working perfectly for me.
