Run the GGUF model on CPU

These GGUF models use 8-bit quantization (q8_0).

llama3.1-q8_0.gguf is a direct GGUF conversion of the Llama-3.1-8B model.

Mistral_7B_DPO_Pandas-q8_0.gguf appears to be a GGUF conversion of a Mistral-7B model fine-tuned with Direct Preference Optimization (DPO) on a Pandas-related dataset.

pip install llama-cpp-python

Manually download the GGUF model (e.g. llama3.1-q8_0.gguf) from this repo, or fetch it programmatically as sketched below.
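If you prefer scripting the download instead of clicking through the Files tab, hf_hub_download from the huggingface_hub package can fetch the file; this is a minimal sketch assuming the repo id nirusanan/llama-gguf and the filename shown above, and the returned local path can be passed directly to Llama(model_path=...).

from huggingface_hub import hf_hub_download

# Download the GGUF file into the local Hugging Face cache and return its path
model_path = hf_hub_download(repo_id="nirusanan/llama-gguf", filename="llama3.1-q8_0.gguf")
print(model_path)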

from llama_cpp import Llama

# Load the downloaded GGUF model; llama-cpp-python runs on CPU by default
llm = Llama(model_path="/your-path/model.gguf", chat_format="llama-2")

prompt = "Explain large language models"

# Plain text completion
output = llm(prompt, max_tokens=300)
print(output['choices'][0]['text'])
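Because the model is loaded with a chat_format, you can also use llama-cpp-python's create_chat_completion instead of a raw completion call. A minimal sketch, assuming the same llm object as above:

# Chat-style call; the wrapper applies the configured chat template for you
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain large language models"},
    ],
    max_tokens=300,
)
print(response['choices'][0]['message']['content'])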
Model details: GGUF format · 8.03B params · llama architecture · 8-bit quantization