Missing config.json
#1
by denis-kazakov · opened
I tried to use the model in a pipeline as shown in the model card (both from HF and a pre-downloaded local copy) but get this error message: LLama-3.1-KazLLM-1.0-8B-GGUF4 does not appear to have a file named config.json. Checkout 'https://huggingface.co//media/denis/D/Models/LLM/KazLLM/LLama-3.1-KazLLM-1.0-8B-GGUF4/tree/None' for available files.
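For reference, a minimal sketch of the kind of call that triggers this error; the model path is taken from the error message above and is only an assumption about the exact call used.
# Sketch of the failing usage (model path assumed from the error message above).
# A GGUF-only folder has no config.json, so transformers cannot resolve it as a
# regular checkpoint and raises the error quoted in the question.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="/media/denis/D/Models/LLM/KazLLM/LLama-3.1-KazLLM-1.0-8B-GGUF4",  # local copy
)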
Institute of Smart Systems and Artificial Intelligence, Nazarbayev University org · edited about 20 hours ago
Hello.
You can run it like this using vLLM. I am not sure what the problem with the pipeline is.
Cell 1:
# Setup env (activate the new environment before running the pip installs):
!conda create -n vllm_test python=3.10 -y
!pip install vllm==0.6.3
!pip install ipykernel
!python -m ipykernel install --user --name vllm_test
Cell 2:
# load model
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
from vllm import LLM, SamplingParams
# In this script, we demonstrate how to pass input to the chat method:
conversation = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "Write an essay about the importance of higher education."},
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
# Create an LLM.
llm = LLM(
    model="/data/nvme5n1p1/vladimir_workspace/models/quantized/gguf/checkpoints_llama8b_031224_18900-gguf/checkpoints_llama8b_031224_18900-Q4_K_M.gguf",
    gpu_memory_utilization=0.95,
)
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.chat(conversation, sampling_params)
Cell 3:
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
Or you can also run it with llama.cpp if you want, since vLLM is not yet fully optimized for GGUF.
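If you go the llama.cpp route, here is a minimal sketch using the llama-cpp-python bindings; the model path and generation settings below are placeholders, not values from this thread.
# Minimal llama.cpp example via llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder: point it at your downloaded Q4_K_M GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="/path/to/LLama-3.1-KazLLM-1.0-8B-GGUF4/model-Q4_K_M.gguf",  # placeholder
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if llama.cpp was built with GPU support
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Write an essay about the importance of higher education."},
    ],
    temperature=0.8,
    top_p=0.95,
)
print(response["choices"][0]["message"]["content"])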