docker pull ghcr.io/ggerganov/llama.cpp:server

Assuming the mistral-7B-instruct-v0.2-q8.gguf file has been downloaded to the /path/to/models directory on the local machine, run the container and serve the model with:

docker run -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/mistral-7B-instruct-v0.2-q8.gguf --port 8000 --host 0.0.0.0 -n 512
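The deployment can also be smoke-tested from the command line. A minimal sketch, assuming the container above is running and listening on port 8000; the /completion endpoint and its JSON fields follow the llama.cpp server API, and the prompt uses the [INST] format expected by Mistral instruct models:

```shell
# Send a single completion request to the llama.cpp server started above.
# n_predict limits the number of generated tokens.
curl -s http://localhost:8000/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "[INST] What is the capital of France? [/INST]", "n_predict": 64}'
```

The response is a JSON object whose "content" field holds the generated text.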
  • Test the deployment by accessing the model in a browser at http://localhost:8000
  • The llama.cpp server also provides an OpenAI-compatible API
  • Deployment on a CUDA GPU:
docker pull ghcr.io/ggerganov/llama.cpp:server-cuda
docker run --gpus all -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server-cuda -m /models/mistral-7B-instruct-v0.2-q8.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 50
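The OpenAI-compatible API mentioned above can be exercised the same way. A hedged sketch, assuming either container is running on port 8000; the /v1/chat/completions route is part of the llama.cpp server, and the request body mirrors the OpenAI chat format:

```shell
# Chat request against the OpenAI-compatible endpoint of the llama.cpp server.
# Adjust host/port to match the docker run command above.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Hello!"}
        ],
        "max_tokens": 64
      }'
```

Because the API shape matches OpenAI's, existing OpenAI client libraries can be pointed at this endpoint by overriding their base URL.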
Model details: GGUF format, 7.24B params, llama architecture.