CPU Inference

#13
by Ange09 - opened

Hello TheBloke,
Is there any way to perform inference on CPU with the model?
Thank you very much.

Technically yes, you can run GPTQ on CPU, but it's horribly slow.

If you want CPU-only inference, use the GGML versions found at https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML
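For example, a GGML file can be run on CPU with llama.cpp. A minimal sketch (the file name and quantisation level are illustrative; pick one from the repo that fits your RAM, and adjust `-t` to your core count):

```shell
# Build llama.cpp (CPU-only by default)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download one quantised GGML file from the repo (q4_0 shown as an example)
wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin

# Run inference on CPU: -m model file, -t threads, -n tokens to generate, -p prompt
./main -m llama-2-13b-chat.ggmlv3.q4_0.bin -t 8 -n 128 -p "Hello, how are you?"
```

Expect a few tokens per second on a typical desktop CPU, depending on quantisation and thread count.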
