README.md · TheBloke/Llama-2-70B-Chat-GGML at 5f8a081572110276f9c100db0fb6d286eee967c2

This model is still uploading. README will be here shortly.

If you're too impatient to wait for that (of course you are), to run these files you need:

llama.cpp as of this commit: https://github.com/ggerganov/llama.cpp/commit/e76d630df17e235e6b9ef416c45996765d2e36fb
To add the new command-line parameter -gqa 8 (the grouped-query attention factor, which the 70B model needs set to 8)
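
If you don't already have a suitable build, getting one could look roughly like the sketch below. It pins the exact commit linked above, although any later commit that still accepts -gqa should also work. The `fetch_and_build` helper and the target directory are hypothetical names for illustration, and a plain `make` CPU build is an assumption.

```shell
# Repository and commit taken from the requirement above.
LLAMA_CPP_REPO="https://github.com/ggerganov/llama.cpp"
LLAMA_CPP_COMMIT="e76d630df17e235e6b9ef416c45996765d2e36fb"

# Hypothetical helper: clone into directory $1, check out the pinned
# commit, and run a default CPU-only build with make.
fetch_and_build() {
  git clone "$LLAMA_CPP_REPO" "$1" &&
  git -C "$1" checkout "$LLAMA_CPP_COMMIT" &&
  make -C "$1"
}

# Example call (needs network access, git, make and a C/C++ compiler):
# fetch_and_build ./llama.cpp
```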

Example command:

/workspace/git/llama.cpp/main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
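
If you want to swap in your own system message and request, a small shell function can assemble the prompt for you. This is only a sketch: `build_prompt` is a hypothetical helper, and it copies the single-line layout of the example command above rather than any other template.

```shell
# Hypothetical helper: wrap a system message ($1) and a user request ($2)
# in the same [INST] / <<SYS>> layout used by the example command.
build_prompt() {
  printf '[INST] <<SYS>>%s<</SYS>>%s[/INST]' "$1" "$2"
}

PROMPT=$(build_prompt "You are a helpful assistant" "Write a story about llamas")

# Pass it to main exactly as in the example command:
# /workspace/git/llama.cpp/main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "$PROMPT"
```

The -t value sets the thread count, so adjust 13 to suit your own CPU.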

There is no CUDA support at this time, but it should hopefully be coming soon.