
This model is still uploading. README will be here shortly.

If you're too impatient to wait for that (of course you are), to run these files you need:

  1. llama.cpp as of this commit: https://github.com/ggerganov/llama.cpp/commit/e76d630df17e235e6b9ef416c45996765d2e36fb
  2. The new command line parameter -gqa 8 (the grouped-query attention factor, which is 8 for the 70B model)

Example command:

/workspace/git/llama.cpp/main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
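If you want to change the prompt or paths without retyping the whole line, the example command can be wrapped in a small script. This is only a sketch: `build_prompt`, `LLAMA_MAIN`, `MODEL` and `THREADS` are hypothetical names made up for illustration, the paths and thread count are just the ones from the example command, and the prompt layout simply copies the markers used there rather than any official template.

```shell
#!/usr/bin/env bash

# Hypothetical helper: wraps a system message and a user message in the
# [INST] / <<SYS>> markers exactly as they appear in the example command.
build_prompt() {
  local system="$1" user="$2"
  printf '[INST] <<SYS>>%s<</SYS>>%s[/INST]' "$system" "$user"
}

# Defaults copied from the example command; override via environment variables.
LLAMA_MAIN="${LLAMA_MAIN:-/workspace/git/llama.cpp/main}"
MODEL="${MODEL:-llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin}"
THREADS="${THREADS:-13}"

PROMPT="$(build_prompt "You are a helpful assistant" "Write a story about llamas")"

if [ -x "$LLAMA_MAIN" ]; then
  # -gqa 8 is the parameter required for these files.
  "$LLAMA_MAIN" -m "$MODEL" -gqa 8 -t "$THREADS" -p "$PROMPT"
else
  # Binary not found: print the command that would have been run.
  echo "$LLAMA_MAIN -m $MODEL -gqa 8 -t $THREADS -p \"$PROMPT\""
fi
```

Set `LLAMA_MAIN` and `MODEL` to wherever your llama.cpp build and the downloaded file actually live, and `THREADS` to suit your CPU.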

There is no CUDA support at this time, but it should hopefully be coming soon.