GGML 4-bit/5-bit quantized IDEA-CCNL/Ziya-LLaMA-13B-v1
- You need a recent version of llama.cpp or llama-cpp-python (for GGML format v3 support).
- Currently llama.cpp cannot tokenize the '<human>' and '<bot>' special tokens, so I replaced them with the 🧑 and 🤖 emojis.
- Prompt like this:

```python
inputs = '🧑:' + query.strip() + '\n🤖:'
```
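A minimal sketch of how the prompt format above might be used with llama-cpp-python; the model filename is a placeholder, and the inference call at the end is commented out since it requires the downloaded GGML file:

```python
def build_prompt(query: str) -> str:
    # Wrap the user query with the 🧑/🤖 role markers that replace the
    # original <human>/<bot> special tokens in this GGML conversion.
    return '🧑:' + query.strip() + '\n🤖:'

prompt = build_prompt('Hello, who are you?')
# prompt == '🧑:Hello, who are you?\n🤖:'

# Inference sketch (assumed placeholder path, not a real filename):
# from llama_cpp import Llama
# llm = Llama(model_path='ziya-llama-13b-v1.ggmlv3.q4_0.bin')
# output = llm(prompt, max_tokens=256, stop=['🧑:'])
```

Stopping on `'🧑:'` keeps the model from generating a follow-up human turn on its own.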