4-bit GPTQ quantized version of https://huggingface.co/TheBloke/stable-vicuna-13B-HF

Big thank you to TheBloke for uploading the HF version above. Unfortunately, his GPTQ quant doesn't run on 0cc4m's forks of KoboldAI (KAI) and GPTQ-for-LLaMa, so I am uploading one that does.

GPTQ quantization using https://github.com/0cc4m/GPTQ-for-LLaMa for compatibility with 0cc4m's fork of KoboldAI.

Command used to quantize:

```
python llama.py c:\stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors
```

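To make the `--wbits 4 --groupsize 128` flags more concrete, here is a minimal sketch of the storage layout they imply: each row of weights is split into groups of 128 values, and each group gets its own scale and zero point, with every weight mapped to a 4-bit integer code (0 to 15). This uses plain round-to-nearest, not GPTQ's error-compensating weight updates, and it ignores `--true-sequential` and the c4 calibration data, so it illustrates the format only, not the quantizer actually used for this model. The function names are hypothetical.

```python
# Illustrative sketch only: group-wise 4-bit round-to-nearest (RTN)
# quantization. This is NOT GPTQ, which additionally updates the remaining
# weights to compensate for rounding error using calibration data (c4).
from typing import List, Tuple

WBITS = 4               # mirrors --wbits 4
GROUPSIZE = 128         # mirrors --groupsize 128
QMAX = 2 ** WBITS - 1   # 15, the largest unsigned 4-bit code


def quantize_group(values: List[float]) -> Tuple[List[int], float, int]:
    """Map one group of floats to WBITS-bit codes with its own scale and zero point."""
    lo = min(min(values), 0.0)  # include 0.0 so it stays exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / QMAX if hi > lo else 1.0
    zero = round(-lo / scale)
    # Round each weight to the nearest step and clamp into the 4-bit range.
    codes = [min(max(round(v / scale) + zero, 0), QMAX) for v in values]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Recover approximate float weights from the 4-bit codes."""
    return [(c - zero) * scale for c in codes]


def quantize_row(row: List[float], groupsize: int = GROUPSIZE):
    """Split a weight row into groups of `groupsize`; quantize each group independently."""
    return [quantize_group(row[i:i + groupsize]) for i in range(0, len(row), groupsize)]


if __name__ == "__main__":
    row = [((i * 37) % 101 - 50) / 1000.0 for i in range(256)]
    groups = quantize_row(row)
    print(len(groups), "groups")  # 256 weights / 128 per group
```

Each group stores 128 four-bit codes plus one scale and zero point, which is presumably what the `4bit-128g` in the output filename refers to. Smaller groups track the weights more closely at the cost of storing more scales.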
This model works best with the following prompt format. Also, it really does not like to stop on its own and will likely keep generating forever if you let it.

```
### Human:
What is 2+2?

### Assistant:
```
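Because the model tends to keep going past its answer, one common workaround is to treat the next `### Human:` (or `### Assistant:`) marker as a stop sequence and cut the output there. The sketch below assumes your frontend hands you the raw generated text as a string; `build_prompt` and `truncate_at_stop` are hypothetical helpers, not part of KoboldAI or GPTQ-for-LLaMa.

```python
from typing import Sequence

HUMAN = "### Human:"
ASSISTANT = "### Assistant:"


def build_prompt(question: str) -> str:
    """Format a single-turn prompt in the Human/Assistant layout above."""
    return f"{HUMAN}\n{question.strip()}\n\n{ASSISTANT}"


def truncate_at_stop(generated: str, stops: Sequence[str] = (HUMAN, ASSISTANT)) -> str:
    """Cut generated text at the earliest stop marker, if any, and trim whitespace."""
    cut = len(generated)
    for stop in stops:
        idx = generated.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut].strip()


if __name__ == "__main__":
    print(build_prompt("What is 2+2?"))
    # Simulated runaway output: the model starts a new turn on its own.
    raw = " 2+2 equals 4.\n### Human: What about 3+3?\n### Assistant: 6"
    print(truncate_at_stop(raw))
```

Since the model may never emit either marker, capping the maximum number of generated tokens in your frontend is still a useful backstop.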