This is a 4-bit GPTQ quantized version of https://huggingface.co/TheBloke/stable-vicuna-13B-HF

Big thank you to TheBloke for uploading the HF version above. Unfortunately, his GPTQ quant doesn't run on 0cc4m's fork of KAI/GPTQ, so I am uploading one that does.

Quantized with https://github.com/0cc4m/GPTQ-for-LLaMa for compatibility with 0cc4m's fork of KoboldAI.

Command used to quantize:

```
python llama.py c:\stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors
```

This model works best with the prompt format below. It also tends not to stop on its own and will likely keep generating indefinitely if you let it.

```
### Human:
What is 2+2?

### Assistant:


```
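
If you are calling the model from your own script rather than a UI, the prompt format and the runaway-generation issue can be handled with a couple of small helpers. This is only a rough sketch: `build_prompt`, `truncate_at_stop`, and `STOP_SEQUENCE` are hypothetical names made up for illustration, the exact trailing whitespace after `### Assistant:` is a guess, and treating `### Human:` as a stop marker is an assumption about how the model tends to run on, not something tested here.

```python
# Hypothetical helpers, not part of KoboldAI or GPTQ-for-LLaMa.

# Assumption: when the model runs on, it starts a new "### Human:" turn,
# so that marker is used as the cut-off point.
STOP_SEQUENCE = "### Human:"


def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the '### Human:' / '### Assistant:' format."""
    return f"### Human:\n{user_message}\n\n### Assistant:\n"


def truncate_at_stop(generated: str, stop: str = STOP_SEQUENCE) -> str:
    """Keep only the text before the first stop marker, if one appears."""
    idx = generated.find(stop)
    if idx == -1:
        # No marker found: return the whole reply, trimmed of whitespace.
        return generated.strip()
    return generated[:idx].strip()


if __name__ == "__main__":
    print(build_prompt("What is 2+2?"))
    # Simulated model output that kept going into a second turn.
    print(truncate_at_stop("4\n\n### Human:\nWhat is 3+3?"))
```

Pairing this with a hard cap on generated tokens in whatever frontend or backend you use is probably still a good idea, since the marker may not always show up.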