benk04 committed
Commit
9435b22
Parent: 962de4a

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 <!-- description start -->
 Exllamav2 4.65bpw quantization of CausalLM-RP-34B from [NeverSleep](https://huggingface.co/NeverSleep/CausalLM-RP-34B), quantized with default calibration dataset.
 > [!IMPORTANT]
->This bpw is the perfect size for 24GB GPUs, and can fit 32k+ context. Make sure to enable 4-bit cache option or you'll run into OOM errors.
+>Fits in 24GB VRAM with 32k+ context. Make sure to enable 4-bit cache option or you'll run into OOM errors.

 ---
 ## Original Card
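
The updated note claims the 4.65 bpw quant fits in 24 GB of VRAM with 32k+ context only if the 4-bit KV cache is enabled. A back-of-envelope sketch of why, assuming Yi-34B-style dimensions (60 layers, GQA with 8 KV heads of head dim 128) — these numbers are illustrative assumptions, not read from this repo's config:

```python
# Rough VRAM estimate for a 34B model quantized to 4.65 bits per weight.
# Architecture dimensions below are assumed (Yi-34B-style), for illustration.

PARAMS = 34e9   # parameter count (approximate)
BPW = 4.65      # bits per weight of the quant

weights_gb = PARAMS * BPW / 8 / 1e9  # ~19.8 GB of weights

layers, kv_heads, head_dim = 60, 8, 128  # assumed GQA layout
ctx = 32 * 1024                          # 32k-token context

def kv_cache_gb(bits_per_element):
    # K and V tensors, per layer, per token
    bytes_per_token = 2 * layers * kv_heads * head_dim * bits_per_element / 8
    return ctx * bytes_per_token / 1e9

print(f"weights:        {weights_gb:.1f} GB")        # ~19.8 GB
print(f"FP16 KV cache:  {kv_cache_gb(16):.1f} GB")   # ~8.1 GB -> total > 24 GB (OOM)
print(f"4-bit KV cache: {kv_cache_gb(4):.1f} GB")    # ~2.0 GB -> total < 24 GB
```

With an FP16 cache the weights plus cache exceed 24 GB, while the 4-bit cache leaves headroom — consistent with the OOM warning in the card.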