
An ExLlamaV2 4.65bpw quantization of NeverSleep's CausalLM-RP-34B, made with the default calibration dataset.

Fits in 24 GB of VRAM with 32k+ context. Make sure to enable the 4-bit cache option, or you will run into OOM errors; a loading sketch follows.
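A minimal sketch of loading this quant with the exllamav2 Python API and its Q4 (4-bit) KV cache, assuming a recent exllamav2 release; the local model path and sampler settings are illustrative assumptions, not part of this card:

```python
# Sketch: load the quant with a 4-bit (Q4) KV cache so 32k context fits in 24 GB.
# The model path below is a hypothetical local download location.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache_Q4,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config("/models/CausalLM-RP-34B-4.65bpw-exl2")
config.max_seq_len = 32768                   # the 32k+ context claimed above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # the "4-bit cache option"
model.load_autosplit(cache)                  # fill available VRAM layer by layer

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Hello!", settings, num_tokens=64))
```

Without the Q4 cache (i.e. a plain fp16 cache), the KV cache alone at 32k context would push a 34B model past 24 GB, which is why the option matters here.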


Original Card

Description

This repo contains the fp16 files of CausalLM-RP-34B, a finetune of CausalLM-34B Beta on multiple RP datasets.

Model used

Prompt template: ChatML

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
{output}<|im_end|>
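For concreteness, a small helper that renders this template into a generation prompt; the function name and example strings are illustrative assumptions, not from the original card:

```python
# Hypothetical helper that fills in the ChatML template shown above.
# The prompt ends at the open assistant turn; the model then produces
# {output} and closes it with <|im_end|>, so stop generation on that token.
def build_chatml_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful roleplay assistant.",
    "Describe the tavern we just entered.",
)
```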