This model is a fine-tuned version of DeciLM-6b-instruct, trained on the Dolphin GPT4 dataset.
Please pass `naive_attention_prefill=True` when loading this model.
Example:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "NewstaR/Porpoise-6b-instruct"

# Load the weights in 4-bit NF4 precision and run computations in float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,        # the repo ships custom DeciLM modeling code
    naive_attention_prefill=True,  # required for this model, as noted above
)
model.config.use_cache = False
```
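The example above only loads the model. The sketch below shows one way to generate a reply with it. The prompt template is an assumption borrowed from the base DeciLM-6b-instruct card (`### System:` / `### User:` / `### Assistant:`); this card does not state which template Porpoise was fine-tuned with, so treat `build_prompt`, `DEFAULT_SYSTEM` and `generate_reply` as hypothetical helpers and adjust them if outputs look off.

```python
# Hypothetical helpers; the prompt template below is ASSUMED from the base
# DeciLM-6b-instruct model and is not confirmed for Porpoise-6b-instruct.
DEFAULT_SYSTEM = (
    "You are an AI assistant that follows instruction extremely well. "
    "Help as much as you can."
)


def build_prompt(user_message: str, system_message: str = DEFAULT_SYSTEM) -> str:
    """Format a single-turn prompt in the assumed System/User/Assistant layout."""
    return (
        f"### System:\n{system_message}\n"
        f"### User:\n{user_message}\n"
        f"### Assistant:\n"
    )


def generate_reply(model, tokenizer, user_message: str, max_new_tokens: int = 256) -> str:
    """Greedy-decode a reply and return only the newly generated text."""
    prompt = build_prompt(user_message)
    # Tokenize and move the input tensors to the device the model lives on
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Slice off the prompt tokens so only the model's answer is decoded
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage, with `model` loaded as shown above and `tokenizer = AutoTokenizer.from_pretrained(model_name)`: `print(generate_reply(model, tokenizer, "Explain what a KV cache is in one paragraph."))`.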