InternLM2-Chat NF4 Quant

Usage

As of 2024/1/17, Transformers must be installed from source, and bitsandbytes >= 0.42.0 is required to load serialized 4-bit quants.

pip install -U git+https://github.com/huggingface/transformers bitsandbytes
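To confirm the environment meets these requirements, a quick sanity check (a minimal sketch; it only prints the installed versions):

import bitsandbytes
import transformers

# bitsandbytes must be >= 0.42.0 to load serialized 4-bit quants
print(transformers.__version__)
print(bitsandbytes.__version__)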

Quantization config

import torch
from transformers import BitsAndBytesConfig

# NF4 quantization with nested (double) quantization and bfloat16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
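For completeness, a sketch of how this config could be applied to produce and serialize the quant; the base repo id (internlm/internlm2-chat-20b) and the output path are assumptions, not taken from this card:

from transformers import AutoModelForCausalLM

# Load the base model in 4-bit using the config above.
# trust_remote_code=True is needed because InternLM2 ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm2-chat-20b",  # assumed base repo
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)

# Serialize the 4-bit weights (requires bitsandbytes >= 0.42.0).
model.save_pretrained("internlm2-chat-nf4")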

This config is only needed when creating the quant. It is not necessary for inference: just load the model without specifying any quantization/load_in_*bit arguments, and the serialized quantization config is applied automatically (see the sketch below).
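A minimal inference sketch; the repo id is a placeholder to substitute with this repo's actual id:

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/internlm2-chat-nf4"  # placeholder: use the actual repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

# No quantization arguments: the serialized 4-bit config is read from config.json.
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Hello, who are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))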

Model Details

Format: Safetensors
Model size: 10.8B params
Tensor types: F32, FP16, U8