
Qwen2-VL-2B-Instruct 4-bit Quantized

This is a 4-bit quantized version of the Qwen2-VL-2B-Instruct model.

Model Description

  • Original Model: Qwen/Qwen2-VL-2B-Instruct
  • Quantization: 4-bit quantization using bitsandbytes
  • Use case: reduced GPU memory footprint with minimal loss in output quality
  • License: Same as original model

Usage

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

# Use the conditional-generation class (the bare Qwen2VLModel has no LM head)
# and AutoProcessor, which handles both text and image inputs for this VLM.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "ksukrit/qwen2-vl-2b-4bit", trust_remote_code=True, device_map="auto"
)
processor = AutoProcessor.from_pretrained("ksukrit/qwen2-vl-2b-4bit", trust_remote_code=True)

Quantization Details

  • Quantization method: bitsandbytes 4-bit
  • Quantization type: nf4
  • Double quantization: enabled
  • Compute dtype: float16
Model Files

  • Format: Safetensors
  • Model size: 909M params
  • Tensor types: F32, FP16, U8
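
As a rough sanity check on the memory savings, a back-of-the-envelope estimate from the parameter count above (weights only; it ignores the F32/U8 quantization constants and activation memory):

```python
params = 909e6          # parameter count reported above
bits_per_param = 4      # nf4 weight storage
weight_bytes = params * bits_per_param / 8
print(f"~{weight_bytes / 1e9:.2f} GB for 4-bit weights")  # ~0.45 GB, vs ~1.8 GB at fp16
```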