
Qwen2-VL-2B-Instruct 4-bit Quantized

This is a 4-bit quantized version of the Qwen2-VL-2B-Instruct model.

Model Description

  • Original Model: Qwen/Qwen2-VL-2B-Instruct
  • Quantization: 4-bit quantization using bitsandbytes
  • Use case: reduced GPU memory footprint with minimal loss in output quality
  • License: Same as original model

Usage

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

# Use the conditional-generation class (the bare Qwen2VLModel has no LM head)
# and AutoProcessor, which handles both text and image inputs for this VLM.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "ksukrit/qwen2-vl-2b-4bit", trust_remote_code=True, device_map="auto"
)
processor = AutoProcessor.from_pretrained("ksukrit/qwen2-vl-2b-4bit", trust_remote_code=True)

Quantization Details

  • Quantization method: bitsandbytes 4-bit
  • Quantization type: nf4
  • Double quantization: enabled
  • Compute dtype: float16
Model Files

  • Format: Safetensors
  • Model size: 909M params
  • Tensor types: F32, FP16, U8
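
As a rough sanity check on the memory savings, a back-of-the-envelope estimate from the parameter count above (weights only; it ignores the F32/U8 quantization constants and activation memory):

```python
params = 909e6          # parameter count reported above
bits_per_param = 4      # nf4 weight storage
weight_bytes = params * bits_per_param / 8
print(f"~{weight_bytes / 1e9:.2f} GB for 4-bit weights")  # ~0.45 GB, vs ~1.8 GB at fp16
```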