Blip Image Captioning Base BF16

This model is a quantized version of the Salesforce/blip-image-captioning-base, an image-to-text model. From a memory footprint of 989 MBs -> 494 MBs by quantizing the percision of float32 to bfloat 16, reducing the model's memory size by 50 percent.

Example

a cat sitting on top of a purple and red striped carpet

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import BlipForConditionalGeneration, BlipProcessor
import requests
from PIL import Image

model = BlipForConditionalGeneration.from_pretrained("gospacedev/blip-image-captioning-base-bf16")
processor = BlipProcessor.from_pretrained("gospacedev/blip-image-captioning-base-bf16")

# Load sample image
image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Generate output
inputs = processor(image, return_tensors="pt")
output = model.generate(**inputs)
result = processor.decode(out[0], skip_special_tokens=True)

print(results)

Model Details

  • Developed by: Grantley Cullar
  • Model type: Image-to-Text
  • Language(s) (NLP): English
  • License: MIT License
Downloads last month
21
Safetensors
Model size
247M params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.