metadata

title: Funny Image Captioner
emoji: 🚀
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: true
short_description: App that gives funny descriptions of images

Fun Image Caption

A playful app that captions your images in the voice of distinct characters. Built with Gradio, LangGraph, and Hugging Face models.

Description

This project creates an interactive AI application that captions and describes images in entertaining character voices. It combines modern vision-language models with a user-friendly interface to make image descriptions more engaging and fun.

Features

Upload any image for captioning
Choose from multiple voice personas:
- Scurvy-ridden pirate
- Forgetful wizard
- Sarcastic teenager
Two-step LangGraph workflow:
- Image captioning with vision-language model
- Creative voice-based description
Uses 4-bit quantized models to reduce GPU memory use in Hugging Face ZeroGPU environments
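The two-step workflow listed above can be sketched as a small LangGraph graph. This is a minimal sketch under stated assumptions, not the code in app.py: the state keys, the node names, the persona instruction strings, and the stubbed caption_image and describe_in_voice functions are all hypothetical. Real nodes would call the vision-language and text models instead of returning placeholder strings.

```python
from typing import TypedDict

# Hypothetical persona instructions; the wording used in app.py may differ.
PERSONAS = {
    "Scurvy-ridden pirate": "Describe the scene as a scurvy-ridden pirate.",
    "Forgetful wizard": "Describe the scene as a forgetful wizard.",
    "Sarcastic teenager": "Describe the scene as a sarcastic teenager.",
}


class CaptionState(TypedDict, total=False):
    """Shared state passed between the two steps (key names are assumptions)."""
    image_path: str
    persona: str
    caption: str
    description: str


def caption_image(state: CaptionState) -> dict:
    # Step 1 (stub): a real node would run the vision-language model here.
    return {"caption": f"a plain caption for {state['image_path']}"}


def describe_in_voice(state: CaptionState) -> dict:
    # Step 2 (stub): a real node would prompt the text model with the
    # persona instruction plus the caption produced by step 1.
    instruction = PERSONAS[state["persona"]]
    return {"description": f"[{instruction}] {state['caption']}"}


def build_graph():
    # Imported lazily so the stubs above can be tried without langgraph.
    from langgraph.graph import END, START, StateGraph

    graph = StateGraph(CaptionState)
    # Node names must not collide with state keys such as "caption".
    graph.add_node("caption_image", caption_image)
    graph.add_node("describe_in_voice", describe_in_voice)
    graph.add_edge(START, "caption_image")
    graph.add_edge("caption_image", "describe_in_voice")
    graph.add_edge("describe_in_voice", END)
    return graph.compile()
```

With langgraph installed, build_graph().invoke({"image_path": "cat.png", "persona": "Forgetful wizard"}) would run both steps in order and return the final state. Each node returns only the keys it updates, and LangGraph merges them into the shared state.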

Useful Poetry Commands

Show all installed packages: poetry show
Show detailed info about a specific package: poetry show <package>
Show package location and details: poetry show -v <package>
List virtual environments: poetry env list
Show current environment info: poetry env info
Export dependencies to requirements.txt (this uses uv, which must be installed separately, rather than Poetry): uv pip compile pyproject.toml -o requirements.txt

Requirements

Python 3.10+
Poetry (Python package manager)
Git
CUDA-compatible GPU
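A quick way to check the Python and GPU requirements before installing is sketched below. The helper names python_ok and cuda_available are hypothetical and not part of this project. The GPU check only works once torch is installed, so it returns None when torch cannot be imported.

```python
import sys

MIN_PYTHON = (3, 10)  # from the Requirements list above


def python_ok(version=None) -> bool:
    """Return True if the given (or running) Python version is 3.10 or newer."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def cuda_available():
    """Return True/False from torch, or None if torch is not installed yet."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("CUDA available:", cuda_available())
```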

Installation

Install Poetry if you haven't already:

curl -sSL https://install.python-poetry.org | python3 -

Clone the repository:

git clone https://github.com/yourusername/fun-image-caption.git
cd fun-image-caption

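Create and activate a new Poetry environment (on Poetry 2.x, the shell command is no longer built in and requires the poetry-plugin-shell plugin):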

poetry env use python3.10
poetry shell

Install dependencies:

poetry install

Verify installation:

poetry show

Install the Hugging Face Hub client and log in from the CLI:

pip install huggingface_hub

huggingface-cli login

Key Dependencies

accelerate==1.2.1: Hugging Face library for device placement and efficient model loading
bitsandbytes==0.41.3.post2: Quantization library for model optimization
torch==2.4.0: PyTorch for ML operations
transformers==4.49.0: Hugging Face transformers library
gradio: Web interface framework
langgraph: Workflow orchestration for language model pipelines
pillow: Image loading and processing (maintained fork of the Python Imaging Library)
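Because several of these dependencies are pinned to exact versions, it can help to confirm that the active environment matches the pins. The following is a minimal sketch using only the standard library. The check_pins helper is hypothetical, and the pins are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the Key Dependencies list.
PINNED = {
    "accelerate": "1.2.1",
    "bitsandbytes": "0.41.3.post2",
    "torch": "2.4.0",
    "transformers": "4.49.0",
}


def check_pins(pins=PINNED) -> dict:
    """Map each package name to 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Drop local build suffixes such as "+cu121" before comparing.
        base = installed.split("+")[0]
        report[name] = "ok" if base == wanted else f"mismatch (installed {installed})"
    return report


if __name__ == "__main__":
    for name, status in check_pins().items():
        print(f"{name}: {status}")
```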

Usage

Run the application:

python app.py

Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860)
Upload an image using the interface
Select a voice persona from the dropdown menu
Click "Generate Description" to see the results
Enjoy your image description in the selected character voice!
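The interaction described in these steps could be wired up in Gradio roughly as follows. This is a sketch under assumptions, not the contents of app.py: the function names, component labels, and placeholder return text are hypothetical, and a real handler would run the two-step workflow instead of returning a fixed string.

```python
PERSONAS = ["Scurvy-ridden pirate", "Forgetful wizard", "Sarcastic teenager"]


def generate_description(image, persona: str) -> str:
    """Handler stub: validate inputs, then return a placeholder description."""
    if image is None:
        return "Please upload an image first."
    if persona not in PERSONAS:
        return f"Unknown persona: {persona}"
    # Placeholder: a real handler would caption the image and restyle the caption.
    return f"({persona}) description would appear here."


def build_demo():
    # Imported lazily so the handler can be tried without gradio installed.
    import gradio as gr

    with gr.Blocks(title="Funny Image Captioner") as demo:
        image = gr.Image(type="pil", label="Image")
        persona = gr.Dropdown(choices=PERSONAS, value=PERSONAS[0], label="Voice persona")
        button = gr.Button("Generate Description")
        output = gr.Textbox(label="Description")
        # Clicking the button passes the image and persona to the handler.
        button.click(fn=generate_description, inputs=[image, persona], outputs=output)
    return demo
```

Calling build_demo().launch() starts the local server. By default Gradio serves on port 7860, which matches the URL mentioned above.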

Models

The application uses the following models:

Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
Voice Description: google/gemma-3-12b (4-bit quantized)
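To see why 4-bit quantization matters for models of this size, the weight memory can be estimated with simple arithmetic. This is a rough sketch that takes the "12b" in the model names to mean about 12 billion parameters. It counts weights only and ignores quantization metadata, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 12e9  # assumed from the "12b" in the model names
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(params, bits):.0f} GB")
```

Under these assumptions, each model needs about 24 GB for 16-bit weights but only about 6 GB at 4 bits.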

Author

[Your name and contact information]

License

[License information to be added]