metadata
title: Funny Image Captioner
emoji: 🚀
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: true
short_description: App that gives funny descriptions of images

Fun Image Caption

A playful app that captions your images in the voices of distinct characters. Built with Gradio, LangGraph, and Hugging Face models.

Description

This project creates an interactive AI application that captions and describes images in entertaining character voices. It combines modern vision-language models with a user-friendly interface to make image descriptions more engaging and fun.

Features

  • Upload any image for captioning
  • Choose from multiple voice personas:
    • Scurvy-ridden pirate
    • Forgetful wizard
    • Sarcastic teenager
  • Two-step LangGraph workflow:
    • Image captioning with a vision-language model
    • Creative voice-based description
  • Built on efficient 4-bit quantized models for ZeroGPU environments
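
The two-step flow listed above can be sketched in plain Python to show how state moves from the captioning step to the voice step. This is only an illustration: the model calls are replaced by stubs, the persona prompts and all function names are hypothetical, and the app itself wires these steps together as LangGraph nodes rather than a simple loop.

```python
from typing import TypedDict

# Hypothetical persona instructions; the app's real prompts are not shown in this README.
PERSONA_PROMPTS = {
    "pirate": "Describe this scene as a scurvy-ridden pirate.",
    "wizard": "Describe this scene as a forgetful wizard.",
    "teenager": "Describe this scene as a sarcastic teenager.",
}


class CaptionState(TypedDict, total=False):
    """Shared state passed from one step to the next."""
    image_path: str
    persona: str
    caption: str
    description: str


def caption_step(state: CaptionState) -> CaptionState:
    # Stub standing in for the vision-language model call.
    return {**state, "caption": f"a photo loaded from {state['image_path']}"}


def voice_step(state: CaptionState) -> CaptionState:
    # Stub standing in for the text model call: it only builds the prompt
    # that a real model would receive.
    instruction = PERSONA_PROMPTS[state["persona"]]
    return {**state, "description": f"{instruction} Scene: {state['caption']}"}


def run_pipeline(image_path: str, persona: str) -> CaptionState:
    # Run the two steps in order, threading the state through each one.
    state: CaptionState = {"image_path": image_path, "persona": persona}
    for step in (caption_step, voice_step):
        state = step(state)
    return state


result = run_pipeline("cat.jpg", "pirate")
print(result["description"])
```

Keeping each step as a function from state to state is what makes it straightforward to register the same functions as graph nodes later.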

Useful Poetry Commands

  • Show all installed packages: poetry show
  • Show detailed info about a specific package: poetry show <package>
  • Show package location and details: poetry show -v <package>
  • List virtual environments: poetry env list
  • Show current environment info: poetry env info
  • Export dependencies to requirements.txt (this one uses uv rather than Poetry): uv pip compile pyproject.toml -o requirements.txt

Requirements

  • Python 3.10+
  • Poetry (Python package manager)
  • Git
  • CUDA-compatible GPU
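
As a quick pre-flight check, the requirements above can be tested with the standard library alone. The sketch below is not part of the project, and the helper name check_requirements is made up for illustration. It does not check for a GPU, because that needs PyTorch to be installed first (for example via torch.cuda.is_available()).

```python
import shutil
import sys


def check_requirements(min_python=(3, 10), tools=("poetry", "git")):
    """Return a dict mapping each requirement name to True or False."""
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


for name, ok in check_requirements().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```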

Installation

  1. Install Poetry if you haven't already:
curl -sSL https://install.python-poetry.org | python3 -
  2. Clone the repository:
git clone https://github.com/yourusername/fun-image-caption.git
cd fun-image-caption
  3. Create and activate a new Poetry environment (Poetry 2.x no longer bundles the shell command; there, install the poetry-plugin-shell plugin or prefix commands with poetry run):
poetry env use python3.10
poetry shell
  4. Install dependencies:
poetry install
  5. Verify installation:
poetry show

Install the Hugging Face Hub CLI

pip install huggingface_hub

huggingface-cli login

Key Dependencies

  • accelerate==1.2.1: Framework for efficient model deployment
  • bitsandbytes==0.41.3.post2: Quantization library for model optimization
  • torch==2.4.0: PyTorch for ML operations
  • transformers==4.49.0: Hugging Face transformers library
  • gradio: Web interface framework
  • langgraph: Workflow orchestration for language model pipelines
  • pillow: Python Imaging Library

Usage

  1. Run the application:
python app.py
  2. Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860)
  3. Upload an image using the interface
  4. Select a voice persona from the dropdown menu
  5. Click "Generate Description" to see the results
  6. Enjoy your image description in the selected character voice!

Models

The application uses the following models:

  • Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
  • Voice Description: google/gemma-3-12b (4-bit quantized)
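
A rough back-of-the-envelope calculation shows why 4-bit quantization matters for models of this size. The sketch below assumes a nominal 12 billion parameters (taken from the "12b" in the model names) and counts weight storage only; quantization constants, activations, and the KV cache add to the real footprint.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_12B = 12e9  # nominal count implied by the "12b" in the model names

for label, bits in (("16-bit", 16), ("4-bit", 4)):
    print(f"{label}: ~{weight_memory_gb(PARAMS_12B, bits):.0f} GB")
```

Under these assumptions the weights take about 24 GB at 16-bit precision and about 6 GB at 4-bit.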

Author

[Your name and contact information]

License

[License information to be added]