title: Funny Image Captioner
emoji: 🚀
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: true
short_description: App that gives funny descriptions of images
Fun Image Caption
A delightful app that captions your images through the voice of unique characters. Built with Gradio, LangGraph, and Hugging Face models.
Description
This project is an interactive AI application that captions images and then describes them in an entertaining character voice. It pairs a vision-language model with a simple web interface so that image descriptions are more engaging and fun.
Features
- Upload any image for captioning
- Choose from multiple voice personas:
  - Scurvy-ridden pirate
  - Forgetful wizard
  - Sarcastic teenager
- Two-step LangGraph workflow:
  - Image captioning with a vision-language model
  - Creative voice-based description
- Built on efficient 4-bit quantized models for ZeroGPU environments
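The two-step workflow listed above can be pictured as two functions that share one state dictionary. The sketch below is plain Python, not LangGraph's API, and both model calls are stubbed out. The names `CaptionState`, `caption_node`, `voice_node`, `run_workflow`, and the persona prompt strings are hypothetical and are not taken from app.py.

```python
from typing import Callable, TypedDict


class CaptionState(TypedDict, total=False):
    """State handed from one step to the next (hypothetical field names)."""
    image_path: str
    persona: str
    caption: str
    description: str


# Hypothetical style instructions for the three personas listed above.
PERSONA_STYLES = {
    "pirate": "Speak like a scurvy-ridden pirate.",
    "wizard": "Speak like a forgetful wizard.",
    "teenager": "Speak like a sarcastic teenager.",
}


def caption_node(state: CaptionState) -> CaptionState:
    # Stub: a real app would call the vision-language model here.
    return {**state, "caption": f"a photo loaded from {state['image_path']}"}


def voice_node(state: CaptionState) -> CaptionState:
    # Stub: a real app would send this prompt to the text model
    # and store the model's reply instead of the prompt itself.
    style = PERSONA_STYLES[state["persona"]]
    prompt = f"{style} Describe this scene: {state['caption']}"
    return {**state, "description": prompt}


def run_workflow(
    state: CaptionState,
    steps: list[Callable[[CaptionState], CaptionState]],
) -> CaptionState:
    # Run each step in order, feeding its output state into the next step.
    for step in steps:
        state = step(state)
    return state


result = run_workflow(
    {"image_path": "cat.png", "persona": "pirate"},
    [caption_node, voice_node],
)
print(result["description"])
```

Keeping each step as a function from state to state is what lets a graph library order and connect them; the second step only needs the `caption` field written by the first.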
Useful Poetry Commands
- Show all installed packages:
poetry show
- Show detailed info about a specific package:
poetry show <package>
- Show package location and details:
poetry show -v <package>
- List virtual environments:
poetry env list
- Show current environment info:
poetry env info
- Export dependencies to requirements.txt (this command uses uv rather than Poetry):
uv pip compile pyproject.toml -o requirements.txt
Requirements
- Python 3.10+
- Poetry (Python package manager)
- Git
- CUDA-compatible GPU
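A quick way to check the Python and GPU requirements above before installing anything is a small script like the following. It is a minimal sketch: the helper names `python_ok` and `cuda_status` are made up for illustration, and the CUDA check only runs if torch is already importable.

```python
import sys

MIN_PYTHON = (3, 10)  # from the Requirements list above


def python_ok(version=None) -> bool:
    # Compare only (major, minor) against the minimum supported version.
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def cuda_status() -> str:
    # torch is imported lazily so the script also runs before `poetry install`.
    try:
        import torch
    except ImportError:
        return "torch not installed"
    return "CUDA available" if torch.cuda.is_available() else "CUDA not available"


print("Python OK:", python_ok())
print("GPU:", cuda_status())
```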
Installation
- Install Poetry if you haven't already:
curl -sSL https://install.python-poetry.org | python3 -
- Clone the repository:
git clone https://github.com/yourusername/fun-image-caption.git
cd fun-image-caption
- Create and activate a new Poetry environment (on Poetry 2.x, the shell command is provided by the separate poetry-plugin-shell plugin):
poetry env use python3.10
poetry shell
- Install dependencies:
poetry install
- Verify installation:
poetry show
- Install the Hugging Face Hub client and log in from the CLI:
pip install huggingface_hub
huggingface-cli login
Key Dependencies
- accelerate==1.2.1: Framework for efficient model deployment
- bitsandbytes==0.41.3.post2: Quantization library for model optimization
- torch==2.4.0: PyTorch for ML operations
- transformers==4.49.0: Hugging Face transformers library
- gradio: Web interface framework
- langgraph: Workflow orchestration for language model pipelines
- pillow: Python Imaging Library
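To confirm that the pinned versions above are the ones actually installed, something like the following could be run inside the Poetry environment. This is a sketch using the standard library's `importlib.metadata`; the helper names `installed_version` and `check_pins` are hypothetical. Local build suffixes such as `+cu121` are ignored in the comparison, which is an assumption about how strict the check should be.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the Key Dependencies list.
PINNED = {
    "accelerate": "1.2.1",
    "bitsandbytes": "0.41.3.post2",
    "torch": "2.4.0",
    "transformers": "4.49.0",
}


def installed_version(package: str):
    # Return the installed version string, or None if the package is missing.
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every mismatch."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Drop a local build label such as "+cu121" before comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


for name, (expected, found) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package matches; `poetry show` gives the same information in a less targeted form.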
Usage
- Run the application:
python app.py
- Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860)
- Upload an image using the interface
- Select a voice persona from the dropdown menu
- Click "Generate Description" to see the results
- Enjoy your image description in the selected character voice!
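The interface described in these steps could be wired up roughly as follows. This is a sketch, not the contents of app.py: `describe` is a placeholder that does no model inference, and `build_demo` and the component labels are illustrative names. Gradio is imported inside `build_demo` so the placeholder logic can be exercised without it.

```python
PERSONAS = ["Scurvy-ridden pirate", "Forgetful wizard", "Sarcastic teenager"]


def describe(image, persona: str) -> str:
    # Placeholder: the real app would run the captioning and voice steps here.
    if image is None:
        return "Please upload an image first."
    if persona not in PERSONAS:
        return f"Unknown persona: {persona}"
    return f"[{persona}] description would appear here."


def build_demo():
    import gradio as gr  # lazy import; only needed when the UI is built

    with gr.Blocks(title="Funny Image Captioner") as demo:
        image = gr.Image(type="pil", label="Image")
        persona = gr.Dropdown(choices=PERSONAS, value=PERSONAS[0], label="Voice persona")
        button = gr.Button("Generate Description")
        output = gr.Textbox(label="Description")
        # Clicking the button passes the image and persona to describe().
        button.click(fn=describe, inputs=[image, persona], outputs=output)
    return demo


# To start the local server: build_demo().launch()
```

Calling `build_demo().launch()` starts a local server, which by default listens on the port shown in the URL above.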
Models
The application uses the following models:
- Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
- Voice Description: google/gemma-3-12b (4-bit quantized)
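To see why 4-bit quantization matters for a 12B-parameter model, and roughly how such a model might be loaded, consider the sketch below. The memory estimate covers weights only and ignores quantization overhead and activations. The loader is an assumption throughout: `load_quantized` is a hypothetical helper, the `nf4` quantization type and bfloat16 compute dtype are guesses (the list above only says "4-bit"), and app.py may use a different model class than `AutoModelForCausalLM`.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


def load_quantized(model_id: str):
    # Lazy imports: this path needs torch, transformers, bitsandbytes, and a CUDA GPU.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # assumed quantization type
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
    )
    # AutoModelForCausalLM is an assumption; a vision-language checkpoint
    # may require a different model class.
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )


print(f"12B weights at 16-bit: {weight_memory_gib(12e9, 16):.1f} GiB")
print(f"12B weights at 4-bit:  {weight_memory_gib(12e9, 4):.1f} GiB")
```

Going from 16 bits to 4 bits per parameter cuts the weight footprint by a factor of four, which is what makes a model of this size practical on a single shared GPU.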
Author
[Your name and contact information]
License
[License information to be added]