---
title: Fun Image Caption
emoji: 🚀
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: false
short_description: App that gives funny descriptions of images
---

# Fun Image Caption

A delightful app that captions your images in the voice of unique characters. Built with Gradio, LangGraph, and Hugging Face models.

## Description

This project is an interactive AI application that captions and describes images in entertaining character voices. It pairs vision-language models with a simple web interface to make image descriptions more engaging and fun.

## Features

- Upload any image for captioning
- Choose from multiple voice personas:
  - Scurvy-ridden pirate
  - Forgetful wizard
  - Sarcastic teenager
- Two-step LangGraph workflow:
  - Image captioning with a vision-language model
  - Creative description in the selected voice
- Built on efficient 4-bit quantized models for ZeroGPU environments

## Useful Poetry Commands

- Show all installed packages: `poetry show`
- Show detailed info about a specific package: `poetry show <package-name>`
- Show package location and details: `poetry show -v <package-name>`
- List virtual environments: `poetry env list`
- Show current environment info: `poetry env info`
- Export dependencies to requirements.txt (this one uses `uv`, not Poetry): `uv pip compile pyproject.toml -o requirements.txt`

## Requirements

- Python 3.10+
- Poetry (Python package manager)
- Git
- CUDA-compatible GPU

## Installation

1. Install Poetry if you haven't already:
   ```bash
   curl -sSL https://install.python-poetry.org | python3 -
   ```
2. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/fun-image-caption.git
   cd fun-image-caption
   ```
3. Create and activate a new Poetry environment:
   ```bash
   poetry env use python3.10
   poetry shell
   ```
4. Install dependencies:
   ```bash
   poetry install
   ```
5. Verify installation:
   ```bash
   poetry show
   ```

## Install the Hugging Face Hub CLI

```bash
pip install huggingface_hub
huggingface-cli login
```

## Key Dependencies

- `accelerate==1.2.1`: Framework for efficient model deployment
- `bitsandbytes==0.41.3.post2`: Quantization library for model optimization
- `torch==2.4.0`: PyTorch for ML operations
- `transformers==4.49.0`: Hugging Face transformers library
- `gradio`: Web interface framework
- `langgraph`: Workflow orchestration for language model pipelines
- `pillow`: Python Imaging Library

## Usage

1. Run the application:
   ```bash
   python app.py
   ```
2. Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860)
3. Upload an image using the interface
4. Select a voice persona from the dropdown menu
5. Click "Generate Description" to see the results
6. Enjoy your image description in the selected character voice!

## Models

The application uses the following models:

- Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
- Voice Description: google/gemma-3-12b (4-bit quantized)

## Author

[Your name and contact information]

## License

[License information to be added]
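
## Workflow Sketch

The two-step workflow listed under Features (caption the image, then restyle the caption in a persona's voice) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `app.py`. The names `PERSONA_PROMPTS`, `CaptionState`, `caption_step`, `voice_step`, and `run_pipeline` are hypothetical, the prompt wording is invented, and stub functions stand in for the two models. In the real app each step would presumably be a LangGraph node.

```python
from typing import Callable, TypedDict

# Hypothetical persona prompts; the app's actual prompt wording is not
# documented in this README.
PERSONA_PROMPTS = {
    "Scurvy-ridden pirate": "Describe the scene as a scurvy-ridden pirate.",
    "Forgetful wizard": "Describe the scene as a forgetful wizard.",
    "Sarcastic teenager": "Describe the scene as a sarcastic teenager.",
}


class CaptionState(TypedDict, total=False):
    """State passed between the two steps (hypothetical shape)."""
    image_path: str
    persona: str
    caption: str
    description: str


def caption_step(state: CaptionState, captioner: Callable[[str], str]) -> CaptionState:
    """Step 1: produce a plain caption for the image."""
    return {**state, "caption": captioner(state["image_path"])}


def voice_step(state: CaptionState, writer: Callable[[str], str]) -> CaptionState:
    """Step 2: rewrite the caption in the selected persona's voice."""
    prompt = f"{PERSONA_PROMPTS[state['persona']]}\nCaption: {state['caption']}"
    return {**state, "description": writer(prompt)}


def run_pipeline(
    image_path: str,
    persona: str,
    captioner: Callable[[str], str],
    writer: Callable[[str], str],
) -> CaptionState:
    """Run both steps in order, rejecting unknown personas up front."""
    if persona not in PERSONA_PROMPTS:
        raise ValueError(f"Unknown persona: {persona!r}")
    state: CaptionState = {"image_path": image_path, "persona": persona}
    state = caption_step(state, captioner)
    return voice_step(state, writer)


# Stubs standing in for the vision-language and text models.
def fake_captioner(image_path: str) -> str:
    return "a cat sleeping on a sofa"


def fake_writer(prompt: str) -> str:
    return f"[stub output] {prompt}"


if __name__ == "__main__":
    result = run_pipeline("cat.jpg", "Forgetful wizard", fake_captioner, fake_writer)
    print(result["description"])
```

Passing the model calls in as plain callables keeps the sketch runnable without a GPU. Swapping the stubs for real model wrappers should not require changing the step functions.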