---
title: Fun Image Caption
emoji: 🚀
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: false
short_description: App that gives funny descriptions of images
---

# Fun Image Caption

A delightful app that captions your images in the voices of distinct characters. Built with Gradio, LangGraph, and Hugging Face models.

## Description

This project creates an interactive AI application that captions and describes images in entertaining character voices. It combines modern vision-language models with a user-friendly interface to make image descriptions more engaging and fun.

## Features

- Upload any image for captioning
- Choose from multiple voice personas:
  - Scurvy-ridden pirate
  - Forgetful wizard
  - Sarcastic teenager
- Two-step LangGraph workflow:
  - Image captioning with a vision-language model
  - Rewriting of the caption in the selected persona's voice
- Built on efficient 4-bit quantized models for ZeroGPU environments
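
The two-step workflow listed above can be sketched without any model dependencies. This is a minimal illustration, not the app's actual code: `PERSONA_PROMPTS`, `build_voice_prompt`, and `run_pipeline` are hypothetical names, the prompt wording is invented, and the two model calls are passed in as plain callables so that the control flow stays visible. In the app itself, each step would presumably be a LangGraph node.

```python
from typing import Callable, Dict

# Hypothetical persona instructions; the app's real prompts may differ.
PERSONA_PROMPTS: Dict[str, str] = {
    "pirate": "You are a scurvy-ridden pirate.",
    "wizard": "You are a forgetful wizard.",
    "teenager": "You are a sarcastic teenager.",
}


def build_voice_prompt(caption: str, persona: str) -> str:
    """Combine a plain caption with persona instructions for step 2."""
    if persona not in PERSONA_PROMPTS:
        raise ValueError(f"Unknown persona: {persona!r}")
    return (
        f"{PERSONA_PROMPTS[persona]} "
        f"Describe this scene in character: {caption}"
    )


def run_pipeline(
    image: object,
    persona: str,
    caption_fn: Callable[[object], str],
    voice_fn: Callable[[str], str],
) -> Dict[str, str]:
    """Step 1 captions the image; step 2 rewrites the caption in a voice."""
    caption = caption_fn(image)
    description = voice_fn(build_voice_prompt(caption, persona))
    return {"caption": caption, "description": description}


if __name__ == "__main__":
    # Stub callables stand in for the two models.
    result = run_pipeline(
        image=None,
        persona="pirate",
        caption_fn=lambda _img: "a dog on a beach",
        voice_fn=lambda prompt: f"[model output for: {prompt}]",
    )
    print(result["description"])
```

Passing the model calls in as functions keeps the sketch testable on a machine without a GPU; swapping the stubs for real model wrappers leaves `run_pipeline` unchanged.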

## Useful Poetry Commands

- Show all installed packages: `poetry show`
- Show detailed info about a specific package: `poetry show <package>`
- Show package location and details: `poetry show -v <package>`
- List virtual environments: `poetry env list`
- Show current environment info: `poetry env info`
- Export dependencies to requirements.txt (this one uses `uv` rather than Poetry, so `uv` must be installed): `uv pip compile pyproject.toml -o requirements.txt`

## Requirements

- Python 3.10+
- Poetry (Python package manager)
- Git
- CUDA-compatible GPU

## Installation

1. Install Poetry if you haven't already:
```bash
curl -sSL https://install.python-poetry.org | python3 -
```

2. Clone the repository:
```bash
git clone https://github.com/yourusername/fun-image-caption.git
cd fun-image-caption
```

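3. Create and activate a new Poetry environment (on Poetry 2.x, `poetry shell` requires the `poetry-plugin-shell` plugin; `poetry env activate` is the built-in alternative):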
```bash
poetry env use python3.10
poetry shell
```

4. Install dependencies:
```bash
poetry install
```

5. Verify installation:
```bash
poetry show
```

## Install the Hugging Face Hub CLI
```bash
pip install huggingface_hub

huggingface-cli login
```

## Key Dependencies

- accelerate==1.2.1: Device placement and loading utilities for large models
- bitsandbytes==0.41.3.post2: Quantization library used for 4-bit model loading
- torch==2.4.0: PyTorch for ML operations
- transformers==4.49.0: Hugging Face transformers library
- gradio: Web interface framework
- langgraph: Workflow orchestration for language model pipelines
- pillow: Python Imaging Library

## Usage

1. Run the application (from inside the activated Poetry environment, or prefix the command with `poetry run`):
```bash
python app.py
```

2. Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860).

3. Upload an image using the interface.

4. Select a voice persona from the dropdown menu.

5. Click "Generate Description" to see the results.

6. Enjoy your image description in the selected character voice!
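
For orientation, the interface described in these steps could be wired up roughly as follows. This is a sketch under assumptions, not the contents of `app.py`: `PERSONAS`, `describe`, and `build_demo` are hypothetical names, the component labels are invented, and `describe` is a placeholder where the real app would run its captioning workflow. Gradio is imported inside `build_demo` so the module can be loaded even where Gradio is not installed.

```python
from typing import Any, List

# Persona names taken from the Features list above.
PERSONAS: List[str] = [
    "Scurvy-ridden pirate",
    "Forgetful wizard",
    "Sarcastic teenager",
]


def describe(image: Any, persona: str) -> str:
    """Placeholder callback; the real app would run its workflow here."""
    if image is None:
        return "Please upload an image first."
    return f"[{persona} description would appear here]"


def build_demo():
    """Assemble the UI: image upload, persona dropdown, button, output box."""
    import gradio as gr  # imported lazily; requires gradio to be installed

    with gr.Blocks(title="Fun Image Caption") as demo:
        image = gr.Image(type="pil", label="Image")
        persona = gr.Dropdown(
            choices=PERSONAS, value=PERSONAS[0], label="Voice persona"
        )
        button = gr.Button("Generate Description")
        output = gr.Textbox(label="Description")
        # Clicking the button passes both inputs to the callback.
        button.click(fn=describe, inputs=[image, persona], outputs=output)
    return demo
```

Calling `build_demo().launch()` would start the local server that step 2 refers to.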

## Models

The application uses the following models:
- Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
- Voice Description: google/gemma-3-12b (4-bit quantized)
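
As a rough illustration of what "4-bit quantized" means in practice, the sketch below loads a text model through the `BitsAndBytesConfig` option in transformers. Everything specific here is an assumption: `four_bit_settings` and `load_quantized` are hypothetical helpers, the NF4 and double-quantization choices are common defaults rather than settings confirmed by this project, and `AutoModelForCausalLM` may not be the right class for the models listed above (the vision model in particular would likely need a different model class and a processor). The heavy imports live inside `load_quantized`, which needs torch, transformers, bitsandbytes, and a CUDA GPU.

```python
from typing import Any, Dict


def four_bit_settings() -> Dict[str, Any]:
    """Keyword arguments for transformers.BitsAndBytesConfig (assumed values)."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",       # NormalFloat4 quantization
        "bnb_4bit_use_double_quant": True,  # also quantize the scaling constants
    }


def load_quantized(model_id: str):
    """Load a causal LM with 4-bit weights and return (tokenizer, model)."""
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16, **four_bit_settings()
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=config,
        device_map="auto",  # let accelerate place the weights
    )
    return tokenizer, model
```

A call such as `load_quantized("google/gemma-3-12b")` would then download and quantize the weights on first use, provided the model ID resolves on the Hub and the account has access to it.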

## Author

[Your name and contact information]

## License

[License information to be added]