---
title: Funny Image Captioner
emoji: 🚀
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: true
short_description: App that gives funny descriptions of images
---
# Fun Image Caption
A delightful app that captions your images in the voice of unique characters. Built with Gradio, LangGraph, and Hugging Face models.
## Description
This project creates an interactive AI application that captions and describes images in entertaining character voices. It combines modern vision-language models with a user-friendly interface to make image descriptions more engaging and fun.
## Features
- Upload any image for captioning
- Choose from multiple voice personas:
  - Scurvy-ridden pirate
  - Forgetful wizard
  - Sarcastic teenager
- Two-step LangGraph workflow:
  - Image captioning with vision-language model
  - Creative voice-based description
- Built on efficient 4-bit quantized models for ZeroGPU environments
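
The two-step workflow listed above can be sketched in plain Python as two functions that pass a shared state dict along, which is roughly the shape a LangGraph graph would formalize. This is a minimal sketch, not the app's actual code: the function names, the state keys, and the persona prompt wording are all assumptions, and the model calls are replaced by injected callables so the flow can be run without a GPU.

```python
from typing import Callable, Dict

# Hypothetical persona instructions; the app's real prompts are not shown in this README.
PERSONAS: Dict[str, str] = {
    "Scurvy-ridden pirate": "Describe the scene as a scurvy-ridden pirate.",
    "Forgetful wizard": "Describe the scene as a forgetful wizard.",
    "Sarcastic teenager": "Describe the scene as a sarcastic teenager.",
}


def caption_step(state: dict, captioner: Callable[[str], str]) -> dict:
    """Step 1: produce a plain caption for the image."""
    return {**state, "caption": captioner(state["image_path"])}


def voice_step(state: dict, writer: Callable[[str], str]) -> dict:
    """Step 2: rewrite the caption in the selected persona's voice."""
    prompt = f"{PERSONAS[state['persona']]}\nCaption: {state['caption']}"
    return {**state, "description": writer(prompt)}


def run_pipeline(image_path: str, persona: str,
                 captioner: Callable[[str], str],
                 writer: Callable[[str], str]) -> dict:
    """Run both steps in order and return the final state."""
    state = {"image_path": image_path, "persona": persona}
    state = caption_step(state, captioner)
    return voice_step(state, writer)


if __name__ == "__main__":
    # Stand-ins for the vision-language model and the text model.
    fake_captioner = lambda path: "a cat sitting on a laptop"
    fake_writer = lambda prompt: f"[model output for]\n{prompt}"
    result = run_pipeline("cat.jpg", "Forgetful wizard", fake_captioner, fake_writer)
    print(result["description"])
```

Passing the model calls in as arguments keeps each step testable on its own; in the real app they would wrap the quantized models.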
## Useful Poetry Commands
- Show all installed packages: `poetry show`
- Show detailed info about a specific package: `poetry show <package>`
- Show package location and details: `poetry show -v <package>`
- List virtual environments: `poetry env list`
- Show current environment info: `poetry env info`
- Export dependencies to requirements.txt (this command uses `uv`, not Poetry, so `uv` must be installed): `uv pip compile pyproject.toml -o requirements.txt`
## Requirements
- Python 3.10+
- Poetry (Python package manager)
- Git
- CUDA-compatible GPU
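
As a quick sanity check before installing, the requirements above can be probed from Python. This is a rough sketch under stated assumptions: the helper name `check_requirements` is hypothetical, and finding `nvidia-smi` on the PATH is only a proxy for a working CUDA setup, not proof of one.

```python
import shutil
import sys


def check_requirements(version=sys.version_info, which=shutil.which) -> dict:
    """Return a mapping of requirement name to True/False."""
    return {
        "Python 3.10+": tuple(version[:2]) >= (3, 10),
        "Poetry": which("poetry") is not None,
        "Git": which("git") is not None,
        # Assumption: nvidia-smi on PATH hints at an NVIDIA driver being installed.
        "CUDA GPU (nvidia-smi found)": which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```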
## Installation
1. Install Poetry if you haven't already:
```bash
curl -sSL https://install.python-poetry.org | python3 -
```
2. Clone the repository:
```bash
git clone https://github.com/yourusername/fun-image-caption.git
cd fun-image-caption
```
3. Create and activate a new Poetry environment (`poetry shell` is built into Poetry 1.x; Poetry 2.x moved it to the `poetry-plugin-shell` plugin):
```bash
poetry env use python3.10
poetry shell
```
4. Install dependencies:
```bash
poetry install
```
5. Verify installation:
```bash
poetry show
```
## Install the Hugging Face Hub CLI
```bash
pip install huggingface_hub
huggingface-cli login
```
## Key Dependencies
- accelerate==1.2.1: Framework for efficient model deployment
- bitsandbytes==0.41.3.post2: Quantization library for model optimization
- torch==2.4.0: PyTorch for ML operations
- transformers==4.49.0: Hugging Face transformers library
- gradio: Web interface framework
- langgraph: Workflow orchestration for language model pipelines
- pillow: Python Imaging Library
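
To confirm that the active environment matches the pinned versions listed above, something like the following could be run after `poetry install`. It is a small sketch using only the standard library; the `PINS` dict simply restates the pins from this list, and the helper name `find_mismatches` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Key Dependencies list.
PINS = {
    "accelerate": "1.2.1",
    "bitsandbytes": "0.41.3.post2",
    "torch": "2.4.0",
    "transformers": "4.49.0",
}


def find_mismatches(pins: dict, get_version=version) -> dict:
    """Return {package: (expected, installed)} for every pin that is not met."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None
        # Builds such as "2.4.0+cu121" carry a local suffix; compare without it.
        base = installed.split("+")[0] if installed else None
        if base != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {installed}")
```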
## Usage
1. Run the application:
```bash
python app.py
```
2. Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860)
3. Upload an image using the interface
4. Select a voice persona from the dropdown menu
5. Click "Generate Description" to see the results
6. Enjoy your image description in the selected character voice!
## Models
The application uses the following models:
- Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
- Voice Description: google/gemma-3-12b (4-bit quantized)
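
For a sense of why 4-bit quantization matters here, the weight storage of a model can be estimated from its parameter count. The sketch below is back-of-the-envelope arithmetic only: it takes the nominal 12 billion parameters from the "12b" in the model names and ignores quantization constants, activations, and the KV cache, so real usage will be higher. If the two entries are loaded as separate models, the figures apply to each.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 12e9  # nominal count, taken from the "12b" in the model names
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_memory_gb(params, bits):.0f} GB")
```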
## Author
[Your name and contact information]
## License
[License information to be added]