---
title: Funny Image Captioner
emoji: 🚀
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: true
short_description: App that gives funny descriptions of images
---
# Fun Image Caption
A delightful app that captions your images in the voice of unique characters. Built with Gradio, LangGraph, and Hugging Face models.
## Description
This project is an interactive AI application that captions and describes images in entertaining character voices. It pairs vision-language models with a simple web interface to make image descriptions more engaging and fun.
## Features
- Upload any image for captioning
- Choose from multiple voice personas:
  - Scurvy-ridden pirate
  - Forgetful wizard
  - Sarcastic teenager
- Two-step LangGraph workflow:
  - Image captioning with a vision-language model
  - Creative voice-based description
- Built on efficient 4-bit quantized models for ZeroGPU environments
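
The two-step workflow listed above can be sketched in plain Python. This is a minimal illustration, not the code in `app.py`: ordinary functions stand in for the LangGraph nodes, the model calls are replaced by string stubs, and the state field names, prompt wording, and `run_pipeline` helper are all hypothetical. Only the three persona names come from the feature list.

```python
from typing import TypedDict


class CaptionState(TypedDict, total=False):
    """State passed between the two steps (field names are hypothetical)."""
    image_path: str   # input image
    persona: str      # selected voice persona
    caption: str      # output of step 1
    description: str  # output of step 2


# Illustrative persona instructions; the real prompts may differ.
PERSONAS = {
    "pirate": "Describe the scene as a scurvy-ridden pirate.",
    "wizard": "Describe the scene as a forgetful wizard.",
    "teenager": "Describe the scene as a sarcastic teenager.",
}


def caption_image(state: CaptionState) -> CaptionState:
    """Step 1: stub for the vision-language model call."""
    return {**state, "caption": f"a photo loaded from {state['image_path']}"}


def describe_in_voice(state: CaptionState) -> CaptionState:
    """Step 2: build the persona prompt; a real node would send it to the text model."""
    prompt = f"{PERSONAS[state['persona']]} Caption: {state['caption']}"
    return {**state, "description": prompt}


def run_pipeline(image_path: str, persona: str) -> CaptionState:
    """Run both steps in order, threading the state through each one."""
    state: CaptionState = {"image_path": image_path, "persona": persona}
    for step in (caption_image, describe_in_voice):
        state = step(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("cat.jpg", "pirate")["description"])
```

Keeping each step as a function from state to state is what makes it straightforward to register the same functions as nodes in a graph and connect them with a single edge.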
## Useful Poetry Commands
- Show all installed packages: `poetry show`
- Show detailed info about a specific package: `poetry show <package>`
- Show package location and details: `poetry show -v <package>`
- List virtual environments: `poetry env list`
- Show current environment info: `poetry env info`
- Export dependencies to requirements.txt (this uses uv rather than Poetry, so uv must be installed): `uv pip compile pyproject.toml -o requirements.txt`
## Requirements
- Python 3.10+
- Poetry (Python package manager)
- Git
- CUDA-compatible GPU
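
A quick preflight check for the requirements above might look like the following sketch. The `check_requirements` helper is hypothetical and not part of the project; it also treats the presence of `nvidia-smi` on the `PATH` as a rough proxy for a CUDA-compatible GPU, which is an assumption and not a guarantee.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Return a dict mapping each requirement to True or False."""
    return {
        "python": sys.version_info >= min_python,
        "poetry": shutil.which("poetry") is not None,
        "git": shutil.which("git") is not None,
        # Finding nvidia-smi only suggests an NVIDIA driver is installed.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```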
## Installation
1. Install Poetry if you haven't already:
```bash
curl -sSL https://install.python-poetry.org | python3 -
```
2. Clone the repository:
```bash
git clone https://github.com/yourusername/fun-image-caption.git
cd fun-image-caption
```
3. Create and activate a new Poetry environment:
```bash
poetry env use python3.10
# Poetry 2.x no longer bundles `poetry shell`; there, use `eval $(poetry env activate)` instead.
poetry shell
```
4. Install dependencies:
```bash
poetry install
```
5. Verify the installation:
```bash
poetry show
```
## Install the Hugging Face Hub CLI
```bash
pip install huggingface_hub
huggingface-cli login
```
## Key Dependencies
- accelerate==1.2.1: Framework for efficient model deployment
- bitsandbytes==0.41.3.post2: Quantization library for model optimization
- torch==2.4.0: PyTorch for ML operations
- transformers==4.49.0: Hugging Face transformers library
- gradio: Web interface framework
- langgraph: Workflow orchestration for language model pipelines
- pillow: Python Imaging Library
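
To confirm that the active environment matches the pinned versions listed above, a small standard-library check along these lines may help. The `check_pins` helper is hypothetical; the version numbers are copied from the list, and local build suffixes such as `+cu121` are ignored when comparing, which is a simplifying assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the dependency list.
PINNED = {
    "accelerate": "1.2.1",
    "bitsandbytes": "0.41.3.post2",
    "torch": "2.4.0",
    "transformers": "4.49.0",
}


def check_pins(pins=PINNED):
    """Compare installed package versions against the expected pins."""
    report = {}
    for name, wanted in pins.items():
        try:
            # Drop any local build tag, e.g. "2.4.0+cu121" becomes "2.4.0".
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == wanted else f"mismatch (installed {installed})"
    return report


if __name__ == "__main__":
    for name, status in check_pins().items():
        print(f"{name}: {status}")
```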
## Usage
1. Run the application:
```bash
python app.py
```
2. Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860).
3. Upload an image using the interface.
4. Select a voice persona from the dropdown menu.
5. Click "Generate Description" to see the results.
6. Enjoy your image description in the selected character voice!
## Models
The application uses the following models:
- Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
- Voice Description: google/gemma-3-12b (4-bit quantized)
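
To see why 4-bit quantization matters for models of this size, here is a back-of-the-envelope estimate of weight memory. It assumes roughly 12 billion parameters per model (inferred from the "12b" in the names) and counts weights only; quantization metadata, activations, and the KV cache add to the real footprint.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


if __name__ == "__main__":
    params = 12e9  # assumed parameter count
    for label, bits in [("16-bit", 16), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gib(params, bits):.1f} GiB")
```

Under these assumptions the weights take about 22.4 GiB at 16 bits and about 5.6 GiB at 4 bits, a fourfold reduction per model.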
## Author
[Your name and contact information]
## License
[License information to be added]