Spaces: krsnewwave/fun-image-caption

Dylan committed on Mar 23

Commit

98efca2

1 Parent(s): e82b768

added initial files

Files changed (4)

README.md +105 -1
poetry.lock +0 -0
pyproject.toml +23 -0
requirements.txt +241 -0

README.md CHANGED

@@ -10,4 +10,108 @@ pinned: false
 short_description: App that gives funny descriptions of images
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Fun Image Caption
+A delightful app that captions your images through the voice of unique characters. Built with Gradio, LangGraph, and Hugging Face models.
+## Description
+This project creates an interactive AI application that captions and describes images in entertaining character voices. It combines modern vision-language models with a user-friendly interface to make image descriptions more engaging and fun.
+## Features
+- Upload any image for captioning
+- Choose from multiple voice personas:
+  - Scurvy-ridden pirate
+  - Forgetful wizard
+  - Sarcastic teenager
+- Two-step LangGraph workflow:
+  - Image captioning with vision-language model
+  - Creative voice-based description
+- Built on efficient 4-bit quantized models for ZeroGPU environments
+## Useful Poetry Commands
+- Show all installed packages: `poetry show`
+- Show detailed info about a specific package: `poetry show <package>`
+- Show package location and details: `poetry show -v <package>`
+- List virtual environments: `poetry env list`
+- Show current environment info: `poetry env info`
+- Export pinned dependencies to requirements.txt (this uses uv rather than Poetry): `uv pip compile pyproject.toml -o requirements.txt`
+## Requirements
+- Python 3.10 (pyproject.toml pins `requires-python` to `==3.10.13`)
+- Poetry (Python package manager)
+- Git
+- CUDA-compatible GPU
+## Installation
+1. Install Poetry if you haven't already:
+```bash
+curl -sSL https://install.python-poetry.org | python3 -
+```
+2. Clone the repository:
+```bash
+git clone https://github.com/yourusername/fun-image-caption.git
+cd fun-image-caption
+```
+3. Create and activate a new Poetry environment:
+```bash
+poetry env use python3.10
+eval $(poetry env activate)  # Poetry 2.x; on Poetry 1.x use: poetry shell
+```
+4. Install dependencies:
+```bash
+poetry install
+```
+5. Verify installation:
+```bash
+poetry show
+```
+## Key Dependencies
+- accelerate==1.2.1: Framework for efficient model deployment
+- bitsandbytes==0.41.3.post2: Quantization library for model optimization
+- torch==2.4.0: PyTorch for ML operations
+- transformers==4.49.0: Hugging Face transformers library
+- gradio: Web interface framework
+- langgraph: Workflow orchestration for language model pipelines
+- pillow: Python Imaging Library
+## Usage
+1. Run the application:
+```bash
+python app.py
+```
+2. Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860)
+3. Upload an image using the interface
+4. Select a voice persona from the dropdown menu
+5. Click "Generate Description" to see the results
+6. Enjoy your image description in the selected character voice!
+## Models
+The application uses the following models:
+- Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
+- Voice Description: google/gemma-3-12b (4-bit quantized)
+## Author
+[Your name and contact information]
+## License
+[License information to be added]
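
The README describes a two-step workflow (caption the image, then rewrite the caption in a persona voice) but this commit does not include `app.py`. The following is only a minimal sketch of that state flow, not the author's implementation: the state field names, the persona templates, and both model calls are stand-in assumptions so the example runs without a GPU or any model download. In LangGraph, each of the two functions would typically be registered as a node on a `StateGraph` and connected by edges; plain function calls are used here instead.

```python
from typing import Callable, Dict, List, TypedDict


class CaptionState(TypedDict, total=False):
    """State passed between the two steps (field names are assumptions)."""
    image_path: str
    persona: str
    caption: str
    description: str


# Hypothetical persona templates; the README names the personas, not their prompts.
PERSONA_STYLES: Dict[str, str] = {
    "pirate": "Arr! {caption}, matey!",
    "wizard": "Hmm, where was I? Ah yes: {caption}... I think.",
    "teenager": "Ugh, it's just {caption}. Whatever.",
}


def caption_image(state: CaptionState) -> CaptionState:
    # Step 1 stub: a real app would call a vision-language model here.
    return {**state, "caption": f"a photo loaded from {state['image_path']}"}


def describe_in_voice(state: CaptionState) -> CaptionState:
    # Step 2 stub: a real app would prompt a language model with the persona.
    template = PERSONA_STYLES[state["persona"]]
    return {**state, "description": template.format(caption=state["caption"])}


def run_workflow(image_path: str, persona: str) -> CaptionState:
    # Run the steps in order, threading the state through each one.
    state: CaptionState = {"image_path": image_path, "persona": persona}
    steps: List[Callable[[CaptionState], CaptionState]] = [
        caption_image,
        describe_in_voice,
    ]
    for step in steps:
        state = step(state)
    return state


result = run_workflow("cat.png", "pirate")
print(result["description"])
```

Keeping each step as a function from state to state means the stubs can later be swapped for real model calls without changing how the steps are chained.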

poetry.lock ADDED

The diff for this file is too large to render. See raw diff

pyproject.toml ADDED

@@ -0,0 +1,23 @@

+[project]
+name = "fun-image-caption"
+version = "0.1.0"
+description = "This Gradio app processes images and provides descriptions in different voice personas using a LangGraph workflow."
+authors = [
+    {name = "Dylan",email = "[email protected]"}
+]
+readme = "README.md"
+requires-python = "==3.10.13"
+dependencies = [
+    "langgraph (>=0.3.18,<0.4.0)",
+    "pillow (>=11.1.0,<12.0.0)",
+    "gradio (>=5.22.0,<6.0.0)",
+    "transformers (==4.49.0)",
+    "torch (==2.4.0)",
+    "bitsandbytes (~=0.41.3)",
+    "accelerate (==1.2.1)",
+]
+[build-system]
+requires = ["poetry-core>=2.0.0,<3.0.0"]
+build-backend = "poetry.core.masonry.api"
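
The dependency list above mixes exact pins (`==`), bounded ranges (`>=...,<...`), and one compatible-release specifier (`~=0.41.3`). The sketch below illustrates how the last two are read, and why `bitsandbytes` can resolve to `0.41.3.post2`. It is a deliberate simplification that compares only the numeric release segment and ignores pre-, post-, and dev-release ordering. The helper names are hypothetical, and real tooling should rely on a full PEP 440 implementation rather than this approximation.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release, e.g. '0.41.3.post2' -> (0, 41, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Approximate '>=lower,<upper' using release segments only."""
    return release_tuple(lower) <= release_tuple(version) < release_tuple(upper)


def compatible_release(version: str, spec: str) -> bool:
    """Approximate '~=spec': at least spec, with all but the last segment equal."""
    base = release_tuple(spec)
    candidate = release_tuple(version)
    return candidate >= base and candidate[: len(base) - 1] == base[:-1]


# bitsandbytes (~=0.41.3): any 0.41.x at or above 0.41.3 qualifies.
print(compatible_release("0.41.3.post2", "0.41.3"))
print(compatible_release("0.42.0", "0.41.3"))

# langgraph (>=0.3.18,<0.4.0): the lower bound is included, the upper is not.
print(in_range("0.3.18", "0.3.18", "0.4.0"))
print(in_range("0.4.0", "0.3.18", "0.4.0"))
```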

requirements.txt ADDED

@@ -0,0 +1,241 @@

+# This file was autogenerated by uv via the following command:
+#    uv pip compile pyproject.toml -o requirements.txt
+accelerate==1.2.1
+    # via fun-image-caption (pyproject.toml)
+aiofiles==23.2.1
+    # via gradio
+annotated-types==0.7.0
+    # via pydantic
+anyio==4.9.0
+    # via
+    #   gradio
+    #   httpx
+    #   starlette
+bitsandbytes==0.41.3.post2
+    # via fun-image-caption (pyproject.toml)
+certifi==2025.1.31
+    # via
+    #   httpcore
+    #   httpx
+    #   requests
+charset-normalizer==3.4.1
+    # via requests
+click==8.1.8
+    # via
+    #   typer
+    #   uvicorn
+exceptiongroup==1.2.2
+    # via anyio
+fastapi==0.115.11
+    # via gradio
+ffmpy==0.5.0
+    # via gradio
+filelock==3.18.0
+    # via
+    #   huggingface-hub
+    #   torch
+    #   transformers
+fsspec==2025.3.0
+    # via
+    #   gradio-client
+    #   huggingface-hub
+    #   torch
+gradio==5.22.0
+    # via fun-image-caption (pyproject.toml)
+gradio-client==1.8.0
+    # via gradio
+groovy==0.1.2
+    # via gradio
+h11==0.14.0
+    # via
+    #   httpcore
+    #   uvicorn
+httpcore==1.0.7
+    # via httpx
+httpx==0.28.1
+    # via
+    #   gradio
+    #   gradio-client
+    #   langgraph-sdk
+    #   langsmith
+    #   safehttpx
+huggingface-hub==0.29.3
+    # via
+    #   accelerate
+    #   gradio
+    #   gradio-client
+    #   tokenizers
+    #   transformers
+idna==3.10
+    # via
+    #   anyio
+    #   httpx
+    #   requests
+jinja2==3.1.6
+    # via
+    #   gradio
+    #   torch
+jsonpatch==1.33
+    # via langchain-core
+jsonpointer==3.0.0
+    # via jsonpatch
+langchain-core==0.3.47
+    # via
+    #   langgraph
+    #   langgraph-checkpoint
+    #   langgraph-prebuilt
+langgraph==0.3.18
+    # via fun-image-caption (pyproject.toml)
+langgraph-checkpoint==2.0.21
+    # via
+    #   langgraph
+    #   langgraph-prebuilt
+langgraph-prebuilt==0.1.4
+    # via langgraph
+langgraph-sdk==0.1.58
+    # via langgraph
+langsmith==0.3.18
+    # via langchain-core
+markdown-it-py==3.0.0
+    # via rich
+markupsafe==3.0.2
+    # via
+    #   gradio
+    #   jinja2
+mdurl==0.1.2
+    # via markdown-it-py
+mpmath==1.3.0
+    # via sympy
+msgpack==1.1.0
+    # via langgraph-checkpoint
+networkx==3.4.2
+    # via torch
+numpy==2.2.4
+    # via
+    #   accelerate
+    #   gradio
+    #   pandas
+    #   transformers
+orjson==3.10.15
+    # via
+    #   gradio
+    #   langgraph-sdk
+    #   langsmith
+packaging==24.2
+    # via
+    #   accelerate
+    #   gradio
+    #   gradio-client
+    #   huggingface-hub
+    #   langchain-core
+    #   langsmith
+    #   transformers
+pandas==2.2.3
+    # via gradio
+pillow==11.1.0
+    # via
+    #   fun-image-caption (pyproject.toml)
+    #   gradio
+psutil==7.0.0
+    # via accelerate
+pydantic==2.10.6
+    # via
+    #   fastapi
+    #   gradio
+    #   langchain-core
+    #   langsmith
+pydantic-core==2.27.2
+    # via pydantic
+pydub==0.25.1
+    # via gradio
+pygments==2.19.1
+    # via rich
+python-dateutil==2.9.0.post0
+    # via pandas
+python-multipart==0.0.20
+    # via gradio
+pytz==2025.1
+    # via pandas
+pyyaml==6.0.2
+    # via
+    #   accelerate
+    #   gradio
+    #   huggingface-hub
+    #   langchain-core
+    #   transformers
+regex==2024.11.6
+    # via transformers
+requests==2.32.3
+    # via
+    #   huggingface-hub
+    #   langsmith
+    #   requests-toolbelt
+    #   transformers
+requests-toolbelt==1.0.0
+    # via langsmith
+rich==13.9.4
+    # via typer
+ruff==0.11.2
+    # via gradio
+safehttpx==0.1.6
+    # via gradio
+safetensors==0.5.3
+    # via
+    #   accelerate
+    #   transformers
+semantic-version==2.10.0
+    # via gradio
+shellingham==1.5.4
+    # via typer
+six==1.17.0
+    # via python-dateutil
+sniffio==1.3.1
+    # via anyio
+starlette==0.46.1
+    # via
+    #   fastapi
+    #   gradio
+sympy==1.13.3
+    # via torch
+tenacity==9.0.0
+    # via langchain-core
+tokenizers==0.21.1
+    # via transformers
+tomlkit==0.13.2
+    # via gradio
+torch==2.4.0
+    # via
+    #   fun-image-caption (pyproject.toml)
+    #   accelerate
+tqdm==4.67.1
+    # via
+    #   huggingface-hub
+    #   transformers
+transformers==4.49.0
+    # via fun-image-caption (pyproject.toml)
+typer==0.15.2
+    # via gradio
+typing-extensions==4.12.2
+    # via
+    #   anyio
+    #   fastapi
+    #   gradio
+    #   gradio-client
+    #   huggingface-hub
+    #   langchain-core
+    #   pydantic
+    #   pydantic-core
+    #   rich
+    #   torch
+    #   typer
+    #   uvicorn
+tzdata==2025.2
+    # via pandas
+urllib3==2.3.0
+    # via requests
+uvicorn==0.34.0
+    # via gradio
+websockets==15.0.1
+    # via gradio-client
+zstandard==0.23.0
+    # via langsmith
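
In the compiled file above, each `name==version` line is followed by `# via` comments naming the packages that required it. Entries marked `fun-image-caption (pyproject.toml)` are the direct dependencies. The following is a rough sketch of reading that layout, assuming the diff's leading `+` markers have already been stripped. The function name and the returned dictionary shape are illustrative choices, not part of uv.

```python
from typing import Dict, List, Optional

# A short excerpt in the same layout as the compiled requirements.txt.
SAMPLE = """\
# This file was autogenerated by uv via the following command:
#    uv pip compile pyproject.toml -o requirements.txt
accelerate==1.2.1
    # via fun-image-caption (pyproject.toml)
anyio==4.9.0
    # via
    #   gradio
    #   httpx
pillow==11.1.0
    # via
    #   fun-image-caption (pyproject.toml)
    #   gradio
"""


def parse_compiled(text: str) -> Dict[str, dict]:
    """Map each pinned package to its version and the packages that require it."""
    packages: Dict[str, dict] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("#"):
            # A pin line such as 'anyio==4.9.0' starts a new entry.
            name, _, version = line.partition("==")
            current = name
            packages[name] = {"version": version, "via": []}
        elif current is not None:
            # A comment under a pin: either '# via X', '# via', or '#   X'.
            requirer = line.lstrip("#").strip()
            if requirer == "via" or requirer.startswith("via "):
                requirer = requirer[len("via"):].strip()
            if requirer:
                packages[current]["via"].append(requirer)
    return packages


packages = parse_compiled(SAMPLE)
direct: List[str] = [
    name
    for name, info in packages.items()
    if any("pyproject.toml" in requirer for requirer in info["via"])
]
print(direct)
```

Comment lines that appear before the first pin, such as the autogenerated header, are skipped because no package entry is open yet.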