Dylan committed
Commit 98efca2 · 1 Parent(s): e82b768

added initial files

Files changed (4)
  1. README.md +105 -1
  2. poetry.lock +0 -0
  3. pyproject.toml +23 -0
  4. requirements.txt +241 -0
README.md CHANGED
@@ -10,4 +10,108 @@ pinned: false
  short_description: App that gives funny descriptions of images
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Fun Image Caption
+
+ A delightful app that captions your images in the voice of unique characters. Built with Gradio, LangGraph, and Hugging Face models.
+
+ ## Description
+
+ This project is an interactive AI application that captions and describes images in entertaining character voices. It pairs vision-language models with a user-friendly interface to make image descriptions more engaging and fun.
+
+ ## Features
+
+ - Upload any image for captioning
+ - Choose from multiple voice personas:
+   - Scurvy-ridden pirate
+   - Forgetful wizard
+   - Sarcastic teenager
+ - Two-step LangGraph workflow:
+   - Image captioning with a vision-language model
+   - Creative voice-based description
+ - Built on efficient 4-bit quantized models for ZeroGPU environments
+
+ ## Useful Poetry Commands
+
+ - Show all installed packages: `poetry show`
+ - Show detailed info about a specific package: `poetry show <package>`
+ - Show package location and details: `poetry show -v <package>`
+ - List virtual environments: `poetry env list`
+ - Show current environment info: `poetry env info`
+ - Export dependencies to requirements.txt (this one uses uv rather than Poetry): `uv pip compile pyproject.toml -o requirements.txt`
+
+ ## Requirements
+
+ - Python 3.10 (`pyproject.toml` pins `requires-python` to `==3.10.13`)
+ - Poetry (Python package manager)
+ - Git
+ - CUDA-compatible GPU
+
+ ## Installation
+
+ 1. Install Poetry if you haven't already:
+    ```bash
+    curl -sSL https://install.python-poetry.org | python3 -
+    ```
+
+ 2. Clone the repository:
+    ```bash
+    git clone https://github.com/yourusername/fun-image-caption.git
+    cd fun-image-caption
+    ```
+
+ 3. Create and activate a new Poetry environment:
+    ```bash
+    poetry env use python3.10
+    # Poetry 2.x: `poetry shell` is only available via the poetry-plugin-shell plugin
+    eval $(poetry env activate)
+    ```
+
+ 4. Install dependencies:
+    ```bash
+    poetry install
+    ```
+
+ 5. Verify installation:
+    ```bash
+    poetry show
+    ```
+
+ ## Key Dependencies
+
+ - accelerate==1.2.1: Device placement and model-loading utilities
+ - bitsandbytes==0.41.3.post2: Quantization library for model optimization
+ - torch==2.4.0: PyTorch for ML operations
+ - transformers==4.49.0: Hugging Face Transformers library
+ - gradio: Web interface framework
+ - langgraph: Workflow orchestration for language model pipelines
+ - pillow: Image handling (the maintained fork of the Python Imaging Library)
+
+ ## Usage
+
+ 1. Run the application:
+    ```bash
+    python app.py
+    ```
+
+ 2. Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860)
+
+ 3. Upload an image using the interface
+
+ 4. Select a voice persona from the dropdown menu
+
+ 5. Click "Generate Description" to see the results
+
+ 6. Enjoy your image description in the selected character voice!
+
+ ## Models
+
+ The application uses the following models:
+ - Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
+ - Voice Description: google/gemma-3-12b (4-bit quantized)
+
+ ## Author
+
+ [Your name and contact information]
+
+ ## License
+
+ [License information to be added]
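
Review note: the two-step workflow listed under Features (caption the image, then restyle the caption in a persona's voice) can be sketched in plain Python. This commit does not include `app.py`, so everything below is an assumption made for illustration: the state shape, the function names, and the prompt wording are hypothetical, and the two model calls are passed in as plain callables instead of the LangGraph nodes and Gemma models the README names.

```python
from typing import Callable, Dict

# Persona names come from the README; the prompt wording is illustrative only.
PERSONA_PROMPTS: Dict[str, str] = {
    "pirate": "Describe this scene as a scurvy-ridden pirate: {caption}",
    "wizard": "Describe this scene as a forgetful wizard: {caption}",
    "teenager": "Describe this scene as a sarcastic teenager: {caption}",
}


def caption_step(state: dict, captioner: Callable[[str], str]) -> dict:
    """Step 1: turn the image into a plain caption."""
    return {**state, "caption": captioner(state["image_path"])}


def voice_step(state: dict, generator: Callable[[str], str]) -> dict:
    """Step 2: rewrite the caption in the selected persona's voice."""
    prompt = PERSONA_PROMPTS[state["persona"]].format(caption=state["caption"])
    return {**state, "description": generator(prompt)}


def run_workflow(
    image_path: str,
    persona: str,
    captioner: Callable[[str], str],
    generator: Callable[[str], str],
) -> dict:
    """Run both steps in order and return the final state."""
    if persona not in PERSONA_PROMPTS:
        raise ValueError(f"unknown persona: {persona!r}")
    state = {"image_path": image_path, "persona": persona}
    state = caption_step(state, captioner)
    return voice_step(state, generator)


if __name__ == "__main__":
    # Stub callables stand in for the vision-language and text models.
    result = run_workflow(
        "cat.jpg",
        "pirate",
        captioner=lambda path: "a cat on a sofa",
        generator=lambda prompt: prompt,
    )
    print(result["description"])
```

Keeping each step as a function from state to state mirrors how a graph node would behave, so a version built on LangGraph could plausibly reuse the same two functions as nodes.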
poetry.lock ADDED
The diff for this file is too large to render. See raw diff
 
pyproject.toml ADDED
@@ -0,0 +1,23 @@
+ [project]
+ name = "fun-image-caption"
+ version = "0.1.0"
+ description = "This Gradio app processes images and provides descriptions in different voice personas using a LangGraph workflow."
+ authors = [
+     {name = "Dylan",email = "[email protected]"}
+ ]
+ readme = "README.md"
+ requires-python = "==3.10.13"
+ dependencies = [
+     "langgraph (>=0.3.18,<0.4.0)",
+     "pillow (>=11.1.0,<12.0.0)",
+     "gradio (>=5.22.0,<6.0.0)",
+     "transformers (==4.49.0)",
+     "torch (==2.4.0)",
+     "bitsandbytes (~=0.41.3)",
+     "accelerate (==1.2.1)",
+ ]
+
+
+ [build-system]
+ requires = ["poetry-core>=2.0.0,<3.0.0"]
+ build-backend = "poetry.core.masonry.api"
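
Review note: the `dependencies` entries above mix exact pins (`==`), ranges (`>=`, `<`), and a compatible-release constraint (`~=`). The sketch below shows roughly how such a string maps to a yes/no answer for a given version. It is a deliberately simplified illustration: the helper names are hypothetical, and it compares only the numeric release segment, ignoring suffixes such as `.post2`, so it is not a full PEP 440 implementation. For real checks the `packaging` library is the better tool.

```python
import re
from typing import List, Tuple


def release(version: str) -> Tuple[int, ...]:
    """Numeric release segment only, e.g. '0.41.3.post2' -> (0, 41, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def parse_dependency(spec: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split 'name (op version, ...)' into the name and (op, version) pairs."""
    match = re.fullmatch(r"\s*([A-Za-z0-9._-]+)\s*\((.*)\)\s*", spec)
    if match is None:
        raise ValueError(f"unsupported spec: {spec!r}")
    constraints = []
    for clause in match.group(2).split(","):
        parsed = re.fullmatch(r"\s*(>=|<=|==|~=|<|>)\s*(\S+)\s*", clause)
        if parsed is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        constraints.append((parsed.group(1), parsed.group(2)))
    return match.group(1), constraints


def satisfies(version: str, op: str, bound: str) -> bool:
    """Check one constraint, padding with zeros so 2.4 equals 2.4.0."""
    v, b = release(version), release(bound)
    width = max(len(v), len(b))
    vp = v + (0,) * (width - len(v))
    bp = b + (0,) * (width - len(b))
    if op == "==":
        return vp == bp
    if op == ">=":
        return vp >= bp
    if op == "<=":
        return vp <= bp
    if op == "<":
        return vp < bp
    if op == ">":
        return vp > bp
    if op == "~=":
        # Compatible release: at least the bound, same prefix minus last part.
        prefix = len(b) - 1
        return vp >= bp and vp[:prefix] == b[:prefix]
    raise ValueError(f"unknown operator: {op!r}")


def check(spec: str, version: str) -> bool:
    """True if the version meets every constraint in the spec."""
    _, constraints = parse_dependency(spec)
    return all(satisfies(version, op, bound) for op, bound in constraints)


if __name__ == "__main__":
    print(check("bitsandbytes (~=0.41.3)", "0.41.3.post2"))
    print(check("langgraph (>=0.3.18,<0.4.0)", "0.4.0"))
```

Under these simplifications, `~=0.41.3` accepts the `0.41.3.post2` release that appears in `requirements.txt` and rejects anything in the `0.42` series.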
requirements.txt ADDED
@@ -0,0 +1,241 @@
+ # This file was autogenerated by uv via the following command:
+ #    uv pip compile pyproject.toml -o requirements.txt
+ accelerate==1.2.1
+     # via fun-image-caption (pyproject.toml)
+ aiofiles==23.2.1
+     # via gradio
+ annotated-types==0.7.0
+     # via pydantic
+ anyio==4.9.0
+     # via
+     #   gradio
+     #   httpx
+     #   starlette
+ bitsandbytes==0.41.3.post2
+     # via fun-image-caption (pyproject.toml)
+ certifi==2025.1.31
+     # via
+     #   httpcore
+     #   httpx
+     #   requests
+ charset-normalizer==3.4.1
+     # via requests
+ click==8.1.8
+     # via
+     #   typer
+     #   uvicorn
+ exceptiongroup==1.2.2
+     # via anyio
+ fastapi==0.115.11
+     # via gradio
+ ffmpy==0.5.0
+     # via gradio
+ filelock==3.18.0
+     # via
+     #   huggingface-hub
+     #   torch
+     #   transformers
+ fsspec==2025.3.0
+     # via
+     #   gradio-client
+     #   huggingface-hub
+     #   torch
+ gradio==5.22.0
+     # via fun-image-caption (pyproject.toml)
+ gradio-client==1.8.0
+     # via gradio
+ groovy==0.1.2
+     # via gradio
+ h11==0.14.0
+     # via
+     #   httpcore
+     #   uvicorn
+ httpcore==1.0.7
+     # via httpx
+ httpx==0.28.1
+     # via
+     #   gradio
+     #   gradio-client
+     #   langgraph-sdk
+     #   langsmith
+     #   safehttpx
+ huggingface-hub==0.29.3
+     # via
+     #   accelerate
+     #   gradio
+     #   gradio-client
+     #   tokenizers
+     #   transformers
+ idna==3.10
+     # via
+     #   anyio
+     #   httpx
+     #   requests
+ jinja2==3.1.6
+     # via
+     #   gradio
+     #   torch
+ jsonpatch==1.33
+     # via langchain-core
+ jsonpointer==3.0.0
+     # via jsonpatch
+ langchain-core==0.3.47
+     # via
+     #   langgraph
+     #   langgraph-checkpoint
+     #   langgraph-prebuilt
+ langgraph==0.3.18
+     # via fun-image-caption (pyproject.toml)
+ langgraph-checkpoint==2.0.21
+     # via
+     #   langgraph
+     #   langgraph-prebuilt
+ langgraph-prebuilt==0.1.4
+     # via langgraph
+ langgraph-sdk==0.1.58
+     # via langgraph
+ langsmith==0.3.18
+     # via langchain-core
+ markdown-it-py==3.0.0
+     # via rich
+ markupsafe==3.0.2
+     # via
+     #   gradio
+     #   jinja2
+ mdurl==0.1.2
+     # via markdown-it-py
+ mpmath==1.3.0
+     # via sympy
+ msgpack==1.1.0
+     # via langgraph-checkpoint
+ networkx==3.4.2
+     # via torch
+ numpy==2.2.4
+     # via
+     #   accelerate
+     #   gradio
+     #   pandas
+     #   transformers
+ orjson==3.10.15
+     # via
+     #   gradio
+     #   langgraph-sdk
+     #   langsmith
+ packaging==24.2
+     # via
+     #   accelerate
+     #   gradio
+     #   gradio-client
+     #   huggingface-hub
+     #   langchain-core
+     #   langsmith
+     #   transformers
+ pandas==2.2.3
+     # via gradio
+ pillow==11.1.0
+     # via
+     #   fun-image-caption (pyproject.toml)
+     #   gradio
+ psutil==7.0.0
+     # via accelerate
+ pydantic==2.10.6
+     # via
+     #   fastapi
+     #   gradio
+     #   langchain-core
+     #   langsmith
+ pydantic-core==2.27.2
+     # via pydantic
+ pydub==0.25.1
+     # via gradio
+ pygments==2.19.1
+     # via rich
+ python-dateutil==2.9.0.post0
+     # via pandas
+ python-multipart==0.0.20
+     # via gradio
+ pytz==2025.1
+     # via pandas
+ pyyaml==6.0.2
+     # via
+     #   accelerate
+     #   gradio
+     #   huggingface-hub
+     #   langchain-core
+     #   transformers
+ regex==2024.11.6
+     # via transformers
+ requests==2.32.3
+     # via
+     #   huggingface-hub
+     #   langsmith
+     #   requests-toolbelt
+     #   transformers
+ requests-toolbelt==1.0.0
+     # via langsmith
+ rich==13.9.4
+     # via typer
+ ruff==0.11.2
+     # via gradio
+ safehttpx==0.1.6
+     # via gradio
+ safetensors==0.5.3
+     # via
+     #   accelerate
+     #   transformers
+ semantic-version==2.10.0
+     # via gradio
+ shellingham==1.5.4
+     # via typer
+ six==1.17.0
+     # via python-dateutil
+ sniffio==1.3.1
+     # via anyio
+ starlette==0.46.1
+     # via
+     #   fastapi
+     #   gradio
+ sympy==1.13.3
+     # via torch
+ tenacity==9.0.0
+     # via langchain-core
+ tokenizers==0.21.1
+     # via transformers
+ tomlkit==0.13.2
+     # via gradio
+ torch==2.4.0
+     # via
+     #   fun-image-caption (pyproject.toml)
+     #   accelerate
+ tqdm==4.67.1
+     # via
+     #   huggingface-hub
+     #   transformers
+ transformers==4.49.0
+     # via fun-image-caption (pyproject.toml)
+ typer==0.15.2
+     # via gradio
+ typing-extensions==4.12.2
+     # via
+     #   anyio
+     #   fastapi
+     #   gradio
+     #   gradio-client
+     #   huggingface-hub
+     #   langchain-core
+     #   pydantic
+     #   pydantic-core
+     #   rich
+     #   torch
+     #   typer
+     #   uvicorn
+ tzdata==2025.2
+     # via pandas
+ urllib3==2.3.0
+     # via requests
+ uvicorn==0.34.0
+     # via gradio
+ websockets==15.0.1
+     # via gradio-client
+ zstandard==0.23.0
+     # via langsmith
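
Review note: the `# via` annotations in the compiled file record which package pulled in each pin, which makes it possible to separate the project's direct dependencies from transitive ones. The sketch below is one way to read that structure, assuming the layout shown in this diff (a `name==version` line followed by indented `# via` comments). The function names are hypothetical, and the sample text is a short excerpt of the file above, not the whole thing.

```python
from typing import Dict, List

# Excerpt of the compiled requirements shown in this commit.
SAMPLE = """\
# This file was autogenerated by uv via the following command:
#    uv pip compile pyproject.toml -o requirements.txt
accelerate==1.2.1
    # via fun-image-caption (pyproject.toml)
anyio==4.9.0
    # via
    #   gradio
    #   httpx
    #   starlette
pillow==11.1.0
    # via
    #   fun-image-caption (pyproject.toml)
    #   gradio
"""


def parse_compiled_requirements(text: str) -> Dict[str, dict]:
    """Map each pinned package to its version and the packages requiring it."""
    packages: Dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if current is None:
                continue  # file header, before the first package
            note = line.lstrip("#").strip()
            if note == "via":
                continue  # marker line introducing a multi-line list
            if note.startswith("via "):
                note = note[4:].strip()
            if note:
                packages[current]["via"].append(note)
            continue
        name, _, version = line.partition("==")
        current = name.strip()
        packages[current] = {"version": version.strip(), "via": []}
    return packages


def direct_dependencies(packages: Dict[str, dict], project: str) -> List[str]:
    """Names of packages that the project itself requires."""
    return sorted(
        name
        for name, info in packages.items()
        if any(source.startswith(project) for source in info["via"])
    )


if __name__ == "__main__":
    parsed = parse_compiled_requirements(SAMPLE)
    print(direct_dependencies(parsed, "fun-image-caption"))
```

Run against the full file, the same approach should list the seven packages declared in `pyproject.toml`, which is a quick way to confirm the compiled file and the project metadata agree.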