Create SPACE_CONFIG.md
Browse files- SPACE_CONFIG.md +389 -0
SPACE_CONFIG.md
ADDED
@@ -0,0 +1,389 @@
# 🔧 HuggingFace Spaces Configuration Guide

**Essential configuration options for your AI Dataset Studio Space**

---

## **Required README.md Header**

Every HuggingFace Space **must** have this YAML frontmatter at the very beginning of README.md:

### **Basic Configuration (Recommended)**
```yaml
---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
```

### **Alternative Configurations**

#### **Professional/Business Version**
```yaml
---
title: Enterprise Dataset Studio
emoji: 🏢
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: true
license: mit
tags:
  - machine-learning
  - datasets
  - nlp
  - data-science
  - perplexity-ai
---
```

#### **Research/Academic Version**
```yaml
---
title: Research Dataset Creator
emoji: 📚
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - research
  - academic
  - datasets
  - nlp
  - ai
---
```

#### **Creative/Colorful Version**
```yaml
---
title: AI Dataset Magic ✨
emoji: 🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
tags:
  - datasets
  - creative
  - ai-tools
  - machine-learning
---
```


---

## 🎨 **Configuration Options Explained**

### **Required Fields**

| Field | Description | Example Values |
|-------|-------------|----------------|
| `title` | Space name displayed in UI | `AI Dataset Studio` |
| `emoji` | Icon shown next to title | `🚀`, `🤖`, `🎯` |
| `colorFrom` | Gradient start color | `blue`, `red`, `green`, `purple` |
| `colorTo` | Gradient end color | `purple`, `pink`, `yellow`, `blue` |
| `sdk` | Framework used | `gradio` (for our app) |
| `sdk_version` | SDK version | `"4.44.0"` |
| `app_file` | Main application file | `app.py` |

### **Optional Fields**

| Field | Description | Example Values |
|-------|-------------|----------------|
| `pinned` | Pin to your profile | `true`, `false` |
| `license` | Software license | `mit`, `apache-2.0`, `gpl-3.0` |
| `tags` | Searchable keywords | `machine-learning`, `nlp`, `datasets` |
| `models` | Referenced models | `facebook/bart-large-cnn` |
| `datasets` | Referenced datasets | `imdb`, `sentiment140` |

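The field tables above can be turned into a quick local check. The sketch below is only an illustration: `read_frontmatter` and `missing_fields` are hypothetical helper names, and the parser is deliberately minimal (flat `key: value` lines only, list items such as tags are skipped) rather than a full YAML implementation.

```python
# Fields this guide lists as required.
REQUIRED_FIELDS = ("title", "emoji", "colorFrom", "colorTo",
                   "sdk", "sdk_version", "app_file")


def read_frontmatter(readme_text):
    """Return the top-level `key: value` pairs from a README's YAML frontmatter.

    Only flat scalar fields are parsed; list items (such as tags) are skipped.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("README.md must start with '---' on its first line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter reached
            return fields
        if line.startswith((" ", "-", "#")) or ":" not in line:
            continue  # nested item, list entry, comment, or blank line
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("\"'")
    raise ValueError("frontmatter is missing its closing '---'")


def missing_fields(fields):
    """List the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


sample = """---
title: AI Dataset Studio
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
---
# AI Dataset Studio
"""
fields = read_frontmatter(sample)
print(fields["sdk_version"])   # 4.44.0
print(missing_fields(fields))  # ['emoji', 'colorFrom', 'colorTo']
```

For anything beyond flat fields, a real YAML parser would be the safer choice.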

---

## 🎯 **Popular Color Combinations**

`colorFrom` and `colorTo` accept only these values: `red`, `yellow`, `green`, `blue`, `indigo`, `purple`, `pink`, `gray`.

### **Professional Themes**
```yaml
# Corporate Blue
colorFrom: blue
colorTo: indigo

# Business Gray
colorFrom: gray
colorTo: blue

# Tech Green
colorFrom: green
colorTo: blue
```

### **Creative Themes**
```yaml
# Sunset
colorFrom: yellow
colorTo: red

# Ocean
colorFrom: blue
colorTo: green

# Forest
colorFrom: green
colorTo: yellow

# Galaxy
colorFrom: purple
colorTo: pink
```

### **AI/Tech Themes**
```yaml
# Matrix
colorFrom: green
colorTo: gray

# Cyberpunk
colorFrom: purple
colorTo: blue

# Neural
colorFrom: blue
colorTo: purple
```

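A color pair can be checked before committing. This is a small sketch that assumes the allowed set is the eight names Hugging Face documents for the thumbnail gradient (`red`, `yellow`, `green`, `blue`, `indigo`, `purple`, `pink`, `gray`); `invalid_colors` is a hypothetical helper name.

```python
# Assumed allowed values for colorFrom / colorTo.
ALLOWED_COLORS = {"red", "yellow", "green", "blue",
                  "indigo", "purple", "pink", "gray"}


def invalid_colors(color_from, color_to):
    """Return the (field, value) pairs whose value is not an allowed color name."""
    pairs = (("colorFrom", color_from), ("colorTo", color_to))
    return [(field, value) for field, value in pairs
            if value not in ALLOWED_COLORS]


print(invalid_colors("blue", "purple"))  # []
print(invalid_colors("green", "teal"))   # [('colorTo', 'teal')]
```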

---

## 🏷️ **Recommended Tags**

### **For AI Dataset Studio**
```yaml
tags:
  - machine-learning
  - datasets
  - nlp
  - data-science
  - perplexity-ai
  - web-scraping
  - sentiment-analysis
  - text-classification
  - ai-tools
  - data-collection
```

### **By Use Case**

#### **Business/Enterprise**
```yaml
tags:
  - business-intelligence
  - enterprise
  - data-analytics
  - market-research
  - customer-insights
```

#### **Research/Academic**
```yaml
tags:
  - research
  - academic
  - scientific
  - literature-review
  - research-tools
```

#### **Developer Tools**
```yaml
tags:
  - developer-tools
  - api
  - automation
  - productivity
  - data-engineering
```


---

## **Hardware Configuration**

Hardware is selected in the Space settings, not in the README.md frontmatter:

### **Hardware Options**
```yaml
# In Space settings (not README.md):
# - CPU Basic (free)
# - CPU Upgrade ($0.03/hour)
# - T4 Small ($0.60/hour) ← Recommended
# - T4 Medium ($1.20/hour)
# - A10G Small ($1.05/hour)
# - A10G Large ($3.15/hour)
```

### **Memory Requirements**
```yaml
# Our application needs:
# - Base app: ~200MB
# - AI models: ~2-4GB
# - Processing: ~1-2GB
# Total: ~4-6GB recommended (T4 Small = 16GB)
```


---

## **Environment Variables**

Set these in Space Settings → Repository secrets:

### **Required**
```bash
PERPLEXITY_API_KEY="your_perplexity_api_key_here"
```

### **Optional**
```bash
# HuggingFace integration
HF_TOKEN="your_huggingface_token"

# Performance tuning
MAX_SOURCES_PER_SEARCH="50"
REQUEST_TIMEOUT="30"
LOG_LEVEL="INFO"

# Feature flags
ENABLE_DEBUG_MODE="false"
ENABLE_CACHING="true"
```

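Secrets set this way reach the app as ordinary environment variables. As a rough sketch of how `app.py` might read them (the helper names `env_int`, `env_bool`, and `load_settings` are hypothetical, and the defaults simply mirror the values listed above):

```python
import os


def env_int(name, default):
    """Read an integer setting, falling back to `default` when unset or empty."""
    raw = os.environ.get(name)
    return int(raw) if raw not in (None, "") else default


def env_bool(name, default):
    """Read a true/false flag; accepts 1/true/yes/on in any letter case."""
    raw = os.environ.get(name)
    if raw is None or raw == "":
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def load_settings():
    """Collect the variables listed in this section into one dictionary."""
    api_key = os.environ.get("PERPLEXITY_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of failing on the first API call.
        raise RuntimeError("PERPLEXITY_API_KEY is not set in the Space secrets")
    return {
        "api_key": api_key,
        "hf_token": os.environ.get("HF_TOKEN"),  # optional, may be None
        "max_sources": env_int("MAX_SOURCES_PER_SEARCH", 50),
        "request_timeout": env_int("REQUEST_TIMEOUT", 30),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
        "debug": env_bool("ENABLE_DEBUG_MODE", False),
        "caching": env_bool("ENABLE_CACHING", True),
    }
```

Calling `load_settings()` once at startup keeps the parsing in one place and surfaces a missing key in the build logs right away.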

---

## ✅ **Validation Checklist**

Before deploying, ensure:

- [ ] ✅ YAML frontmatter is at the very beginning of README.md
- [ ] ✅ No spaces or blank lines before the opening `---`
- [ ] ✅ Proper YAML syntax (quotes around version numbers)
- [ ] ✅ `app_file: app.py` matches your main file name
- [ ] ✅ SDK version matches your requirements.txt
- [ ] ✅ Title and emoji are appropriate for your audience
- [ ] ✅ Tags are relevant and searchable
- [ ] ✅ PERPLEXITY_API_KEY is set in Space secrets

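Two of the checklist items (the `app_file` name and the SDK version) lend themselves to automation. The sketch below assumes Gradio is pinned in `requirements.txt` with an exact `gradio==X.Y.Z` line; `pinned_gradio_version` and `deployment_problems` are hypothetical helper names.

```python
import re
from pathlib import Path


def pinned_gradio_version(requirements_text):
    """Return the version from a `gradio==X.Y.Z` line, or None if gradio is not pinned."""
    for line in requirements_text.splitlines():
        match = re.match(r"\s*gradio\s*==\s*([\w.]+)", line, flags=re.IGNORECASE)
        if match:
            return match.group(1)
    return None


def deployment_problems(space_dir, app_file, sdk_version):
    """Return human-readable problems found in a local copy of the Space."""
    root = Path(space_dir)
    problems = []
    if not (root / app_file).is_file():
        problems.append(f"app_file '{app_file}' does not exist")
    requirements = root / "requirements.txt"
    if requirements.is_file():
        pinned = pinned_gradio_version(requirements.read_text(encoding="utf-8"))
        if pinned is not None and pinned != sdk_version:
            problems.append(
                f"sdk_version {sdk_version} differs from gradio=={pinned} in requirements.txt"
            )
    return problems


print(pinned_gradio_version("requests\ngradio==4.44.0\n"))  # 4.44.0
```

An empty list from `deployment_problems(".", "app.py", "4.44.0")` means both checks passed for the current directory.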

---

## 🚨 **Common Configuration Errors**

### **❌ Missing Frontmatter**
```markdown
# 🚀 AI Dataset Studio  ← ERROR: No YAML header
```

### **✅ Correct Format**
```markdown
---
title: AI Dataset Studio
emoji: 🚀
sdk: gradio
---

# 🚀 AI Dataset Studio  ← Correct: Content after YAML
```

### **❌ Wrong SDK Version Format**
```yaml
sdk_version: 4.44.0  # ← ERROR: Missing quotes (YAML may read an unquoted version as a number)
```

### **✅ Correct Format**
```yaml
sdk_version: "4.44.0"  # ← Correct: Quoted string
```

### **❌ Invalid App File**
```yaml
app_file: main.py  # ← ERROR: File doesn't exist
```

### **✅ Correct Format**
```yaml
app_file: app.py  # ← Correct: Matches actual filename
```


---

## **Updating Configuration**

To change your Space configuration:

1. **Edit README.md**
   - Update the YAML frontmatter
   - Commit changes to git

2. **Space will automatically rebuild**
   - Changes take effect once the rebuild finishes
   - Monitor build logs for errors

3. **Hardware changes**
   - Go to Space Settings
   - Change hardware tier
   - Restart Space

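The hardware step can also be scripted instead of clicked through. The following is a sketch under assumptions, not the required workflow: it expects an object that behaves like `huggingface_hub.HfApi` (which provides `request_space_hardware` and `restart_space`), and `change_hardware`, `RecordingApi`, and the repo id are hypothetical names used for illustration.

```python
def change_hardware(api, repo_id, hardware):
    """Request a hardware tier for a Space, then restart it.

    `api` is expected to behave like `huggingface_hub.HfApi`.
    """
    api.request_space_hardware(repo_id=repo_id, hardware=hardware)
    api.restart_space(repo_id=repo_id)  # mirrors the "Restart Space" step


class RecordingApi:
    """Stand-in for HfApi that only records calls, for a dry run without network access."""

    def __init__(self):
        self.calls = []

    def request_space_hardware(self, repo_id, hardware):
        self.calls.append(("hardware", repo_id, hardware))

    def restart_space(self, repo_id):
        self.calls.append(("restart", repo_id))


dry_run = RecordingApi()
change_hardware(dry_run, "your-username/ai-dataset-studio", "t4-small")
print(dry_run.calls)

# Against the real service (assumes huggingface_hub is installed and a token is configured):
# from huggingface_hub import HfApi
# change_hardware(HfApi(), "your-username/ai-dataset-studio", "t4-small")
```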

---

## **Example Complete README.md Start**

Here's how your README.md should begin:

```markdown
---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
tags:
  - machine-learning
  - datasets
  - nlp
  - perplexity-ai
  - data-science
---

# 🚀 AI Dataset Studio

**Create high-quality training datasets with AI-powered source discovery**

A comprehensive platform for building ML datasets that combines web scraping, AI processing, and smart source discovery using Perplexity AI...
```


---

## 💡 **Pro Tips**

1. **Choose memorable titles** - They appear in search results
2. **Use relevant emojis** - They make your Space stand out
3. **Pick good color combinations** - They create visual appeal
4. **Add comprehensive tags** - They improve discoverability
5. **Pin important Spaces** - They appear prominently on your profile
6. **Use appropriate licenses** - MIT or Apache-2.0 for open source

---

**Your Space configuration is now properly set up for deployment!**