# 🔧 HuggingFace Spaces Configuration Guide

**Essential configuration options for your AI Dataset Studio Space**

---

## **Required README.md Header**

Every HuggingFace Space **must** have this YAML frontmatter at the very beginning of README.md:

### **Basic Configuration (Recommended)**
```yaml
---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
```
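
As a rough illustration of the rule above, the following sketch checks that a README begins with a frontmatter block and pulls out its flat fields. The function name `read_frontmatter` is hypothetical, and the parser is deliberately simplified (top-level `key: value` pairs only, standard library only); a real YAML parser is the better choice for anything beyond a quick sanity check.

```python
from typing import Dict


def read_frontmatter(readme_text: str) -> Dict[str, str]:
    """Return top-level `key: value` pairs from a README's YAML frontmatter.

    Simplified on purpose: list items and nested values are skipped.
    """
    lines = readme_text.splitlines()
    # The opening delimiter must be the very first line, with no leading spaces.
    if not lines or lines[0] != "---":
        raise ValueError("README.md must start with '---' on its first line")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter reached
            return fields
        if line.startswith((" ", "-", "#")) or ":" not in line:
            continue  # skip nested lines, list items, comments, blanks
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"').strip("'")
    raise ValueError("frontmatter is missing its closing '---'")


example = """---
title: AI Dataset Studio
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
# AI Dataset Studio
"""

config = read_frontmatter(example)
print(config["sdk"], config["sdk_version"])  # prints: gradio 4.44.0
```

Running this against your own README.md before pushing can catch a missing or indented opening `---` early.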

### **Alternative Configurations**

#### **Professional/Business Version**
```yaml
---
title: Enterprise Dataset Studio
emoji: 🏢
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: true
license: mit
tags:
  - machine-learning
  - datasets
  - nlp
  - data-science
  - perplexity-ai
---
```

#### **Research/Academic Version**
```yaml
---
title: Research Dataset Creator
emoji: 📚
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - research
  - academic
  - datasets
  - nlp
  - ai
---
```

#### **Creative/Colorful Version**
```yaml
---
title: AI Dataset Magic ✨
emoji: 🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
tags:
  - datasets
  - creative
  - ai-tools
  - machine-learning
---
```

---

## 🎨 **Configuration Options Explained**

### **Required Fields**

| Field | Description | Example Values |
|-------|-------------|----------------|
| `title` | Space name displayed in UI | `AI Dataset Studio` |
| `emoji` | Icon shown next to title | `🚀`, `🤖`, `📊`, `🎯` |
| `colorFrom` | Gradient start color | `blue`, `red`, `green`, `purple` |
| `colorTo` | Gradient end color | `purple`, `pink`, `yellow`, `blue` |
| `sdk` | Framework used | `gradio` (for our app) |
| `sdk_version` | SDK version | `"4.44.0"` |
| `app_file` | Main application file | `app.py` |

### **Optional Fields**

| Field | Description | Example Values |
|-------|-------------|----------------|
| `pinned` | Pin to your profile | `true`, `false` |
| `license` | Software license | `mit`, `apache-2.0`, `gpl-3.0` |
| `tags` | Searchable keywords | `machine-learning`, `nlp`, `datasets` |
| `models` | Referenced models | `facebook/bart-large-cnn` |
| `datasets` | Referenced datasets | `imdb`, `sentiment140` |

---

## 🎯 **Popular Color Combinations**

`colorFrom` and `colorTo` accept only these names: `red`, `yellow`, `green`, `blue`, `indigo`, `purple`, `pink`, `gray`.

### **Professional Themes**
```yaml
# Corporate Blue
colorFrom: blue
colorTo: indigo

# Business Gray
colorFrom: gray
colorTo: blue

# Tech Green
colorFrom: green
colorTo: blue
```

### **Creative Themes**
```yaml
# Sunset
colorFrom: yellow
colorTo: red

# Ocean
colorFrom: blue
colorTo: green

# Forest
colorFrom: green
colorTo: yellow

# Galaxy
colorFrom: purple
colorTo: pink
```

### **AI/Tech Themes**
```yaml
# Matrix
colorFrom: green
colorTo: gray

# Cyberpunk
colorFrom: purple
colorTo: blue

# Neural
colorFrom: blue
colorTo: purple
```
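
Because a color name outside the supported set is rejected when the Space metadata is validated, a small pre-commit check can help. The sketch below is illustrative only: `ALLOWED_COLORS` reflects the names listed in the Spaces configuration reference as best I recall it (red, yellow, green, blue, indigo, purple, pink, gray), so verify it against the current reference, and `invalid_colors` is a hypothetical helper name.

```python
from typing import List

# Assumed set of supported gradient colors; confirm against the Spaces
# configuration reference before relying on it.
ALLOWED_COLORS = frozenset(
    {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}
)


def invalid_colors(color_from: str, color_to: str) -> List[str]:
    """Return the color names that fall outside ALLOWED_COLORS."""
    return [c for c in (color_from, color_to) if c not in ALLOWED_COLORS]


print(invalid_colors("blue", "purple"))  # prints: []
print(invalid_colors("green", "teal"))   # prints: ['teal']
```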

---

## 🏷️ **Recommended Tags**

### **For AI Dataset Studio**
```yaml
tags:
  - machine-learning
  - datasets
  - nlp
  - data-science
  - perplexity-ai
  - web-scraping
  - sentiment-analysis
  - text-classification
  - ai-tools
  - data-collection
```

### **By Use Case**

#### **Business/Enterprise**
```yaml
tags:
  - business-intelligence
  - enterprise
  - data-analytics
  - market-research
  - customer-insights
```

#### **Research/Academic**
```yaml
tags:
  - research
  - academic
  - scientific
  - literature-review
  - research-tools
```

#### **Developer Tools**
```yaml
tags:
  - developer-tools
  - api
  - automation
  - productivity
  - data-engineering
```

---

## **Hardware Configuration**

Hardware is selected in the Space settings, not in the README.md frontmatter:

### **Hardware Options**
```yaml
# In Space settings (not README.md):
# - CPU Basic (free)
# - CPU Upgrade ($0.03/hour)
# - T4 Small ($0.60/hour) ← Recommended
# - T4 Medium ($1.20/hour)
# - A10G Small ($1.05/hour)
# - A10G Large ($3.15/hour)
```

### **Memory Requirements**
```yaml
# Our application needs:
# - Base app: ~200MB
# - AI models: ~2-4GB
# - Processing: ~1-2GB
# Total: ~4-6GB recommended (T4 Small = 16GB)
```

---

## **Environment Variables**

Set these in Space Settings → Repository secrets:

### **Required**
```bash
PERPLEXITY_API_KEY="your_perplexity_api_key_here"
```

### **Optional**
```bash
# HuggingFace integration
HF_TOKEN="your_huggingface_token"

# Performance tuning
MAX_SOURCES_PER_SEARCH="50"
REQUEST_TIMEOUT="30"
LOG_LEVEL="INFO"

# Feature flags
ENABLE_DEBUG_MODE="false"
ENABLE_CACHING="true"
```
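
Secrets set on a Space are exposed to the running app as environment variables, so `app.py` can read them with `os.environ`. The sketch below shows one way to do that with the variable names and default values listed above; the function name `load_settings`, the returned dictionary keys, and the interpretation of `REQUEST_TIMEOUT` as seconds are assumptions for illustration, not the app's actual code.

```python
import os
from typing import Any, Dict, Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read configuration from environment variables, applying defaults."""
    api_key = env.get("PERPLEXITY_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError("PERPLEXITY_API_KEY is not set in Space secrets")
    return {
        "api_key": api_key,
        "hf_token": env.get("HF_TOKEN"),  # optional, may be None
        "max_sources": int(env.get("MAX_SOURCES_PER_SEARCH", "50")),
        "timeout": int(env.get("REQUEST_TIMEOUT", "30")),  # assumed seconds
        "log_level": env.get("LOG_LEVEL", "INFO"),
        # Flags arrive as strings, so compare against "true" explicitly.
        "debug": env.get("ENABLE_DEBUG_MODE", "false").lower() == "true",
        "caching": env.get("ENABLE_CACHING", "true").lower() == "true",
    }


# Usage with an explicit mapping, which also makes the function easy to test.
settings = load_settings({"PERPLEXITY_API_KEY": "example-key"})
print(settings["max_sources"], settings["debug"])  # prints: 50 False
```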

---

## ✅ **Validation Checklist**

Before deploying, ensure:

- [ ] ✅ YAML frontmatter is at the very beginning of README.md
- [ ] ✅ No spaces before the opening `---`
- [ ] ✅ Proper YAML syntax (quotes around version numbers)
- [ ] ✅ `app_file: app.py` matches your main file name
- [ ] ✅ SDK version matches your requirements.txt
- [ ] ✅ Title and emoji are appropriate for your audience
- [ ] ✅ Tags are relevant and searchable
- [ ] ✅ PERPLEXITY_API_KEY is set in Space secrets
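
Two of the checklist items, the `app_file` match and the SDK version match, lend themselves to an automated check. The following is a minimal sketch under stated assumptions: `check_space` and `pinned_gradio_version` are hypothetical helpers, and only an exact `gradio==X.Y.Z` pin in requirements.txt is recognized (extras such as `gradio[oauth]` or range specifiers are ignored).

```python
import re
from typing import List, Optional


def pinned_gradio_version(requirements_text: str) -> Optional[str]:
    """Return the version from a `gradio==X.Y.Z` line, or None if absent."""
    for line in requirements_text.splitlines():
        match = re.match(r"\s*gradio\s*==\s*([0-9][0-9A-Za-z.]*)", line)
        if match:
            return match.group(1)
    return None


def check_space(
    sdk_version: str,
    app_file: str,
    requirements_text: str,
    repo_files: List[str],
) -> List[str]:
    """Return a list of human-readable problems; empty means both checks pass."""
    problems: List[str] = []
    if app_file not in repo_files:
        problems.append(f"app_file '{app_file}' does not exist in the repo")
    pinned = pinned_gradio_version(requirements_text)
    if pinned is not None and pinned != sdk_version:
        problems.append(
            f"sdk_version {sdk_version} differs from requirements.txt pin {pinned}"
        )
    return problems


print(check_space("4.44.0", "app.py", "gradio==4.44.0\nrequests\n", ["app.py", "README.md"]))
# prints: []
```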

---

## 🚨 **Common Configuration Errors**

### **❌ Missing Frontmatter**
```markdown
# 🚀 AI Dataset Studio  ← ERROR: No YAML header
```

### **✅ Correct Format**
```markdown
---
title: AI Dataset Studio
emoji: 🚀
sdk: gradio
---

# 🚀 AI Dataset Studio  ← Correct: Content after YAML
```

### **❌ Wrong SDK Version Format**
```yaml
sdk_version: 4.44.0  # ← Avoid: unquoted (a two-part version such as 5.0 would be read as a number)
```

### **✅ Correct Format**
```yaml
sdk_version: "4.44.0"  # ← Correct: Quoted string
```

### **❌ Invalid App File**
```yaml
app_file: main.py  # ← ERROR: File doesn't exist
```

### **✅ Correct Format**
```yaml
app_file: app.py  # ← Correct: Matches actual filename
```

---

## **Updating Configuration**

To change your Space configuration:

1. **Edit README.md**
   - Update the YAML frontmatter
   - Commit and push the changes with git

2. **Space will automatically rebuild**
   - Changes take effect once the rebuild finishes
   - Monitor build logs for errors

3. **Hardware changes**
   - Go to Space Settings
   - Change hardware tier
   - Restart Space

---

## **Example Complete README.md Start**

Here's how your README.md should begin:

```markdown
---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
tags:
  - machine-learning
  - datasets
  - nlp
  - perplexity-ai
  - data-science
---

# 🚀 AI Dataset Studio

**Create high-quality training datasets with AI-powered source discovery**

A comprehensive platform for building ML datasets that combines web scraping, AI processing, and smart source discovery using Perplexity AI...
```

---

## 💡 **Pro Tips**

1. **Choose memorable titles** - They appear in search results
2. **Use relevant emojis** - They make your Space stand out
3. **Pick good color combinations** - They create visual appeal
4. **Add comprehensive tags** - They improve discoverability
5. **Pin important Spaces** - They appear prominently on your profile
6. **Use appropriate licenses** - MIT or Apache-2.0 for open source

---

**Your Space configuration is now properly set up for deployment! 🚀**