# 🔧 HuggingFace Spaces Configuration Guide
**Essential configuration options for your AI Dataset Studio Space**
---
## 📋 **Required README.md Header**
Every HuggingFace Space **must** have this YAML frontmatter at the very beginning of README.md:
### **Basic Configuration (Recommended)**
```yaml
---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
```
### **Alternative Configurations**
#### **Professional/Business Version**
```yaml
---
title: Enterprise Dataset Studio
emoji: 🏢
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: true
license: mit
tags:
- machine-learning
- datasets
- nlp
- data-science
- perplexity-ai
---
```
#### **Research/Academic Version**
```yaml
---
title: Research Dataset Creator
emoji: 🎓
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: apache-2.0
tags:
- research
- academic
- datasets
- nlp
- ai
---
```
#### **Creative/Colorful Version**
```yaml
---
title: AI Dataset Magic ✨
emoji: 🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
tags:
- datasets
- creative
- ai-tools
- machine-learning
---
```
---
## 🎨 **Configuration Options Explained**
### **Required Fields**
| Field | Description | Example Values |
|-------|-------------|----------------|
| `title` | Space name displayed in UI | `AI Dataset Studio` |
| `emoji` | Icon shown next to title | `🚀`, `🤖`, `📊`, `🎯` |
| `colorFrom` | Gradient start color | `red`, `yellow`, `green`, `blue`, `indigo`, `purple`, `pink`, `gray` |
| `colorTo` | Gradient end color | Same eight values as `colorFrom` |
| `sdk` | Framework used | `gradio` (for our app) |
| `sdk_version` | SDK version | `"4.44.0"` |
| `app_file` | Main application file | `app.py` |
### **Optional Fields**
| Field | Description | Example Values |
|-------|-------------|----------------|
| `pinned` | Pin to your profile | `true`, `false` |
| `license` | Software license | `mit`, `apache-2.0`, `gpl-3.0` |
| `tags` | Searchable keywords | `machine-learning`, `nlp`, `datasets` |
| `models` | Referenced models | `facebook/bart-large-cnn` |
| `datasets` | Referenced datasets | `imdb`, `sentiment140` |
---
## 🎯 **Popular Color Combinations**
### **Professional Themes**
```yaml
# Corporate Blue
colorFrom: blue
colorTo: indigo
# Business Gray
colorFrom: gray
colorTo: blue
# Tech Green
colorFrom: green
colorTo: blue
```
### **Creative Themes**
```yaml
# Sunset
colorFrom: yellow
colorTo: red
# Ocean
colorFrom: blue
colorTo: green
# Forest
colorFrom: green
colorTo: yellow
# Galaxy
colorFrom: purple
colorTo: pink
```
### **AI/Tech Themes**
```yaml
# Matrix
colorFrom: green
colorTo: gray
# Cyberpunk
colorFrom: purple
colorTo: blue
# Neural
colorFrom: blue
colorTo: purple
```
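The gradient fields accept only a small set of color names. As a minimal sketch, assuming the palette listed in the Spaces configuration reference (worth verifying against the current docs), a hypothetical helper `invalid_colors` can flag values outside it before you commit:

```python
# Color names assumed to be accepted for colorFrom / colorTo.
# This set is taken from the Spaces configuration reference; verify it
# against the current documentation before relying on it.
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}


def invalid_colors(color_from: str, color_to: str) -> list[str]:
    """Return the values that are not in the assumed palette (hypothetical helper)."""
    return [c for c in (color_from, color_to) if c not in ALLOWED_COLORS]


print(invalid_colors("blue", "purple"))      # both names are in the palette
print(invalid_colors("green", "turquoise"))  # "turquoise" is reported
```

An empty list means both values are in the palette.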
---
## 🏷️ **Recommended Tags**
### **For AI Dataset Studio**
```yaml
tags:
- machine-learning
- datasets
- nlp
- data-science
- perplexity-ai
- web-scraping
- sentiment-analysis
- text-classification
- ai-tools
- data-collection
```
### **By Use Case**
#### **Business/Enterprise**
```yaml
tags:
- business-intelligence
- enterprise
- data-analytics
- market-research
- customer-insights
```
#### **Research/Academic**
```yaml
tags:
- research
- academic
- scientific
- literature-review
- research-tools
```
#### **Developer Tools**
```yaml
tags:
- developer-tools
- api
- automation
- productivity
- data-engineering
```
---
## 📊 **Hardware Configuration**
The hardware tier determines how much memory and compute the app gets:
### **Hardware Options**
```yaml
# In Space settings (not README.md); prices shown may be out of date, so check current pricing:
# - CPU Basic (free)
# - CPU Upgrade ($0.03/hour)
# - T4 Small ($0.60/hour) ← Recommended
# - T4 Medium ($1.20/hour)
# - A10G Small ($1.05/hour)
# - A10G Large ($3.15/hour)
```
### **Memory Requirements**
```yaml
# Our application needs:
# - Base app: ~200MB
# - AI models: ~2-4GB
# - Processing: ~1-2GB
# Total: ~4-6GB recommended (T4 Small = 16GB)
```
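The total above is simple addition over the per-component estimates. A minimal sketch, treating 1 GB as 1000 MB and using a hypothetical `total_range_mb` helper, makes the arithmetic explicit:

```python
# Rough memory estimates from the list above, as (low, high) ranges in MB.
# Treating 1 GB as 1000 MB is a simplifying assumption.
ESTIMATES_MB = {
    "base_app": (200, 200),
    "ai_models": (2000, 4000),
    "processing": (1000, 2000),
}


def total_range_mb(estimates):
    """Sum the low ends and the high ends of each (low, high) estimate."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


low, high = total_range_mb(ESTIMATES_MB)
print(f"Estimated footprint: {low / 1000:.1f}-{high / 1000:.1f} GB")
# prints "Estimated footprint: 3.2-6.2 GB"
```

The raw sum of roughly 3.2 to 6.2 GB is in line with the 4-6 GB recommendation and fits comfortably in a 16 GB tier.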
---
## 🔐 **Environment Variables**
Set these in Space Settings → Repository secrets:
### **Required**
```bash
PERPLEXITY_API_KEY="your_perplexity_api_key_here"
```
### **Optional**
```bash
# HuggingFace integration
HF_TOKEN="your_huggingface_token"
# Performance tuning
MAX_SOURCES_PER_SEARCH="50"
REQUEST_TIMEOUT="30"
LOG_LEVEL="INFO"
# Feature flags
ENABLE_DEBUG_MODE="false"
ENABLE_CACHING="true"
```
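Secrets and variables set this way reach the app as ordinary environment variables. The sketch below shows one way `app.py` might read them. The `load_settings` function and its dictionary keys are hypothetical, and using the example values above as fallbacks is an assumption, not documented behavior of the app:

```python
import os


def load_settings(env=None):
    """Read the variables listed above (hypothetical helper, not the app's real loader)."""
    env = os.environ if env is None else env
    api_key = env.get("PERPLEXITY_API_KEY")
    if not api_key:
        # The only variable the guide marks as required.
        raise RuntimeError("PERPLEXITY_API_KEY is not set")
    return {
        "api_key": api_key,
        "hf_token": env.get("HF_TOKEN"),  # optional, may be None
        # Fallbacks mirror the example values above (an assumption).
        "max_sources": int(env.get("MAX_SOURCES_PER_SEARCH", "50")),
        "request_timeout": int(env.get("REQUEST_TIMEOUT", "30")),
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "debug": env.get("ENABLE_DEBUG_MODE", "false").lower() == "true",
        "caching": env.get("ENABLE_CACHING", "true").lower() == "true",
    }
```

Calling `load_settings()` with no arguments reads from `os.environ`. Passing a plain dictionary instead makes the function easy to exercise locally without touching real secrets.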
---
## βœ… **Validation Checklist**
Before deploying, ensure:
- [ ] ✅ YAML frontmatter is at the very beginning of README.md
- [ ] ✅ No spaces before the opening `---`
- [ ] ✅ Proper YAML syntax (quotes around version numbers)
- [ ] ✅ `app_file: app.py` matches your main file name
- [ ] ✅ SDK version matches your requirements.txt
- [ ] ✅ Title and emoji are appropriate for your audience
- [ ] ✅ Tags are relevant and searchable
- [ ] ✅ PERPLEXITY_API_KEY is set in Space secrets
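
The structural items in this checklist can be checked mechanically. The following is a rough sketch using only the standard library. `check_frontmatter` is a hypothetical helper that does simple line matching rather than real YAML parsing, and `REQUIRED_KEYS` mirrors the fields this guide lists as required:

```python
# Fields this guide lists as required (taken from the table above).
REQUIRED_KEYS = ("title", "emoji", "colorFrom", "colorTo", "sdk", "sdk_version", "app_file")


def check_frontmatter(readme_text: str) -> list[str]:
    """Return a list of problems found in a README.md's frontmatter."""
    lines = [line.rstrip() for line in readme_text.splitlines()]
    if not lines or lines[0] != "---":
        return ["README.md must start with '---' on its first line"]
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        return ["frontmatter has no closing '---'"]
    # Collect top-level keys only; skip list items, indented lines, comments.
    keys = {
        line.split(":", 1)[0]
        for line in lines[1:end]
        if ":" in line and not line.startswith((" ", "-", "#"))
    }
    return [f"missing field: {key}" for key in REQUIRED_KEYS if key not in keys]


sample = "---\ntitle: AI Dataset Studio\nsdk: gradio\n---\n# AI Dataset Studio\n"
print(check_frontmatter(sample))  # lists the required fields the sample omits
```

An empty list means the frontmatter passed these checks. It does not validate values, so a mistyped color or SDK version would still get through.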
---
## 🚨 **Common Configuration Errors**
### **❌ Missing Frontmatter**
```markdown
# 🚀 AI Dataset Studio ← ERROR: No YAML header
```
### **✅ Correct Format**
```markdown
---
title: AI Dataset Studio
emoji: 🚀
sdk: gradio
---
# 🚀 AI Dataset Studio ← Correct: Content after YAML
```
### **❌ Wrong SDK Version Format**
```yaml
sdk_version: 4.44.0  # ← Risky: unquoted (a version like 4.40 would be read as the number 4.4)
```
### **✅ Correct Format**
```yaml
sdk_version: "4.44.0"  # ← Correct: Quoted string
```
### **❌ Invalid App File**
```yaml
app_file: main.py  # ← ERROR: File doesn't exist
```
### **✅ Correct Format**
```yaml
app_file: app.py  # ← Correct: Matches actual filename
```
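A mismatched `app_file` is easy to catch locally before pushing. A minimal sketch, assuming you run it from a checkout of the Space repository (`app_file_exists` is a hypothetical helper):

```python
from pathlib import Path


def app_file_exists(repo_dir: str, app_file: str = "app.py") -> bool:
    """Return True if the file named by app_file exists inside repo_dir."""
    return (Path(repo_dir) / app_file).is_file()


# Check the current directory for the file named in the frontmatter.
print(app_file_exists(".", "app.py"))
```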
---
## 🔄 **Updating Configuration**
To change your Space configuration:
1. **Edit README.md**
   - Update the YAML frontmatter
   - Commit changes to git
2. **Space will automatically rebuild**
   - Changes take effect once the rebuild finishes
   - Monitor build logs for errors
3. **Hardware changes**
   - Go to Space Settings
   - Change hardware tier
   - Restart Space
---
## 🎉 **Example Complete README.md Start**
Here's how your README.md should begin:
```markdown
---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
tags:
- machine-learning
- datasets
- nlp
- perplexity-ai
- data-science
---
# 🚀 AI Dataset Studio
**Create high-quality training datasets with AI-powered source discovery**
A comprehensive platform for building ML datasets that combines web scraping, AI processing, and smart source discovery using Perplexity AI...
```
---
## 💡 **Pro Tips**
1. **Choose memorable titles** - They appear in search results
2. **Use relevant emojis** - They make your Space stand out
3. **Pick good color combinations** - They create visual appeal
4. **Add comprehensive tags** - They improve discoverability
5. **Pin important Spaces** - They appear prominently on your profile
6. **Use appropriate licenses** - MIT or Apache-2.0 for open source
---
**Your Space configuration is now properly set up for deployment! 🚀**