# 🔧 HuggingFace Spaces Configuration Guide

**Essential configuration options for your AI Dataset Studio Space**

---

## 📋 **Required README.md Header**

Every HuggingFace Space **must** have this YAML frontmatter at the very beginning of README.md:

### **Basic Configuration (Recommended)**

```yaml
---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
```

### **Alternative Configurations**

#### **Professional/Business Version**

```yaml
---
title: Enterprise Dataset Studio
emoji: 🏢
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: true
license: mit
tags:
  - machine-learning
  - datasets
  - nlp
  - data-science
  - perplexity-ai
---
```

#### **Research/Academic Version**

```yaml
---
title: Research Dataset Creator
emoji: 🎓
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - research
  - academic
  - datasets
  - nlp
  - ai
---
```

#### **Creative/Colorful Version**

```yaml
---
title: AI Dataset Magic ✨
emoji: 🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
tags:
  - datasets
  - creative
  - ai-tools
  - machine-learning
---
```

---

## 🎨 **Configuration Options Explained**

### **Required Fields**

| Field | Description | Example Values |
|-------|-------------|----------------|
| `title` | Space name displayed in UI | `AI Dataset Studio` |
| `emoji` | Icon shown next to title | `🚀`, `🤖`, `📊`, `🎯` |
| `colorFrom` | Gradient start color | `blue`, `red`, `green`, `purple` |
| `colorTo` | Gradient end color | `purple`, `pink`, `yellow`, `blue` |
| `sdk` | Framework used | `gradio` (for our app) |
| `sdk_version` | SDK version | `"4.44.0"` |
| `app_file` | Main application file | `app.py` |

`colorFrom` and `colorTo` accept only these values: `red`, `yellow`, `green`, `blue`, `indigo`, `purple`, `pink`, `gray`.

### **Optional Fields**

| Field | Description | Example Values |
|-------|-------------|----------------|
| `pinned` | Pin to your profile | `true`, `false` |
| `license` | Software license | `mit`, `apache-2.0`, `gpl-3.0` |
| `tags` | Searchable keywords | `machine-learning`, `nlp`, `datasets` |
| `models` | Referenced models | `facebook/bart-large-cnn` |
| `datasets` | Referenced datasets | `imdb`, `sentiment140` |

---

## 🎯 **Popular Color Combinations**

### **Professional Themes**

```yaml
# Corporate Blue
colorFrom: blue
colorTo: indigo

# Business Gray
colorFrom: gray
colorTo: blue

# Tech Green
colorFrom: green
colorTo: blue
```

### **Creative Themes**

```yaml
# Sunset
colorFrom: yellow
colorTo: red

# Ocean
colorFrom: blue
colorTo: green

# Forest
colorFrom: green
colorTo: yellow

# Galaxy
colorFrom: purple
colorTo: pink
```

### **AI/Tech Themes**

```yaml
# Matrix
colorFrom: green
colorTo: gray

# Cyberpunk
colorFrom: purple
colorTo: blue

# Neural
colorFrom: blue
colorTo: purple
```

---

## 🏷️ **Recommended Tags**

### **For AI Dataset Studio**

```yaml
tags:
  - machine-learning
  - datasets
  - nlp
  - data-science
  - perplexity-ai
  - web-scraping
  - sentiment-analysis
  - text-classification
  - ai-tools
  - data-collection
```

### **By Use Case**

#### **Business/Enterprise**

```yaml
tags:
  - business-intelligence
  - enterprise
  - data-analytics
  - market-research
  - customer-insights
```

#### **Research/Academic**

```yaml
tags:
  - research
  - academic
  - scientific
  - literature-review
  - research-tools
```

#### **Developer Tools**

```yaml
tags:
  - developer-tools
  - api
  - automation
  - productivity
  - data-engineering
```

---

## 📊 **Hardware Configuration**

Hardware is chosen in the Space settings rather than in README.md. The prices below are the ones listed when this guide was written and may have changed, so check the current pricing before upgrading.

### **Hardware Options**

```yaml
# In Space settings (not README.md):
# - CPU Basic (free)
# - CPU Upgrade ($0.03/hour)
# - T4 Small ($0.60/hour) ← Recommended
# - T4 Medium ($1.20/hour)
# - A10G Small ($1.05/hour)
# - A10G Large ($3.15/hour)
```

### **Memory Requirements**

```yaml
# Our application needs:
# - Base app: ~200MB
# - AI models: ~2-4GB
# - Processing: ~1-2GB
# Total: ~4-6GB recommended (T4 Small = 16GB)
```

---

## 🔐 **Environment Variables**

Set these in Space Settings → Repository secrets, entering each name and value as a separate secret:

### **Required**

```bash
PERPLEXITY_API_KEY="your_perplexity_api_key_here"
```

### **Optional**

```bash
# HuggingFace integration
HF_TOKEN="your_huggingface_token"

# Performance tuning
MAX_SOURCES_PER_SEARCH="50"
REQUEST_TIMEOUT="30"
LOG_LEVEL="INFO"

# Feature flags
ENABLE_DEBUG_MODE="false"
ENABLE_CACHING="true"
```

---

## ✅ **Validation Checklist**

Before deploying, ensure:

- [ ] YAML frontmatter is at the very beginning of README.md
- [ ] No spaces before the opening `---`
- [ ] Proper YAML syntax (quotes around version numbers)
- [ ] `app_file: app.py` matches your main file name
- [ ] `sdk_version` does not conflict with any `gradio` version pinned in requirements.txt
- [ ] Title and emoji are appropriate for your audience
- [ ] Tags are relevant and searchable
- [ ] `PERPLEXITY_API_KEY` is set in Space secrets

---

## 🚨 **Common Configuration Errors**

### **❌ Missing Frontmatter**

```markdown
# 🚀 AI Dataset Studio ← ERROR: No YAML header
```

### **✅ Correct Format**

```markdown
---
title: AI Dataset Studio
emoji: 🚀
sdk: gradio
---
# 🚀 AI Dataset Studio ← Correct: Content after YAML
```

### **❌ Unquoted SDK Version**

```yaml
sdk_version: 4.44.0  # RISKY: unquoted; a two-part version such as 4.40 would be read as the number 4.4
```

### **✅ Correct Format**

```yaml
sdk_version: "4.44.0"  # Correct: quoted string
```

### **❌ Invalid App File**

```yaml
app_file: main.py  # ERROR: file doesn't exist in the repository
```

### **✅ Correct Format**

```yaml
app_file: app.py  # Correct: matches the actual filename
```

---

## 🔄 **Updating Configuration**

To change your Space configuration:

1. **Edit README.md**
   - Update the YAML frontmatter
   - Commit changes to git
2. **Space will automatically rebuild**
   - Changes take effect once the rebuild finishes
   - Monitor build logs for errors
3. **Hardware changes**
   - Go to Space Settings
   - Change hardware tier
   - Restart Space

---

## 🎉 **Example Complete README.md Start**

Here's how your README.md should begin:

```markdown
---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
tags:
  - machine-learning
  - datasets
  - nlp
  - perplexity-ai
  - data-science
---

# 🚀 AI Dataset Studio

**Create high-quality training datasets with AI-powered source discovery**

A comprehensive platform for building ML datasets that combines web scraping, AI processing, and smart source discovery using Perplexity AI...
```

---

## 💡 **Pro Tips**

1. **Choose memorable titles** - They appear in search results
2. **Use relevant emojis** - They make your Space stand out
3. **Pick good color combinations** - They create visual appeal
4. **Add comprehensive tags** - They improve discoverability
5. **Pin important Spaces** - They appear prominently on your profile
6. **Use appropriate licenses** - MIT or Apache-2.0 for open source

---

**Your Space configuration is now properly set up for deployment! 🚀**
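
---

## 🧪 **Optional: Scripted Frontmatter Check**

Several items on the validation checklist can be checked mechanically before you push. The sketch below is one possible way to do that using only the Python standard library. The names `parse_frontmatter`, `check_readme`, `ALLOWED_COLORS`, and `REQUIRED_KEYS` are hypothetical helpers invented for this guide, not part of any HuggingFace tooling. The parser is deliberately simple: it only reads flat `key: value` lines and skips list items such as tags, so treat it as a pre-flight check rather than a full YAML validator. The color list reflects the values the Spaces configuration reference is understood to accept.

```python
from pathlib import Path

# Assumed set of gradient colors accepted for colorFrom / colorTo.
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}
# Fields this guide lists as required.
REQUIRED_KEYS = ("title", "emoji", "colorFrom", "colorTo", "sdk", "sdk_version", "app_file")


def parse_frontmatter(text):
    """Return the raw top-level key/value pairs of the frontmatter, or None if absent."""
    lines = text.splitlines()
    # The opening delimiter must be the very first line, with no leading spaces.
    if not lines or lines[0].rstrip() != "---":
        return None
    end = next((i for i, line in enumerate(lines[1:], 1) if line.rstrip() == "---"), None)
    if end is None:
        return None  # opening delimiter without a closing one
    fields = {}
    for line in lines[1:end]:
        # Skip blank lines, comments, and list items (indented or not).
        if not line.strip() or line[0].isspace() or line.startswith(("#", "-")):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()  # value keeps its quotes, if any
    return fields


def check_readme(readme_path):
    """Return a list of problems found; an empty list means every check passed."""
    readme = Path(readme_path)
    fields = parse_frontmatter(readme.read_text(encoding="utf-8"))
    if fields is None:
        return ["README.md does not start with a '---' delimited YAML block"]
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in fields]
    for key in ("colorFrom", "colorTo"):
        color = fields.get(key, "").strip("\"'")
        if color and color not in ALLOWED_COLORS:
            problems.append(f"{key}: '{color}' is not an allowed color")
    version = fields.get("sdk_version", "")
    if version and version[0] not in "\"'":
        problems.append("sdk_version should be quoted so YAML keeps it a string")
    app_file = fields.get("app_file", "").strip("\"'")
    if app_file and not (readme.parent / app_file).is_file():
        problems.append(f"app_file: '{app_file}' not found next to README.md")
    return problems
```

Calling `check_readme("README.md")` from the repository root returns a list of messages you can print or fail a CI job on. The `PERPLEXITY_API_KEY` secret and the hardware tier live in the Space settings, so they are outside what a local script like this can verify.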