
🔧 HuggingFace Spaces Configuration Guide

Essential configuration options for your AI Dataset Studio Space


📋 Required README.md Header

Every HuggingFace Space must have this YAML frontmatter at the very beginning of README.md:

Basic Configuration (Recommended)

---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

Alternative Configurations

Professional/Business Version

---
title: Enterprise Dataset Studio
emoji: 🏢
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: true
license: mit
tags:
  - machine-learning
  - datasets
  - nlp
  - data-science
  - perplexity-ai
---

Research/Academic Version

---
title: Research Dataset Creator
emoji: 🎓
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - research
  - academic
  - datasets
  - nlp
  - ai
---

Creative/Colorful Version

---
title: AI Dataset Magic ✨
emoji: 🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
tags:
  - datasets
  - creative
  - ai-tools
  - machine-learning
---

🎨 Configuration Options Explained

Required Fields

| Field | Description | Example Values |
| --- | --- | --- |
| title | Space name displayed in UI | AI Dataset Studio |
| emoji | Icon shown next to title | 🚀, 🤖, 📊, 🎯 |
| colorFrom | Gradient start color | blue, red, green, purple |
| colorTo | Gradient end color | purple, pink, yellow, blue |
| sdk | Framework used | gradio (for our app) |
| sdk_version | SDK version | "4.44.0" |
| app_file | Main application file | app.py |

Optional Fields

| Field | Description | Example Values |
| --- | --- | --- |
| pinned | Pin to your profile | true, false |
| license | Software license | mit, apache-2.0, gpl-3.0 |
| tags | Searchable keywords | machine-learning, nlp, datasets |
| models | Referenced models | facebook/bart-large-cnn |
| datasets | Referenced datasets | imdb, sentiment140 |

🎯 Popular Color Combinations

Both colorFrom and colorTo accept only these values: red, yellow, green, blue, indigo, purple, pink, gray.

Professional Themes

# Corporate Blue
colorFrom: blue
colorTo: indigo

# Business Gray
colorFrom: gray
colorTo: blue

# Tech Green
colorFrom: green
colorTo: blue

Creative Themes

# Sunset
colorFrom: yellow
colorTo: red

# Ocean
colorFrom: blue
colorTo: green

# Forest
colorFrom: green
colorTo: yellow

# Galaxy
colorFrom: purple
colorTo: pink

AI/Tech Themes

# Matrix
colorFrom: green
colorTo: gray

# Cyberpunk
colorFrom: purple
colorTo: blue

# Neural
colorFrom: blue
colorTo: purple
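
As a rough check, a theme pair can be validated against the color names that the Spaces configuration reference lists for `colorFrom` and `colorTo`. The helper name `check_gradient` is hypothetical, and the allowed set is hard-coded on the assumption that it still matches the current reference.

```python
# Color names accepted for colorFrom / colorTo according to the Spaces
# configuration reference (hard-coded here; verify against current docs).
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}


def check_gradient(color_from: str, color_to: str) -> list[str]:
    """Return the color names in the pair that are not accepted."""
    return [c for c in (color_from, color_to) if c not in ALLOWED_COLORS]


if __name__ == "__main__":
    print(check_gradient("blue", "purple"))  # both names are in the allowed set
    print(check_gradient("blue", "cyan"))    # "cyan" is not in the allowed set
```

An empty list means both names are usable as written.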

🏷️ Recommended Tags

For AI Dataset Studio

tags:
  - machine-learning
  - datasets
  - nlp
  - data-science
  - perplexity-ai
  - web-scraping
  - sentiment-analysis
  - text-classification
  - ai-tools
  - data-collection

By Use Case

Business/Enterprise

tags:
  - business-intelligence
  - enterprise
  - data-analytics
  - market-research
  - customer-insights

Research/Academic

tags:
  - research
  - academic
  - scientific
  - literature-review
  - research-tools

Developer Tools

tags:
  - developer-tools
  - api
  - automation
  - productivity
  - data-engineering

📊 Hardware Configuration

Hardware is selected in the Space settings rather than in README.md:

Hardware Options

# In Space settings (not README.md); hourly prices are subject to change:
# - CPU Basic (free)
# - CPU Upgrade ($0.03/hour)
# - T4 Small ($0.60/hour) ← Recommended
# - T4 Medium ($1.20/hour)
# - A10G Small ($1.05/hour)
# - A10G Large ($3.15/hour)
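
To put the hourly rates in perspective, here is a quick sketch of the monthly cost for a Space that runs around the clock. The prices are the ones listed above and may be out of date; the 30-day month and the `monthly_cost` helper are assumptions made for illustration.

```python
# Hourly prices in USD, copied from the list above; treat them as a snapshot.
HOURLY_PRICE_USD = {
    "CPU Basic": 0.00,
    "CPU Upgrade": 0.03,
    "T4 Small": 0.60,
    "T4 Medium": 1.20,
    "A10G Small": 1.05,
    "A10G Large": 3.15,
}


def monthly_cost(tier: str, hours_per_day: float = 24, days: int = 30) -> float:
    """Estimate the cost in USD of running one tier for `days` days."""
    return round(HOURLY_PRICE_USD[tier] * hours_per_day * days, 2)


if __name__ == "__main__":
    for tier in HOURLY_PRICE_USD:
        print(f"{tier:<12} ${monthly_cost(tier):>8.2f} / month")
```

If the Space is only awake for part of the day, pass a smaller `hours_per_day` to get a closer estimate.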

Memory Requirements

# Our application needs:
# - Base app: ~200MB
# - AI models: ~2-4GB
# - Processing: ~1-2GB
# Total: ~4-6GB recommended (T4 Small provides 16GB of GPU memory)

🔐 Environment Variables

Set these in Space Settings → Repository secrets:

Required

PERPLEXITY_API_KEY = "your_perplexity_api_key_here"

Optional

# HuggingFace integration
HF_TOKEN = "your_huggingface_token"

# Performance tuning
MAX_SOURCES_PER_SEARCH = "50"
REQUEST_TIMEOUT = "30"
LOG_LEVEL = "INFO"

# Feature flags
ENABLE_DEBUG_MODE = "false"
ENABLE_CACHING = "true"
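
Inside the app, values set this way are available as ordinary environment variables. Below is a minimal sketch of reading them with defaults; the `Settings` class, the `load_settings` helper, and the defaults (which mirror the example values above) are assumptions, not the app's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    perplexity_api_key: str
    hf_token: Optional[str]
    max_sources_per_search: int
    request_timeout: int
    log_level: str
    enable_debug_mode: bool
    enable_caching: bool


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, failing early if the required key is absent."""
    api_key = env.get("PERPLEXITY_API_KEY")
    if not api_key:
        raise RuntimeError("PERPLEXITY_API_KEY is not set in the Space secrets")
    return Settings(
        perplexity_api_key=api_key,
        hf_token=env.get("HF_TOKEN"),  # optional
        max_sources_per_search=int(env.get("MAX_SOURCES_PER_SEARCH", "50")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "30")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        enable_debug_mode=_as_bool(env.get("ENABLE_DEBUG_MODE", "false")),
        enable_caching=_as_bool(env.get("ENABLE_CACHING", "true")),
    )
```

Calling `load_settings()` once at startup surfaces a missing key immediately instead of at the first API request.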

✅ Validation Checklist

Before deploying, ensure:

  • ✅ YAML frontmatter is at the very beginning of README.md
  • ✅ No spaces before the opening ---
  • ✅ Proper YAML syntax (quotes around version numbers)
  • ✅ app_file: app.py matches your main file name
  • ✅ SDK version matches your requirements.txt
  • ✅ Title and emoji are appropriate for your audience
  • ✅ Tags are relevant and searchable
  • ✅ PERPLEXITY_API_KEY is set in Space secrets
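
Several of these checks can be automated before pushing. The sketch below uses a deliberately simple line-based reader rather than a real YAML parser, so it only understands flat `key: value` lines. The function names and the set of keys it checks are assumptions based on the fields described in this guide.

```python
from pathlib import Path

# Keys this guide lists under "Required Fields".
EXPECTED_KEYS = ("title", "emoji", "colorFrom", "colorTo", "sdk", "sdk_version", "app_file")


def read_frontmatter(text: str) -> dict[str, str]:
    """Extract flat `key: value` pairs from a leading `---` block.

    Simplified on purpose: list items and nested values are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].rstrip() != "---":
        raise ValueError("README.md must start with '---' on its first line")
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.rstrip() == "---"), None)
    if end is None:
        raise ValueError("closing '---' not found")
    fields = {}
    for line in lines[1:end]:
        if line.startswith((" ", "-", "#")) or ":" not in line:
            continue  # list item, comment, or continuation line
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("\"'")
    return fields


def validate_space(readme: Path) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    try:
        fields = read_frontmatter(readme.read_text(encoding="utf-8"))
    except ValueError as exc:
        return [str(exc)]
    problems = [f"missing field: {key}" for key in EXPECTED_KEYS if key not in fields]
    app_file = fields.get("app_file")
    if app_file and not (readme.parent / app_file).is_file():
        problems.append(f"app_file not found: {app_file}")
    return problems
```

Running `validate_space(Path("README.md"))` from the repository root returns an empty list when the header and `app_file` checks pass. Secrets and tag quality still need a manual look.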

🚨 Common Configuration Errors

❌ Missing Frontmatter

# 🚀 AI Dataset Studio  ← ERROR: No YAML header

✅ Correct Format

---
title: AI Dataset Studio
emoji: 🚀
sdk: gradio
---

# 🚀 AI Dataset Studio  ← Correct: Content after YAML

❌ Unquoted SDK Version

sdk_version: 4.44.0  ← RISKY: unquoted; a two-part version such as 3.10 would be read as the number 3.1

✅ Correct Format

sdk_version: "4.44.0"  ← Correct: Quoted string
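
Quoting matters because of how YAML types unquoted values: digits with a single dot are read as a number, so a trailing zero is lost. The sketch below imitates that rule with a regular expression instead of a real YAML parser; `resolve_plain_scalar` is a hypothetical helper that covers only version-like strings.

```python
import re


def resolve_plain_scalar(raw: str):
    """Approximate how an unquoted, version-like YAML value is typed.

    Only digit-and-dot strings are handled; real YAML has many more rules.
    """
    if re.fullmatch(r"\d+", raw):
        return int(raw)
    if re.fullmatch(r"\d+\.\d+", raw):
        return float(raw)  # "3.10" becomes the number 3.1
    return raw  # "4.44.0" has two dots, so it stays a string


if __name__ == "__main__":
    for raw in ("4.44.0", "3.10", "4"):
        print(f"{raw!r:>9} -> {resolve_plain_scalar(raw)!r}")
```

Quoting every version value avoids having to remember which shapes are safe.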

❌ Invalid App File

app_file: main.py  ← ERROR: File doesn't exist

✅ Correct Format

app_file: app.py  ← Correct: Matches actual filename

🔄 Updating Configuration

To change your Space configuration:

  1. Edit README.md
    • Update the YAML frontmatter
    • Commit changes to git
  2. The Space rebuilds automatically
    • Changes take effect once the rebuild finishes
    • Monitor build logs for errors
  3. Hardware changes
    • Go to Space Settings
    • Change hardware tier
    • Restart Space
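
The frontmatter edit in step 1 can also be scripted. Below is a minimal sketch that rewrites one top-level field in the README text, assuming flat `key: value` lines. `set_frontmatter_field` is a hypothetical helper, and committing the result is still done with git.

```python
def set_frontmatter_field(readme_text: str, key: str, value: str) -> str:
    """Return README text with `key` set to `value` in the leading YAML block.

    An existing top-level key is replaced in place; a new key is inserted
    just before the closing '---'. Nested values are not handled.
    """
    lines = readme_text.split("\n")
    if lines[0].rstrip() != "---":
        raise ValueError("README text does not start with YAML frontmatter")
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.rstrip() == "---"), None)
    if end is None:
        raise ValueError("closing '---' not found")
    new_line = f"{key}: {value}"
    for i in range(1, end):
        if lines[i].startswith(f"{key}:"):
            lines[i] = new_line  # replace the existing field
            break
    else:
        lines.insert(end, new_line)  # field not present yet
    return "\n".join(lines)
```

For example, `set_frontmatter_field(text, "pinned", "true")` flips the pin flag and leaves the rest of the file untouched.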

🎉 Example Complete README.md Start

Here's how your README.md should begin:

---
title: AI Dataset Studio
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
license: mit
tags:
  - machine-learning
  - datasets
  - nlp
  - perplexity-ai
  - data-science
---

# 🚀 AI Dataset Studio

**Create high-quality training datasets with AI-powered source discovery**

A comprehensive platform for building ML datasets that combines web scraping, AI processing, and smart source discovery using Perplexity AI...

💡 Pro Tips

  1. Choose memorable titles - They appear in search results
  2. Use relevant emojis - They make your Space stand out
  3. Pick good color combinations - They create visual appeal
  4. Add comprehensive tags - They improve discoverability
  5. Pin important Spaces - They appear prominently on your profile
  6. Use appropriate licenses - MIT or Apache-2.0 for open source

Your Space configuration is now properly set up for deployment! 🚀