# 🔧 AI Dataset Studio - Complete Troubleshooting Guide
## 🚨 **Immediate Fix for Current Error**
### **Error: "DatasetStudio is not defined"**
```
NameError: name 'DatasetStudio' is not defined
```
✅ **SOLUTION:** Replace your current `app.py` with the **complete, fixed version**, which defines the `DatasetStudio` class before anything uses it.
**Quick Fix Steps:**
1. **Replace `app.py`** - Use the complete version, which defines `DatasetStudio`
2. **Add missing files** - Upload every file marked ❌ in the checklist below
3. **Restart your Space** - The error should then be resolved
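Why the error happens: Python looks up a name only when the line using it runs. If `app.py` reaches code that uses `DatasetStudio` before the `class DatasetStudio` statement has executed, or the class is never defined or imported at all, you get exactly this `NameError`. A minimal reproduction (`build_app` and the empty class are illustrative stand-ins, not code from the project):

```python
def build_app():
    # The name DatasetStudio is resolved when build_app() runs, not when it is defined
    return DatasetStudio()

try:
    build_app()  # the class statement below has not run yet
except NameError as exc:
    error_message = str(exc)
    print(f"Reproduced: {error_message}")

class DatasetStudio:
    """Stand-in for the real class defined in the complete app.py."""

studio = build_app()  # works now that the class exists
print(f"Fixed: created {type(studio).__name__}")
```

The real fix follows the same rule: the class must be defined (or imported) before anything calls it.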
---
## 📁 **Files You Need (Complete Checklist)**
| File | Status | Purpose |
|------|--------|---------|
| ✅ `app.py` | **Replace yours** | Main application (complete version) |
| ❌ `app_minimal.py` | **Missing** | Fallback version (basic deps only) |
| ✅ `requirements.txt` | **Have it** | Dependencies |
| ✅ `README.md` | **Have it** | Documentation |
| ✅ `config.py` | **Have it** | Configuration |
| ❌ `utils.py` | **Incomplete** | Utility functions |
| ❌ `startup.py` | **Missing** | Smart launcher |
| ❌ `TROUBLESHOOTING.md` | **Missing** | This guide |
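To see which of these files your Space (or local checkout) actually has, a small stdlib check can compare the folder against the checklist. This is a sketch: it only checks that each file exists, so it cannot tell that `utils.py` is incomplete.

```python
from pathlib import Path

# File names taken from the checklist table above
EXPECTED_FILES = [
    "app.py", "app_minimal.py", "requirements.txt", "README.md",
    "config.py", "utils.py", "startup.py", "TROUBLESHOOTING.md",
]

def missing_files(root="."):
    """Return the expected files that are not present in `root`."""
    base = Path(root)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All checklist files are present")
```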
---
## 🚀 **Quick Deployment Options**
### **Option 1: Immediate Fix (Recommended)**
```bash
# Replace app.py with the complete version
# It defines DatasetStudio, which fixes the NameError
```
### **Option 2: Minimal Version (Most Reliable)**
```bash
# Use app_minimal.py as your main app.py
# This version works with basic dependencies only
```
### **Option 3: Smart Startup (Auto-Detect)**
```bash
# Use startup.py as your main app.py
# Automatically chooses the best version to run
```
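What `startup.py` actually does is not shown in this guide. The sketch below is one hypothetical way such a launcher could auto-detect the best version: check which packages are installed without importing them, then run the matching module. Because Option 3 makes the launcher your `app.py`, the full version needs a different module name; `app_full` is an assumed name, and the package split is assumed from the minimal and full install commands in this guide.

```python
import importlib.util
import runpy

# Assumed split, mirroring the minimal and full install commands in this guide
MINIMAL_MODULES = ["gradio", "pandas", "requests", "bs4"]
FULL_MODULES = MINIMAL_MODULES + ["transformers", "torch", "nltk", "datasets"]

def modules_available(names):
    """True if every top-level module can be located (they are not imported)."""
    return all(importlib.util.find_spec(name) is not None for name in names)

def choose_entry_point():
    """Pick the richest version whose dependencies are installed."""
    if modules_available(FULL_MODULES):
        return "app_full"     # hypothetical module name for the full version
    if modules_available(MINIMAL_MODULES):
        return "app_minimal"  # basic dependencies only
    raise RuntimeError("Install at least: gradio pandas requests beautifulsoup4")

def launch():
    entry = choose_entry_point()
    print(f"Launching {entry}.py")
    # Run the chosen module as if it had been started with `python -m <entry>`
    runpy.run_module(entry, run_name="__main__")
```

In the launcher file, finish with `if __name__ == "__main__": launch()`. Note that `find_spec` only confirms a package is installed, not that it imports cleanly, so a broken install of `torch` would still select the full version.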
---
## 🔍 **Common Issues & Solutions**
### **Issue 1: Missing Dependencies**
```
ModuleNotFoundError: No module named 'transformers'
ModuleNotFoundError: No module named 'bs4'
```
✅ **SOLUTIONS:**
#### **A. Minimal Installation (Fastest)**
```bash
pip install gradio pandas requests beautifulsoup4
# Use app_minimal.py
```
#### **B. Full Installation**
```bash
pip install gradio pandas requests beautifulsoup4 transformers torch nltk datasets
# Use app.py (full version)
```
#### **C. Update requirements.txt**
```txt
gradio>=4.44.0
pandas>=2.0.0
requests>=2.31.0
beautifulsoup4>=4.12.0
```
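The name in the error is the import name, which is not always the pip package name: `No module named 'bs4'` is fixed by installing `beautifulsoup4`. A small sketch that turns missing imports into a `pip install` command (the mapping covers the four basic packages above; extend it for the full version):

```python
import importlib.util

# Import name -> pip package name; BeautifulSoup is imported as bs4
BASIC_PACKAGES = {
    "gradio": "gradio",
    "pandas": "pandas",
    "requests": "requests",
    "bs4": "beautifulsoup4",
}

def missing_packages(packages=BASIC_PACKAGES):
    """Return pip names of packages whose import name cannot be found."""
    return [pip_name for module, pip_name in packages.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Run: pip install " + " ".join(missing))
    else:
        print("All basic dependencies are installed")
```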
---
### **Issue 2: Slow Loading**
```
Application taking too long to start
Models downloading...
```
✅ **SOLUTIONS:**
- **Use CPU Basic hardware initially** (loads faster)
- **Try minimal version first** (no AI model downloads)
- **Upgrade to T4 Small** for faster AI model loading
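For the full version, another way to shorten startup is to load AI models lazily, on first use, so the interface comes up before any download finishes. This is a generic sketch, not necessarily how `app.py` loads its models; `load_model` and `get_model` are placeholder names.

```python
import time
from functools import lru_cache

def load_model():
    """Placeholder for an expensive load, e.g. building a transformers pipeline."""
    time.sleep(0.1)  # stands in for download and initialization time
    return {"name": "placeholder-model"}

@lru_cache(maxsize=1)
def get_model():
    # First call runs load_model(); every later call returns the cached object
    return load_model()

# Nothing is loaded at import time, so the interface can start right away.
# Inside the handler that needs the model:
#     model = get_model()
```

The trade-off: the first request that needs the model waits for the load, instead of the Space startup waiting for it.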
---
### **Issue 3: Memory Issues**
```
CUDA out of memory
Application crashed
```
✅ **SOLUTIONS:**
- **Start with CPU Basic** (free, lower memory)
- **Use minimal version** (smaller memory footprint)
- **Upgrade gradually** (CPU → T4 → A10G as needed)
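Two generic ways to keep memory in check, sketched with hypothetical helper names: use the GPU only when PyTorch reports one (so the same code also runs on CPU Basic and without `torch` installed), and process inputs in small batches so only one batch is held in memory at a time.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # minimal version: torch is not installed
    return "cuda" if torch.cuda.is_available() else "cpu"

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch
```

If CUDA still runs out of memory, lowering the batch size is usually the first thing to try.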
---
### **Issue 4: Import Errors**
```
Failed to import DatasetStudio
Module not found errors
```
✅ **SOLUTIONS:**
- **Replace app.py** with the complete version that defines `DatasetStudio`
- **Add all missing files** marked ❌ in the checklist
- **Clear browser cache** and refresh
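"Failed to import DatasetStudio" can mean two different things: `app.py` itself fails while it is being imported (a missing package, or a `NameError` like the one at the top of this guide), or it imports fine but does not define the class. A hypothetical stdlib helper that tells the two apart:

```python
import importlib
import traceback

def diagnose_import(module_name, attribute=None):
    """Explain why `from module_name import attribute` fails, or return "ok"."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # The full traceback shows which line inside the module raised
        return "import failed:\n" + traceback.format_exc()
    if attribute is not None and not hasattr(module, attribute):
        return f"{module_name} imported, but it defines no '{attribute}'"
    return "ok"
```

From the Space's root folder, run `print(diagnose_import("app", "DatasetStudio"))`. Importing `app` executes its top-level code, so if `app.py` calls `launch()` outside an `if __name__ == "__main__":` guard, this will also start the server.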
---
## 🏥 **Emergency Fixes**
### **Nuclear Option: Start Completely Fresh**
1. **Create new Space**
2. **Use minimal files only:**
```
- app_minimal.py (rename to app.py)
- requirements.txt (basic only)
- README.md
```
3. **Set hardware to CPU Basic**
4. **Test basic functionality first**
5. **Gradually add features**
### **Quick Test Commands**
```bash
# Test basic imports
python -c "import gradio, pandas, requests; print('✅ Basic imports work')"
# Test BeautifulSoup
python -c "from bs4 import BeautifulSoup; print('✅ BeautifulSoup works')"
# Test full app (if using complete version)
python -c "from app import DatasetStudio; print('✅ DatasetStudio works')"
```
---
## 📊 **Version Comparison**
| Feature | Minimal | Full | Smart |
|---------|---------|------|-------|
| **Dependencies** | 4 packages | 8+ packages | Auto-detect |
| **Startup Time** | 30 seconds | 2-5 minutes | Variable |
| **Web Scraping** | ✅ Basic | ✅ Advanced | ✅ Auto |
| **AI Features** | ❌ None | ✅ All | ✅ If available |
| **Export Formats** | JSON, CSV | All formats | Auto |
| **Memory Usage** | ~100 MB | ~2 GB | Variable |
| **Reliability** | 🟢 High | 🟡 Medium | 🟢 High |
---
## 🎯 **Deployment Strategy**
### **Step 1: Start Simple**
```yaml
Files: app_minimal.py → app.py, requirements.txt (minimal)
Hardware: CPU Basic
Goal: Verify basic functionality
```
### **Step 2: Add Features**
```yaml
Files: Add complete app.py, config.py, utils.py
Hardware: CPU Upgrade
Goal: Test advanced features
```
### **Step 3: Full Power**
```yaml
Files: All files
Hardware: T4 Small or higher
Goal: Production deployment
```
---
## 🔄 **Troubleshooting Workflow**
```
1. 🚨 ERROR OCCURS
↓
2. 🔍 CHECK THIS GUIDE
↓
3. 🛠️ APPLY QUICK FIX
↓
4. 🧪 TEST SOLUTION
↓
5. ✅ SUCCESS OR ⬆️ ESCALATE
```
### **Escalation Path:**
1. **Try minimal version** → `app_minimal.py`
2. **Check dependencies** → Install missing packages
3. **Review logs** → Look for specific errors
4. **Contact support** → Provide error details
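For step 3, the most useful part of a long Space log is usually the last Python traceback, which ends with the actual exception line. A small stdlib sketch that pulls it out of pasted log text (the helper name and the sample log are made up for illustration):

```python
def last_traceback(log_text):
    """Return the last traceback in the log, up to its exception line, or None."""
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines)
              if line.startswith("Traceback (most recent call last):")]
    if not starts:
        return None
    block = [lines[starts[-1]]]
    for line in lines[starts[-1] + 1:]:
        block.append(line)
        if not line.startswith((" ", "\t")):
            break  # first unindented line is the exception message
    return "\n".join(block)

sample_log = """\
Starting AI Dataset Studio...
Traceback (most recent call last):
  File "app.py", line 12, in <module>
    studio = DatasetStudio()
NameError: name 'DatasetStudio' is not defined
"""
print(last_traceback(sample_log))
```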
---
## 💡 **Pro Tips**
### **Development Best Practices**
- ✅ **Start minimal, add complexity gradually**
- ✅ **Test locally before deploying**
- ✅ **Use version control for file management**
- ✅ **Monitor Space logs for errors**
### **Performance Optimization**
- ✅ **CPU Basic for development/testing**
- ✅ **T4 Small for production**
- ✅ **Enable persistent storage for large datasets**
- ✅ **Use minimal version when possible**
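On Hugging Face Spaces, persistent storage (once enabled) is mounted at `/data`; files written elsewhere in the Space are lost when it restarts. A minimal sketch that prefers `/data` and otherwise falls back to a local folder (the function name and the `outputs/` folder are hypothetical):

```python
import os
from pathlib import Path

def dataset_output_dir(subdir="datasets"):
    """Prefer persistent storage at /data; fall back to a local folder."""
    persistent = Path("/data")
    if persistent.is_dir() and os.access(persistent, os.W_OK):
        base = persistent
    else:
        base = Path("outputs")  # local fallback; not persistent on a Space
    target = base / subdir
    target.mkdir(parents=True, exist_ok=True)
    return target
```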
### **Reliability Tips**
- ✅ **Always have a fallback version ready**
- ✅ **Test with sample URLs before large batches**
- ✅ **Monitor Space analytics for usage patterns**
- ✅ **Keep dependencies up to date**
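The "test with sample URLs" tip can be built into the scraping loop itself. A hypothetical helper: the `scrape` callable and its "return None on failure" convention are assumptions, not the app's API.

```python
def scrape_with_pilot(urls, scrape, pilot_size=5, min_success_rate=0.8):
    """Scrape a small pilot batch first and stop before the rest if too many fail."""
    urls = list(urls)
    pilot, rest = urls[:pilot_size], urls[pilot_size:]
    results = [scrape(url) for url in pilot]
    succeeded = sum(result is not None for result in results)
    if pilot and succeeded / len(pilot) < min_success_rate:
        raise RuntimeError(
            f"Pilot failed: {succeeded}/{len(pilot)} URLs succeeded; "
            "fix this before running the full batch"
        )
    results.extend(scrape(url) for url in rest)
    return results
```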
---
## 🆘 **Getting Help**
### **Information to Include When Asking for Help:**
```
1. Exact error message
2. Files you're using (app.py vs app_minimal.py)
3. Hardware type (CPU Basic, T4 Small, etc.)
4. Dependencies installed
5. Space logs (if available)
```
### **Quick Health Check Script:**
```python
import sys

print(f"Python: {sys.version}")

try:
    import gradio
    print(f"✅ Gradio: {gradio.__version__}")
except ImportError:
    print("❌ Gradio not available")

try:
    from bs4 import BeautifulSoup  # noqa: F401
    print("✅ BeautifulSoup available")
except ImportError:
    print("❌ BeautifulSoup not available")

# Importing app executes app.py, so also catch errors raised inside it
# (e.g. NameError), not only a missing module or missing name.
try:
    from app import DatasetStudio  # noqa: F401
    print("✅ DatasetStudio available")
except Exception as e:
    print(f"❌ DatasetStudio error: {type(e).__name__}: {e}")
```
---
## 🎉 **Success Indicators**
You'll know everything is working when you see:
```
🚀 Starting AI Dataset Studio...
📊 Features: ✅ AI Models | ✅ Advanced NLP | ✅ HuggingFace Integration
✅ DatasetStudio initialized successfully
✅ Interface created successfully
Running on local URL: http://0.0.0.0:7860
```
**If you see this, you're ready to create amazing datasets!** 🎯
---
## 📞 **Support Channels**
- 📖 **Documentation**: README.md in your Space
- 💬 **Community**: HuggingFace Discussions
- 🐛 **Bug Reports**: Include logs and error details
- 📧 **Direct Help**: Describe your setup and error
**Remember: Every issue has a solution - start with the minimal version and build up!** 💪