Tonic committed · verified
Commit 93ed7a1 · 1 Parent(s): 769bb84

fix launch script deploy and refactors
config/train_smollm3_h100_lightweight.py CHANGED
@@ -56,6 +56,8 @@ config = SmolLM3Config(
     target_field="completion",
     filter_bad_entries=False,
     bad_entry_field="bad_entry",
+    sample_size=80000,  # 80K samples for lightweight training
+    sample_seed=42,     # For reproducibility
 
     # Chat template configuration
     use_chat_template=True,
docs/GIT_CONFIGURATION_GUIDE.md ADDED
@@ -0,0 +1,258 @@
+# Git Configuration Guide for Hugging Face Operations
+
+This guide explains the correct way to configure git for Hugging Face Spaces deployment and model pushing operations.
+
+## 🎯 **Overview**
+
+When working with Hugging Face Spaces and model repositories, proper git configuration is essential for:
+- Creating and deploying Spaces
+- Pushing models to the Hub
+- Managing experiment tracking datasets
+- Ensuring proper authentication
+- **Using the user's actual email address for proper git identity and commit attribution**
+
+## ✅ **Correct Git Configuration**
+
+### **1. Local vs Global Configuration**
+
+**❌ Wrong (Current):**
+```bash
+git config --global user.email "[email protected]"
+git config --global user.name "$HF_USERNAME"
+```
+
+**✅ Correct (Updated):**
+```bash
+# Get user's actual email address
+read -p "Enter your email address for git configuration: " GIT_EMAIL
+
+# Configure git locally for this project only
+git config user.email "$GIT_EMAIL"
+git config user.name "$HF_USERNAME"
+
+# Verify configuration
+git config user.email
+git config user.name
+```
+
+### **2. Proper Authentication Setup**
+
+**✅ Correct Authentication:**
+```bash
+# Login with token and add it to git credentials
+huggingface-cli login --token "$HF_TOKEN" --add-to-git-credential
+
+# Verify login
+huggingface-cli whoami
+```
+
+### **3. Error Handling**
+
+**✅ Robust Configuration:**
+```bash
+# Get user's email and configure git with error handling
+read -p "Enter your email address for git configuration: " GIT_EMAIL
+
+if git config user.email "$GIT_EMAIL" && \
+   git config user.name "$HF_USERNAME"; then
+    echo "✅ Git configured successfully"
+    echo "   Email: $(git config user.email)"
+    echo "   Name: $(git config user.name)"
+else
+    echo "❌ Failed to configure git"
+    exit 1
+fi
+```
+
+## 🔧 **Why These Changes Matter**
+
+### **1. Local Configuration Benefits**
+- **Isolation**: Doesn't affect other projects on the system
+- **Project-specific**: Each project can have different git settings
+- **Cleaner**: No global state pollution
+- **Safer**: Won't interfere with existing git configurations
+
+### **2. User's Actual Email Address**
+- **Professional**: Uses the user's real email address
+- **Authentic**: Represents the actual user's identity
+- **Consistent**: Matches the user's Hugging Face account
+- **Best Practice**: Follows git configuration standards
+
+### **3. Token-based Authentication**
+- **Secure**: Uses the HF token instead of a username/password
+- **Automated**: No manual password entry required
+- **Persistent**: Credentials are stored securely
+- **Verified**: Includes verification steps
+
+## 📋 **Implementation in Launch Script**
+
+### **Updated Authentication Step:**
+```bash
+# Step 8: Authentication setup
+print_step "Step 8: Authentication Setup"
+echo "================================"
+
+export HF_TOKEN="$HF_TOKEN"
+export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
+
+# Login to Hugging Face with token
+print_info "Logging in to Hugging Face..."
+if huggingface-cli login --token "$HF_TOKEN" --add-to-git-credential; then
+    print_status "Successfully logged in to Hugging Face"
+    print_info "Username: $(huggingface-cli whoami)"
+else
+    print_error "Failed to login to Hugging Face"
+    print_error "Please check your token and try again"
+    exit 1
+fi
+
+# Configure git for HF operations
+print_step "Step 8.1: Git Configuration"
+echo "================================"
+
+print_info "Configuring git for Hugging Face operations..."
+
+# Get user's email for git configuration
+get_input "Enter your email address for git configuration" "" GIT_EMAIL
+
+# Configure git locally (not globally) for this project
+git config user.email "$GIT_EMAIL"
+git config user.name "$HF_USERNAME"
+
+# Verify git configuration
+print_info "Verifying git configuration..."
+if git config user.email && git config user.name; then
+    print_status "Git configured successfully"
+    print_info "  Email: $(git config user.email)"
+    print_info "  Name: $(git config user.name)"
+else
+    print_error "Failed to configure git"
+    exit 1
+fi
+```
+
+## 🚀 **Deployment Script Improvements**
+
+### **Robust File Upload:**
+```python
+def upload_files(self) -> bool:
+    """Upload necessary files to the Space"""
+    try:
+        print("Uploading files to Space...")
+
+        # Files to upload
+        files_to_upload = [
+            "app.py",
+            "requirements_space.txt",
+            "README.md"
+        ]
+
+        # Check if we're in a git repository
+        try:
+            subprocess.run(["git", "status"], capture_output=True, check=True)
+        except subprocess.CalledProcessError:
+            print("⚠️ Not in a git repository, initializing...")
+            subprocess.run(["git", "init"], check=True)
+            subprocess.run(["git", "remote", "add", "origin", f"https://huggingface.co/spaces/{self.username}/{self.space_name}"], check=True)
+
+        # Add all files at once
+        existing_files = [f for f in files_to_upload if os.path.exists(f)]
+        if existing_files:
+            subprocess.run(["git", "add"] + existing_files, check=True)
+            subprocess.run(["git", "commit", "-m", "Initial Space setup"], check=True)
+
+            # Push to the space
+            try:
+                subprocess.run(["git", "push", "origin", "main"], check=True)
+                print(f"✅ Uploaded {len(existing_files)} files")
+            except subprocess.CalledProcessError:
+                # Try pushing to master branch if main doesn't exist
+                subprocess.run(["git", "push", "origin", "master"], check=True)
+                print(f"✅ Uploaded {len(existing_files)} files")
+        else:
+            print("⚠️ No files found to upload")
+
+        return True
+
+    except Exception as e:
+        print(f"❌ Error uploading files: {e}")
+        return False
+```
+
+## 🔍 **Troubleshooting**
+
+### **Common Issues and Solutions:**
+
+#### **1. Git Configuration Fails**
+```bash
+# Check current git config
+git config --list
+
+# Reset if needed
+git config --unset user.email
+git config --unset user.name
+
+# Reconfigure
+git config user.email "[email protected]"
+git config user.name "your-username"
+```
+
+#### **2. Authentication Issues**
+```bash
+# Check HF login status
+huggingface-cli whoami
+
+# Re-login if needed
+huggingface-cli logout
+huggingface-cli login --token "your-token"
+```
+
+#### **3. Space Deployment Fails**
+```bash
+# Check git remote
+git remote -v
+
+# Re-add remote if needed
+git remote remove origin
+git remote add origin https://huggingface.co/spaces/username/space-name
+```
+
+## 📚 **Best Practices**
+
+### **1. Always Use Local Configuration**
+- Use `git config` without the `--global` flag
+- Keeps project configurations isolated
+- Prevents conflicts with other projects
+
+### **2. Verify Configuration**
+- Always check that git config was successful
+- Display configured values for verification
+- Exit on failure to prevent downstream issues
+
+### **3. Use Token-based Authentication**
+- More secure than username/password
+- Automatically handles credential storage
+- Works well with CI/CD systems
+
+### **4. Handle Errors Gracefully**
+- Check return codes from git commands
+- Provide clear error messages
+- Exit early on critical failures
+
+### **5. Test Configuration**
+- Verify git config after setting it
+- Test HF login before proceeding
+- Validate remote repository access
+
+## 🎯 **Summary**
+
+The updated git configuration approach provides:
+
+1. **✅ Better Isolation**: Local configuration doesn't affect system-wide settings
+2. **✅ User's Actual Email**: Uses the user's real email address for proper git identity
+3. **✅ Proper Authentication**: Token-based login with credential storage
+4. **✅ Error Handling**: Robust verification and error reporting
+5. **✅ Professional Setup**: Uses the user's actual email plus verification
+6. **✅ Deployment Reliability**: Improved Space deployment with git repository handling
+
+This ensures a more reliable and professional setup for Hugging Face operations in the SmolLM3 fine-tuning pipeline.
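
For scripts that are already in Python, the same login-and-verify flow can be sketched with `subprocess` for the git side. `configure_git` below is a hypothetical helper, not part of this repo, and the `huggingface_hub.login` call in the usage note assumes that library is installed:

```python
import subprocess

def configure_git(email: str, name: str, cwd: str = ".") -> dict:
    """Set repo-local (not --global) git identity, then read it back to verify."""
    subprocess.run(["git", "config", "user.email", email], cwd=cwd, check=True)
    subprocess.run(["git", "config", "user.name", name], cwd=cwd, check=True)

    def read(key: str) -> str:
        # `git config <key>` prints the effective value for this repository
        out = subprocess.run(["git", "config", key], cwd=cwd,
                             check=True, capture_output=True, text=True)
        return out.stdout.strip()

    return {"email": read("user.email"), "name": read("user.name")}

# Usage sketch (assumes HF_TOKEN is exported and huggingface_hub is installed):
#   import os
#   from huggingface_hub import login
#   login(token=os.environ["HF_TOKEN"], add_to_git_credential=True)
#   configure_git("[email protected]", "your-username")
```

Because the values are read back through git itself, a silent failure to write the repo-local config surfaces immediately, mirroring the verification step in the bash version.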
launch.sh CHANGED
@@ -448,7 +448,41 @@ echo "================================"
 
 export HF_TOKEN="$HF_TOKEN"
 export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
-huggingface-cli login --token $HF_TOKEN
+
+# Login to Hugging Face with token
+print_info "Logging in to Hugging Face..."
+if huggingface-cli login --token "$HF_TOKEN" --add-to-git-credential; then
+    print_status "Successfully logged in to Hugging Face"
+    print_info "Username: $(huggingface-cli whoami)"
+else
+    print_error "Failed to login to Hugging Face"
+    print_error "Please check your token and try again"
+    exit 1
+fi
+
+# Configure git for HF operations
+print_step "Step 8.1: Git Configuration"
+echo "================================"
+
+print_info "Configuring git for Hugging Face operations..."
+
+# Get user's email for git configuration
+get_input "Enter the email you used to register your account at huggingface for git configuration" "" GIT_EMAIL
+
+# Configure git locally (not globally) for this project
+git config user.email "$GIT_EMAIL"
+git config user.name "$HF_USERNAME"
+
+# Verify git configuration
+print_info "Verifying git configuration..."
+if git config user.email && git config user.name; then
+    print_status "Git configured successfully"
+    print_info "  Email: $(git config user.email)"
+    print_info "  Name: $(git config user.name)"
+else
+    print_error "Failed to configure git"
+    exit 1
+fi
 
 # Step 9: Deploy Trackio Space
 print_step "Step 9: Deploying Trackio Space"
@@ -482,14 +516,14 @@ echo "================================="
 cd ../trackio_tonic
 python configure_trackio.py
 
-# Step 12: Create training configuration
-print_step "Step 12: Creating Training Configuration"
-echo "==========================================="
+# Step 12: Training Configuration
+print_step "Step 12: Training Configuration"
+echo "==================================="
 
 cd ../..
-create_training_config "$CONFIG_FILE"
+print_info "Using existing configuration file: $CONFIG_FILE"
 
-# Step 13: Dataset preparation (handled by src/data.py during training)
+# Step 13: Dataset Configuration
 print_step "Step 13: Dataset Configuration"
 echo "=================================="
 
@@ -499,57 +533,40 @@ if [ "$TRAINING_CONFIG_TYPE" = "H100 Lightweight (Rapid)" ]; then
     print_info "Sample size: ${DATASET_SAMPLE_SIZE:-80000} (will be handled by data.py)"
 fi
 
-# Step 14: Calculate training parameters
-print_step "Step 14: Calculating Training Parameters"
-echo "============================================"
+# Step 14: Training Parameters
+print_step "Step 14: Training Parameters"
+echo "================================"
 
-# Estimate training steps
-EFFECTIVE_BATCH_SIZE=$((BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS))
-echo "  Effective batch size: $EFFECTIVE_BATCH_SIZE"
-echo "  Learning rate: $LEARNING_RATE"
-echo "  Max epochs: $MAX_EPOCHS"
-echo "  Sequence length: $MAX_SEQ_LENGTH"
-echo "  Training steps will be calculated by the training script"
+print_info "Training parameters will be loaded from configuration file"
+print_info "Model: $MODEL_NAME"
+print_info "Dataset: $DATASET_NAME"
+print_info "Batch size: $BATCH_SIZE"
+print_info "Learning rate: $LEARNING_RATE"
 
 # Step 15: Start training
 print_step "Step 15: Starting Training"
 echo "=============================="
 
-print_info "Using existing scripts/training/train.py script with the following parameters:"
-echo "  Model: $MODEL_NAME"
-echo "  Dataset: $DATASET_NAME"
-echo "  Output: /output-checkpoint"
-echo "  Batch size: $BATCH_SIZE"
-echo "  Learning rate: $LEARNING_RATE"
-echo "  Sequence length: $MAX_SEQ_LENGTH"
-
-# Run the existing training script
-python scripts/training/train.py "$CONFIG_FILE" \
-    --dataset_dir "$DATASET_NAME" \
-    --out_dir /output-checkpoint \
-    --init_from scratch \
-    --batch_size $BATCH_SIZE \
-    --learning_rate $LEARNING_RATE \
-    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
-    --max_seq_length $MAX_SEQ_LENGTH \
-    --save_steps $SAVE_STEPS \
-    --eval_steps $EVAL_STEPS \
-    --logging_steps $LOGGING_STEPS \
-    --enable_tracking \
-    --trackio_url "$TRACKIO_URL" \
-    --experiment_name "$EXPERIMENT_NAME" \
-    --hf_token "$HF_TOKEN" \
-    --dataset_repo "$TRACKIO_DATASET_REPO"
+print_info "Starting training with configuration: $CONFIG_FILE"
+print_info "Experiment: $EXPERIMENT_NAME"
+print_info "Output: /output-checkpoint"
+print_info "Trackio: $TRACKIO_URL"
+
+# Run the simpler training script
+python scripts/training/train.py \
+    --config "$CONFIG_FILE" \
+    --experiment-name "$EXPERIMENT_NAME" \
+    --output-dir /output-checkpoint \
+    --trackio-url "$TRACKIO_URL"
 
 # Step 16: Push model to Hugging Face Hub
 print_step "Step 16: Pushing Model to HF Hub"
 echo "====================================="
 
-print_info "Using scripts/model_tonic/push_to_huggingface.py script"
-echo "  Checkpoint: /output-checkpoint"
-echo "  Repository: $REPO_NAME"
+print_info "Pushing model to: $REPO_NAME"
+print_info "Checkpoint: /output-checkpoint"
 
-# Run the existing push script
+# Run the push script
 python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME" \
     --token "$HF_TOKEN" \
     --trackio-url "$TRACKIO_URL" \
scripts/dataset_tonic/setup_hf_dataset.py CHANGED
@@ -6,6 +6,7 @@ Setup script for Hugging Face Dataset repository for Trackio experiments
 import os
 import json
 from datetime import datetime
+from pathlib import Path
 from datasets import Dataset
 from huggingface_hub import HfApi
 
@@ -249,16 +250,31 @@ def setup_trackio_dataset():
     # Create dataset
     dataset = Dataset.from_list(initial_experiments)
     
-    # Push to HF Hub
+    # Get the project root directory (2 levels up from this script)
+    project_root = Path(__file__).parent.parent.parent
+    templates_dir = project_root / "templates" / "datasets"
+    readme_path = templates_dir / "readme.md"
+    
+    # Read README content if it exists
+    readme_content = None
+    if readme_path.exists():
+        with open(readme_path, 'r', encoding='utf-8') as f:
+            readme_content = f.read()
+        print(f"✅ Found README template: {readme_path}")
+    
+    # Push to HF Hub with README
     api = HfApi(token=hf_token)
     dataset.push_to_hub(
        dataset_repo,
        token=hf_token,
-        private=True  # Make it private for security
+        private=True,  # Make it private for security
+        readme_content=readme_content  # Include README if available
     )
     
     print(f"✅ Successfully created dataset: {dataset_repo}")
     print(f"📊 Added {len(initial_experiments)} experiments")
+    if readme_content:
+        print("📝 Included README from templates")
     print("🔒 Dataset is private (only accessible with your token)")
     print("\n🎯 Next steps:")
     print("1. Set HF_TOKEN in your Hugging Face Space environment")
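
One caveat worth flagging for this hunk: released versions of the `datasets` library do not document a `readme_content` parameter on `Dataset.push_to_hub`, so the call above may raise a `TypeError`. If so, the README can be uploaded in a separate step with `HfApi.upload_file`. A minimal sketch, where `build_readme_upload_kwargs` is a hypothetical helper (not in the repo) that assembles the upload arguments:

```python
def build_readme_upload_kwargs(dataset_repo: str, readme_content: str) -> dict:
    """Assemble keyword arguments for HfApi.upload_file to publish a dataset card."""
    return {
        "path_or_fileobj": readme_content.encode("utf-8"),  # upload_file accepts raw bytes
        "path_in_repo": "README.md",
        "repo_id": dataset_repo,
        "repo_type": "dataset",  # required: the default repo_type is "model"
        "commit_message": "Add dataset card from templates",
    }

# Usage sketch (assumes huggingface_hub is installed and hf_token is valid):
#   from huggingface_hub import HfApi
#   api = HfApi(token=hf_token)
#   api.upload_file(**build_readme_upload_kwargs(dataset_repo, readme_content), token=hf_token)
```

Separating the data push from the card upload also means a README change does not require re-pushing the dataset shards.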
scripts/trackio_tonic/deploy_trackio_space.py CHANGED
@@ -61,22 +61,55 @@ class TrackioSpaceDeployer:
         try:
             print("Uploading files to Space...")
             
-            # Files to upload
+            # Get the project root directory (3 levels up from this script)
+            project_root = Path(__file__).parent.parent.parent
+            templates_dir = project_root / "templates" / "spaces"
+            
+            # Files to upload from templates/spaces
             files_to_upload = [
                 "app.py",
-                "requirements_space.txt",
-                "README.md"
+                "requirements.txt"
             ]
             
-            for file_path in files_to_upload:
-                if os.path.exists(file_path):
-                    # Use git to add and push files
-                    subprocess.run(["git", "add", file_path], check=True)
-                    subprocess.run(["git", "commit", "-m", f"Add {file_path}"], check=True)
-                    subprocess.run(["git", "push"], check=True)
-                    print(f"✅ Uploaded {file_path}")
+            # README.md will be created by configure_space method
+            
+            # Copy files from templates/spaces to current directory
+            copied_files = []
+            for file_name in files_to_upload:
+                source_path = templates_dir / file_name
+                if source_path.exists():
+                    import shutil
+                    shutil.copy2(source_path, file_name)
+                    copied_files.append(file_name)
+                    print(f"✅ Copied {file_name} from templates")
                 else:
-                    print(f"⚠️ File not found: {file_path}")
+                    print(f"⚠️ File not found: {source_path}")
+            
+            # Check if we're in a git repository
+            try:
+                subprocess.run(["git", "status"], capture_output=True, check=True)
+            except subprocess.CalledProcessError:
+                print("⚠️ Not in a git repository, initializing...")
+                subprocess.run(["git", "init"], check=True)
+                subprocess.run(["git", "remote", "add", "origin", f"https://huggingface.co/spaces/{self.username}/{self.space_name}"], check=True)
+            
+            # Add all files at once
+            existing_files = [f for f in files_to_upload if os.path.exists(f)]
+            if existing_files:
+                subprocess.run(["git", "add"] + existing_files, check=True)
+                subprocess.run(["git", "add", "README.md"], check=True)  # Add README.md that was created in configure_space
+                subprocess.run(["git", "commit", "-m", "Initial Space setup"], check=True)
+                
+                # Push to the space
+                try:
+                    subprocess.run(["git", "push", "origin", "main"], check=True)
+                    print(f"✅ Uploaded {len(existing_files)} files")
+                except subprocess.CalledProcessError:
+                    # Try pushing to master branch if main doesn't exist
+                    subprocess.run(["git", "push", "origin", "master"], check=True)
+                    print(f"✅ Uploaded {len(existing_files)} files")
+            else:
+                print("⚠️ No files found to upload")
             
             return True
 
@@ -89,20 +122,28 @@ class TrackioSpaceDeployer:
         try:
             print("Configuring Space settings...")
             
-            # Create space configuration
-            space_config = {
-                "title": "Trackio - Experiment Tracking",
-                "emoji": "🚀",
-                "colorFrom": "blue",
-                "colorTo": "purple",
-                "sdk": "gradio",
-                "sdk_version": "4.0.0",
-                "app_file": "app.py",
-                "pinned": False
-            }
-            
-            # Write README.md for the space
-            space_readme = f"""---
+            # Get the project root directory (3 levels up from this script)
+            project_root = Path(__file__).parent.parent.parent
+            templates_dir = project_root / "templates" / "spaces"
+            readme_template_path = templates_dir / "README.md"
+            
+            # Read README template if it exists
+            if readme_template_path.exists():
+                with open(readme_template_path, 'r', encoding='utf-8') as f:
+                    readme_template = f.read()
+                
+                # Replace placeholder with actual space URL
+                readme_content = readme_template.replace("{SPACE_URL}", self.space_url)
+                
+                # Write README.md for the space
+                with open("README.md", "w", encoding='utf-8') as f:
+                    f.write(readme_content)
+                
+                print(f"✅ Created README.md from template")
+            else:
+                print(f"⚠️ README template not found: {readme_template_path}")
+                # Fallback to basic README
+                basic_readme = f"""---
 title: Trackio Tonic
 emoji: 🐠
 colorFrom: indigo
@@ -119,39 +160,11 @@ short_description: trackio for training monitoring
 
 A Gradio interface for experiment tracking and monitoring.
 
-## Features
-
-- Create and manage experiments
-- Log training metrics and parameters
-- View experiment details and results
-- Update experiment status
-
-## Usage
-
-1. Create a new experiment using the "Create Experiment" tab
-2. Log metrics during training using the "Log Metrics" tab
-3. View experiment details using the "View Experiments" tab
-4. Update experiment status using the "Update Status" tab
-
-## Integration
-
-To connect your training script to this Trackio Space:
-
-```python
-from monitoring import SmolLM3Monitor
-
-monitor = SmolLM3Monitor(
-    experiment_name="my_experiment",
-    trackio_url="{self.space_url}",
-    enable_tracking=True
-)
-```
-
 Visit: {self.space_url}
 """
-            
-            with open("README.md", "w") as f:
-                f.write(space_readme)
+                with open("README.md", "w", encoding='utf-8') as f:
+                    f.write(basic_readme)
+                print(f"✅ Created basic README.md")
             
             return True
scripts/training/train.py CHANGED
@@ -63,11 +63,13 @@ def main():
     try:
         from config.train_smollm3_openhermes_fr_a100_large import get_config as get_large_config
         from config.train_smollm3_openhermes_fr_a100_multiple_passes import get_config as get_multiple_passes_config
+        from config.train_smollm3_h100_lightweight import config as h100_lightweight_config
        
        # Map config files to their respective functions
        config_map = {
            "config/train_smollm3_openhermes_fr_a100_large.py": get_large_config,
            "config/train_smollm3_openhermes_fr_a100_multiple_passes.py": get_multiple_passes_config,
+            "config/train_smollm3_h100_lightweight.py": lambda x: h100_lightweight_config,
        }
        
        if args.config in config_map:
@@ -81,6 +83,7 @@ def main():
        print("Available configurations:")
        print("  - config/train_smollm3_openhermes_fr_a100_large.py (Large batch, 1.3 passes)")
        print("  - config/train_smollm3_openhermes_fr_a100_multiple_passes.py (Multiple passes, 4 epochs)")
+        print("  - config/train_smollm3_h100_lightweight.py (H100 lightweight, 80K samples)")
        return 1
    
    # Override experiment name if provided
@@ -124,6 +127,9 @@ def main():
    
    # Import and run training
    try:
+        # Add src directory to path
+        src_path = str(Path(__file__).parent.parent.parent / "src")
+        sys.path.insert(0, src_path)
        from train import main as train_main
        
        # Set up training arguments - config is positional, not --config
src/data.py CHANGED
@@ -24,7 +24,9 @@ class SmolLM3Dataset:
         use_chat_template: bool = True,
         chat_template_kwargs: Optional[Dict] = None,
         filter_bad_entries: bool = False,
-        bad_entry_field: str = "bad_entry"
+        bad_entry_field: str = "bad_entry",
+        sample_size: Optional[int] = None,
+        sample_seed: int = 42
     ):
         self.data_path = data_path
         self.tokenizer = tokenizer
@@ -33,6 +35,8 @@ class SmolLM3Dataset:
         self.chat_template_kwargs = chat_template_kwargs or {}
         self.filter_bad_entries = filter_bad_entries
         self.bad_entry_field = bad_entry_field
+        self.sample_size = sample_size
+        self.sample_seed = sample_seed
         
         # Load and process dataset
         self.dataset = self._load_dataset()
@@ -89,6 +93,32 @@ class SmolLM3Dataset:
                     filtered_size = len(dataset[split])
                     logger.info("Filtered %s: %d -> %d samples", split, original_size, filtered_size)
         
+        # Apply sampling if requested
+        if self.sample_size is not None and "train" in dataset:
+            logger.info(f"Sampling {self.sample_size} random samples from {len(dataset['train'])} total samples")
+            import random
+            random.seed(self.sample_seed)
+            
+            # Sample indices
+            total_samples = len(dataset["train"])
+            if self.sample_size > total_samples:
+                logger.warning(f"Requested sample size ({self.sample_size}) is larger than dataset size ({total_samples}). Using all samples.")
+                sampled_indices = list(range(total_samples))
+            else:
+                sampled_indices = random.sample(range(total_samples), self.sample_size)
+            
+            # Apply sampling to train split
+            dataset["train"] = dataset["train"].select(sampled_indices)
+            logger.info(f"Sampled {len(dataset['train'])} train samples")
+            
+            # Also sample validation if it exists and is large
+            if "validation" in dataset and len(dataset["validation"]) > 1000:
+                val_sample_size = min(1000, len(dataset["validation"]))
+                logger.info(f"Sampling {val_sample_size} validation samples from {len(dataset['validation'])} total")
+                val_sampled_indices = random.sample(range(len(dataset["validation"])), val_sample_size)
+                dataset["validation"] = dataset["validation"].select(val_sampled_indices)
+                logger.info(f"Sampled {len(dataset['validation'])} validation samples")
+        
         # If only 'train' split exists, create validation and test splits
         if ("train" in dataset) and ("validation" not in dataset or "test" not in dataset):
             logger.info("Automatically splitting train into train/validation/test (98/1/1)")
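
The seed-then-sample logic added above is easy to check in isolation. A minimal sketch using plain Python lists in place of a `datasets.Dataset` (the `select`-style indexing is simulated with a list comprehension; `sample_split` is an illustrative name, not from the repo):

```python
import random
from typing import List, Optional

def sample_split(rows: List[dict], sample_size: Optional[int], seed: int = 42) -> List[dict]:
    """Reproducibly downsample a split, mirroring the clamp-and-sample branch in src/data.py."""
    if sample_size is None:
        return rows
    random.seed(seed)
    total = len(rows)
    if sample_size > total:
        # Requested more than available: keep everything, like the warning branch above
        indices = list(range(total))
    else:
        indices = random.sample(range(total), sample_size)
    return [rows[i] for i in indices]  # stand-in for Dataset.select(indices)

rows = [{"id": i} for i in range(100)]
subset = sample_split(rows, sample_size=10, seed=42)
print(len(subset))  # 10
```

Because the seed is applied immediately before sampling, two runs with the same `sample_seed` select the same 80K subset, which is what makes lightweight runs comparable across machines.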
src/train.py CHANGED
@@ -183,13 +183,15 @@ def main():
         dataset_path = os.path.join('/input', args.dataset_dir)
         logger.info(f"Using local dataset: {dataset_path}")
 
-        # Load dataset with filtering options
+        # Load dataset with filtering options and sampling
         dataset = SmolLM3Dataset(
             data_path=dataset_path,
             tokenizer=model.tokenizer,
             max_seq_length=args.max_seq_length,
             filter_bad_entries=getattr(config, 'filter_bad_entries', False),
-            bad_entry_field=getattr(config, 'bad_entry_field', 'bad_entry')
+            bad_entry_field=getattr(config, 'bad_entry_field', 'bad_entry'),
+            sample_size=getattr(config, 'sample_size', None),
+            sample_seed=getattr(config, 'sample_seed', 42)
         )
 
         # Initialize trainer
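Taken together, the two hunks above thread `sample_size` and `sample_seed` from the config into the dataset loader. The core selection logic can be sketched in isolation (the helper name `sample_indices` is illustrative, not from the repo, and it uses a locally seeded `random.Random` rather than the module-level functions the diff calls):

```python
import random

def sample_indices(total_samples: int, sample_size: int, seed: int = 42) -> list:
    """Reproducibly pick sample_size row indices out of total_samples,
    falling back to the full range when the request exceeds the dataset
    size, mirroring the warning path in src/data.py above."""
    rng = random.Random(seed)  # seeded locally so global random state is untouched
    if sample_size >= total_samples:
        return list(range(total_samples))
    return rng.sample(range(total_samples), sample_size)

# A fixed seed yields the same subset on every run.
subset = sample_indices(100, 10, seed=42)
print(len(subset), subset == sample_indices(100, 10, seed=42))  # → 10 True
```

The seed is what makes the "lightweight" 80K-sample run repeatable: rerunning the pipeline with the same `sample_seed` selects the same rows.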
templates/datasets/readme.md CHANGED
@@ -0,0 +1,95 @@
+---
+dataset_info:
+  features:
+  - name: experiment_id
+    dtype: string
+  - name: name
+    dtype: string
+  - name: description
+    dtype: string
+  - name: created_at
+    dtype: string
+  - name: status
+    dtype: string
+  - name: metrics
+    dtype: string
+  - name: parameters
+    dtype: string
+  - name: artifacts
+    dtype: string
+  - name: logs
+    dtype: string
+  - name: last_updated
+    dtype: string
+  splits:
+  - name: train
+    num_bytes: 4945
+    num_examples: 2
+  download_size: 15529
+  dataset_size: 4945
+configs:
+- config_name: default
+  data_files:
+  - split: train
+    path: data/train-*
+tags:
+- trackio
+- tonic
+- experiment tracking
+---
+
+# Trackio Experiments Dataset
+
+This dataset stores experiment tracking data for ML training runs, particularly SmolLM3 fine-tuning experiments.
+
+## Dataset Structure
+
+The dataset contains the following columns:
+
+- **experiment_id**: Unique identifier for each experiment
+- **name**: Human-readable name for the experiment
+- **description**: Detailed description of the experiment
+- **created_at**: Timestamp when the experiment was created
+- **status**: Current status (running, completed, failed, paused)
+- **metrics**: JSON string containing training metrics over time
+- **parameters**: JSON string containing the experiment configuration
+- **artifacts**: JSON string containing experiment artifacts
+- **logs**: JSON string containing experiment logs
+- **last_updated**: Timestamp of the last update
+
+## Usage
+
+This dataset is used automatically by the Trackio monitoring system to store and retrieve experiment data, providing persistent storage for experiment tracking across training runs.
+
+## Integration
+
+The dataset is used by:
+- Trackio Spaces for experiment visualization
+- Training scripts for logging metrics and parameters
+- Monitoring systems for experiment tracking
+
+## Privacy
+
+This dataset is private by default to keep experiment data secure. Only users with appropriate permissions can access it.
+
+## Examples
+
+### Sample Experiment Entry
+```json
+{
+  "experiment_id": "exp_20250720_130853",
+  "name": "smollm3_finetune",
+  "description": "SmolLM3 fine-tuning experiment",
+  "created_at": "2025-07-20T11:20:01.780908",
+  "status": "running",
+  "metrics": "[{\"timestamp\": \"2025-07-20T11:20:01.780908\", \"step\": 25, \"metrics\": {\"loss\": 1.1659, \"accuracy\": 0.759}}]",
+  "parameters": "{\"model_name\": \"HuggingFaceTB/SmolLM3-3B\", \"batch_size\": 8, \"learning_rate\": 3.5e-06}",
+  "artifacts": "[]",
+  "logs": "[]",
+  "last_updated": "2025-07-20T11:20:01.780908"
+}
+```
+
+## License
+
+This dataset is part of the Trackio experiment tracking system and follows the same license as the main project.
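Because `metrics`, `parameters`, `artifacts`, and `logs` are stored as JSON strings rather than nested dataset features, consumers have to decode them after loading a row. A minimal sketch, using the sample entry from the README above:

```python
import json

# The sample row's "metrics" column from the dataset card, verbatim.
metrics_json = (
    '[{"timestamp": "2025-07-20T11:20:01.780908", '
    '"step": 25, "metrics": {"loss": 1.1659, "accuracy": 0.759}}]'
)

# Decoding yields an ordinary list of per-step records.
records = json.loads(metrics_json)
latest = records[-1]
print(latest["step"], latest["metrics"]["loss"])  # → 25 1.1659
```

Storing these columns as strings keeps the dataset schema flat (every feature is `dtype: string`), at the cost of a `json.loads` on the reader's side.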
templates/spaces/README.md ADDED
@@ -0,0 +1,46 @@
+---
+title: Trackio Tonic
+emoji: 🐠
+colorFrom: indigo
+colorTo: yellow
+sdk: gradio
+sdk_version: 5.38.0
+app_file: app.py
+pinned: true
+license: mit
+short_description: trackio for training monitoring
+---
+
+# Trackio Experiment Tracking
+
+A Gradio interface for experiment tracking and monitoring.
+
+## Features
+
+- Create and manage experiments
+- Log training metrics and parameters
+- View experiment details and results
+- Update experiment status
+
+## Usage
+
+1. Create a new experiment using the "Create Experiment" tab
+2. Log metrics during training using the "Log Metrics" tab
+3. View experiment details using the "View Experiments" tab
+4. Update experiment status using the "Update Status" tab
+
+## Integration
+
+To connect your training script to this Trackio Space:
+
+```python
+from monitoring import SmolLM3Monitor
+
+monitor = SmolLM3Monitor(
+    experiment_name="my_experiment",
+    trackio_url="{SPACE_URL}",
+    enable_tracking=True
+)
+```
+
+Visit: {SPACE_URL}
templates/spaces/{requirements_space.txt β†’ requirements.txt} RENAMED
File without changes
test_pipeline.py DELETED
@@ -1,260 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test script for the SmolLM3 end-to-end pipeline
-Verifies all components are working correctly
-"""
-
-import os
-import sys
-import subprocess
-import importlib
-from pathlib import Path
-
-def test_imports():
-    """Test that all required modules can be imported"""
-    print("πŸ” Testing imports...")
-
-    required_modules = [
-        'torch',
-        'transformers',
-        'datasets',
-        'accelerate',
-        'trl',
-        'huggingface_hub',
-        'requests'
-    ]
-
-    failed_imports = []
-    for module in required_modules:
-        try:
-            importlib.import_module(module)
-            print(f"βœ… {module}")
-        except ImportError as e:
-            print(f"❌ {module}: {e}")
-            failed_imports.append(module)
-
-    if failed_imports:
-        print(f"\n❌ Failed imports: {failed_imports}")
-        return False
-
-    print("βœ… All imports successful")
-    return True
-
-def test_local_modules():
-    """Test local module imports"""
-    print("\nπŸ” Testing local modules...")
-
-    # Add src to path
-    sys.path.append('src')
-
-    local_modules = [
-        'config',
-        'model',
-        'data',
-        'trainer',
-        'monitoring'
-    ]
-
-    failed_imports = []
-    for module in local_modules:
-        try:
-            importlib.import_module(module)
-            print(f"βœ… {module}")
-        except ImportError as e:
-            print(f"❌ {module}: {e}")
-            failed_imports.append(module)
-
-    if failed_imports:
-        print(f"\n❌ Failed local imports: {failed_imports}")
-        return False
-
-    print("βœ… All local modules imported successfully")
-    return True
-
-def test_scripts():
-    """Test script availability"""
-    print("\nπŸ” Testing scripts...")
-
-    required_scripts = [
-        'scripts/trackio_tonic/deploy_trackio_space.py',
-        'scripts/trackio_tonic/configure_trackio.py',
-        'scripts/dataset_tonic/setup_hf_dataset.py',
-        'scripts/model_tonic/push_to_huggingface.py',
-        'src/train.py'
-    ]
-
-    missing_scripts = []
-    for script in required_scripts:
-        if Path(script).exists():
-            print(f"βœ… {script}")
-        else:
-            print(f"❌ {script}")
-            missing_scripts.append(script)
-
-    if missing_scripts:
-        print(f"\n❌ Missing scripts: {missing_scripts}")
-        return False
-
-    print("βœ… All scripts found")
-    return True
-
-def test_configs():
-    """Test configuration files"""
-    print("\nπŸ” Testing configurations...")
-
-    config_dir = Path('config')
-    if not config_dir.exists():
-        print("❌ config directory not found")
-        return False
-
-    config_files = list(config_dir.glob('*.py'))
-    if not config_files:
-        print("❌ No configuration files found")
-        return False
-
-    print(f"βœ… Found {len(config_files)} configuration files:")
-    for config in config_files:
-        print(f"   - {config.name}")
-
-    return True
-
-def test_requirements():
-    """Test requirements files"""
-    print("\nπŸ” Testing requirements...")
-
-    requirements_dir = Path('requirements')
-    if not requirements_dir.exists():
-        print("❌ requirements directory not found")
-        return False
-
-    req_files = list(requirements_dir.glob('*.txt'))
-    if not req_files:
-        print("❌ No requirements files found")
-        return False
-
-    print(f"βœ… Found {len(req_files)} requirements files:")
-    for req in req_files:
-        print(f"   - {req.name}")
-
-    return True
-
-def test_cuda():
-    """Test CUDA availability"""
-    print("\nπŸ” Testing CUDA...")
-
-    try:
-        import torch
-        if torch.cuda.is_available():
-            device_count = torch.cuda.device_count()
-            device_name = torch.cuda.get_device_name(0)
-            print(f"βœ… CUDA available: {device_count} device(s)")
-            print(f"   - Device 0: {device_name}")
-        else:
-            print("⚠️ CUDA not available (training will be slower)")
-    except Exception as e:
-        print(f"❌ CUDA test failed: {e}")
-        return False
-
-    return True
-
-def test_hf_token():
-    """Test Hugging Face token"""
-    print("\nπŸ” Testing HF token...")
-
-    token = os.environ.get('HF_TOKEN')
-    if not token:
-        print("⚠️ HF_TOKEN not set (will be prompted during setup)")
-        return True
-
-    try:
-        result = subprocess.run(
-            ['huggingface-cli', 'whoami'],
-            capture_output=True,
-            text=True,
-            timeout=10
-        )
-
-        if result.returncode == 0:
-            username = result.stdout.strip()
-            print(f"βœ… HF token valid: {username}")
-            return True
-        else:
-            print(f"❌ HF token invalid: {result.stderr}")
-            return False
-    except Exception as e:
-        print(f"❌ HF token test failed: {e}")
-        return False
-
-def test_pipeline_components():
-    """Test individual pipeline components"""
-    print("\nπŸ” Testing pipeline components...")
-
-    # Test setup script
-    if Path('setup_launch.py').exists():
-        print("βœ… setup_launch.py found")
-    else:
-        print("❌ setup_launch.py not found")
-        return False
-
-    # Test launch script
-    if Path('launch.sh').exists():
-        print("βœ… launch.sh found")
-    else:
-        print("❌ launch.sh not found")
-        return False
-
-    # Test README
-    if Path('README_END_TO_END.md').exists():
-        print("βœ… README_END_TO_END.md found")
-    else:
-        print("❌ README_END_TO_END.md not found")
-        return False
-
-    return True
-
-def main():
-    """Run all tests"""
-    print("πŸ§ͺ SmolLM3 End-to-End Pipeline Test")
-    print("=" * 50)
-
-    tests = [
-        test_imports,
-        test_local_modules,
-        test_scripts,
-        test_configs,
-        test_requirements,
-        test_cuda,
-        test_hf_token,
-        test_pipeline_components
-    ]
-
-    passed = 0
-    total = len(tests)
-
-    for test in tests:
-        try:
-            if test():
-                passed += 1
-        except Exception as e:
-            print(f"❌ Test failed with exception: {e}")
-
-    print(f"\nπŸ“Š Test Results: {passed}/{total} passed")
-
-    if passed == total:
-        print("πŸŽ‰ All tests passed! Pipeline is ready to use.")
-        print("\nπŸš€ Next steps:")
-        print("1. Run: python setup_launch.py")
-        print("2. Run: chmod +x launch.sh")
-        print("3. Run: ./launch.sh")
-    else:
-        print("❌ Some tests failed. Please fix the issues before running the pipeline.")
-        print("\nπŸ”§ Common fixes:")
-        print("1. Install missing packages: pip install -r requirements/requirements_core.txt")
-        print("2. Set HF_TOKEN environment variable")
-        print("3. Check CUDA installation")
-
-    return passed == total
-
-if __name__ == "__main__":
-    success = main()
-    sys.exit(0 if success else 1)
tests/test_deployment.py ADDED
@@ -0,0 +1,167 @@
+#!/usr/bin/env python3
+"""
+Test script to verify deployment scripts work correctly
+"""
+
+import os
+import sys
+from pathlib import Path
+
+# Add project root to path (tests/ sits one level below the repo root)
+project_root = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(project_root))
+
+def test_templates_exist():
+    """Test that all required template files exist"""
+    print("πŸ” Testing template files...")
+
+    # Check spaces templates
+    spaces_dir = project_root / "templates" / "spaces"
+    spaces_files = ["app.py", "requirements.txt", "README.md"]
+
+    for file_name in spaces_files:
+        file_path = spaces_dir / file_name
+        if file_path.exists():
+            print(f"βœ… {file_path}")
+        else:
+            print(f"❌ {file_path} not found")
+            return False
+
+    # Check datasets templates
+    datasets_dir = project_root / "templates" / "datasets"
+    datasets_files = ["readme.md"]
+
+    for file_name in datasets_files:
+        file_path = datasets_dir / file_name
+        if file_path.exists():
+            print(f"βœ… {file_path}")
+        else:
+            print(f"❌ {file_path} not found")
+            return False
+
+    return True
+
+def test_deployment_scripts():
+    """Test that deployment scripts can import required modules"""
+    print("\nπŸ” Testing deployment scripts...")
+
+    try:
+        # Test space deployment script
+        from scripts.trackio_tonic.deploy_trackio_space import TrackioSpaceDeployer
+        print("βœ… deploy_trackio_space.py imports successfully")
+
+        # Test dataset setup script
+        from scripts.dataset_tonic.setup_hf_dataset import setup_trackio_dataset
+        print("βœ… setup_hf_dataset.py imports successfully")
+
+        return True
+
+    except Exception as e:
+        print(f"❌ Deployment script test failed: {e}")
+        return False
+
+def test_file_copying():
+    """Test that file copying logic works"""
+    print("\nπŸ” Testing file copying logic...")
+
+    try:
+        # Test space deployment file copying
+        from scripts.trackio_tonic.deploy_trackio_space import TrackioSpaceDeployer
+
+        # Create a mock deployer
+        deployer = TrackioSpaceDeployer("test-space", "test-user", "test-token")
+
+        # Test that the templates directory exists (reuse module-level project_root)
+        templates_dir = project_root / "templates" / "spaces"
+
+        if templates_dir.exists():
+            print(f"βœ… Templates directory exists: {templates_dir}")
+
+            # Check that required files exist
+            for file_name in ["app.py", "requirements.txt", "README.md"]:
+                file_path = templates_dir / file_name
+                if file_path.exists():
+                    print(f"βœ… Template file exists: {file_path}")
+                else:
+                    print(f"❌ Template file missing: {file_path}")
+                    return False
+        else:
+            print(f"❌ Templates directory missing: {templates_dir}")
+            return False
+
+        return True
+
+    except Exception as e:
+        print(f"❌ File copying test failed: {e}")
+        return False
+
+def test_readme_inclusion():
+    """Test that README inclusion logic works"""
+    print("\nπŸ” Testing README inclusion...")
+
+    try:
+        # Test dataset README inclusion
+        from scripts.dataset_tonic.setup_hf_dataset import setup_trackio_dataset
+
+        # Check that the README template exists (reuse module-level project_root)
+        readme_path = project_root / "templates" / "datasets" / "readme.md"
+
+        if readme_path.exists():
+            print(f"βœ… README template exists: {readme_path}")
+
+            # Check README content
+            with open(readme_path, 'r', encoding='utf-8') as f:
+                content = f.read()
+                if len(content.strip()) > 0:
+                    print(f"βœ… README has content ({len(content)} characters)")
+                else:
+                    print(f"⚠️ README is empty")
+        else:
+            print(f"❌ README template missing: {readme_path}")
+            return False
+
+        return True
+
+    except Exception as e:
+        print(f"❌ README inclusion test failed: {e}")
+        return False
+
+def main():
+    """Run all tests"""
+    print("πŸš€ Testing Deployment Scripts")
+    print("=" * 50)
+
+    tests = [
+        test_templates_exist,
+        test_deployment_scripts,
+        test_file_copying,
+        test_readme_inclusion
+    ]
+
+    passed = 0
+    total = len(tests)
+
+    for test in tests:
+        if test():
+            passed += 1
+        else:
+            print(f"❌ Test failed: {test.__name__}")
+
+    print(f"\n{'='*50}")
+    print(f"πŸ“Š Test Results: {passed}/{total} tests passed")
+
+    if passed == total:
+        print("πŸŽ‰ All tests passed! Deployment scripts are ready to use.")
+        print("\nπŸš€ Deployment workflow:")
+        print("1. Space deployment will copy files from templates/spaces/")
+        print("2. Dataset creation will include README from templates/datasets/")
+        print("3. Both scripts will properly upload all required files")
+        return 0
+    else:
+        print("❌ Some tests failed. Please fix the issues before deployment.")
+        return 1
+
+if __name__ == "__main__":
+    exit(main())
test_formatting_fix.py β†’ tests/test_formatting_fix.py RENAMED
File without changes
tests/test_pipeline.py ADDED
@@ -0,0 +1,151 @@
+#!/usr/bin/env python3
+"""
+Quick test script to verify pipeline components
+"""
+
+import os
+import sys
+from pathlib import Path
+
+# Add project root to path (tests/ sits one level below the repo root)
+project_root = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(project_root))
+
+def test_imports():
+    """Test that all required modules can be imported"""
+    print("πŸ” Testing imports...")
+
+    try:
+        from src.config import get_config
+        print("βœ… src.config imported successfully")
+    except ImportError as e:
+        print(f"❌ Failed to import src.config: {e}")
+        return False
+
+    try:
+        from src.model import SmolLM3Model
+        print("βœ… src.model imported successfully")
+    except ImportError as e:
+        print(f"❌ Failed to import src.model: {e}")
+        return False
+
+    try:
+        from src.data import SmolLM3Dataset
+        print("βœ… src.data imported successfully")
+    except ImportError as e:
+        print(f"❌ Failed to import src.data: {e}")
+        return False
+
+    try:
+        from src.trainer import SmolLM3Trainer
+        print("βœ… src.trainer imported successfully")
+    except ImportError as e:
+        print(f"❌ Failed to import src.trainer: {e}")
+        return False
+
+    try:
+        from src.monitoring import create_monitor_from_config
+        print("βœ… src.monitoring imported successfully")
+    except ImportError as e:
+        print(f"❌ Failed to import src.monitoring: {e}")
+        return False
+
+    return True
+
+def test_config_loading():
+    """Test that configuration files can be loaded"""
+    print("\nπŸ” Testing config loading...")
+
+    from src.config import get_config  # imported here so the test is self-contained
+
+    config_files = [
+        "config/train_smollm3_h100_lightweight.py",
+        "config/train_smollm3_openhermes_fr_a100_large.py",
+        "config/train_smollm3.py"
+    ]
+
+    for config_file in config_files:
+        if os.path.exists(config_file):
+            try:
+                config = get_config(config_file)
+                print(f"βœ… {config_file} loaded successfully")
+                print(f"   Model: {config.model_name}")
+                print(f"   Batch size: {config.batch_size}")
+                if hasattr(config, 'sample_size') and config.sample_size:
+                    print(f"   Sample size: {config.sample_size}")
+            except Exception as e:
+                print(f"❌ Failed to load {config_file}: {e}")
+                return False
+        else:
+            print(f"⚠️ {config_file} not found")
+
+    return True
+
+def test_dataset_sampling():
+    """Test dataset sampling functionality"""
+    print("\nπŸ” Testing dataset sampling...")
+
+    try:
+        from datasets import load_dataset
+        from transformers import AutoTokenizer
+
+        # Load a small test dataset
+        print("Loading test dataset...")
+        dataset = load_dataset("legmlai/openhermes-fr", split="train[:100]")
+        print(f"Loaded {len(dataset)} samples")
+
+        # Test tokenizer
+        tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
+        print("βœ… Tokenizer loaded successfully")
+
+        # Test dataset with sampling
+        from src.data import SmolLM3Dataset
+
+        dataset_handler = SmolLM3Dataset(
+            data_path="legmlai/openhermes-fr",
+            tokenizer=tokenizer,
+            max_seq_length=1024,
+            sample_size=50,  # Sample 50 from the 100 we loaded
+            sample_seed=42
+        )
+
+        train_dataset = dataset_handler.get_train_dataset()
+        print(f"βœ… Dataset sampling works: {len(train_dataset)} samples")
+
+        return True
+
+    except Exception as e:
+        print(f"❌ Dataset sampling test failed: {e}")
+        return False
+
+def main():
+    """Run all tests"""
+    print("πŸš€ Testing SmolLM3 Pipeline Components")
+    print("=" * 50)
+
+    tests = [
+        test_imports,
+        test_config_loading,
+        test_dataset_sampling
+    ]
+
+    passed = 0
+    total = len(tests)
+
+    for test in tests:
+        if test():
+            passed += 1
+        else:
+            print(f"❌ Test failed: {test.__name__}")
+
+    print(f"\n{'='*50}")
+    print(f"πŸ“Š Test Results: {passed}/{total} tests passed")
+
+    if passed == total:
+        print("πŸŽ‰ All tests passed! Pipeline is ready to run.")
+        return 0
+    else:
+        print("❌ Some tests failed. Please fix the issues before running the pipeline.")
+        return 1
+
+if __name__ == "__main__":
+    exit(main())
tests/test_readme_template.py ADDED
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+"""
+Test script to verify README template replacement works correctly
+"""
+
+import os
+import sys
+from pathlib import Path
+
+# Add project root to path (tests/ sits one level below the repo root)
+project_root = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(project_root))
+
+def test_readme_template():
+    """Test README template replacement"""
+    print("πŸ” Testing README template replacement...")
+
+    try:
+        # Get template path
+        templates_dir = project_root / "templates" / "spaces"
+        readme_template_path = templates_dir / "README.md"
+
+        if not readme_template_path.exists():
+            print(f"❌ README template not found: {readme_template_path}")
+            return False
+
+        # Read template
+        with open(readme_template_path, 'r', encoding='utf-8') as f:
+            template_content = f.read()
+
+        print(f"βœ… README template loaded ({len(template_content)} characters)")
+
+        # Test placeholder replacement
+        test_space_url = "https://huggingface.co/spaces/test-user/test-space"
+        replaced_content = template_content.replace("{SPACE_URL}", test_space_url)
+
+        if "{SPACE_URL}" in replaced_content:
+            print("❌ Placeholder replacement failed")
+            return False
+
+        if test_space_url not in replaced_content:
+            print("❌ Space URL not found in replaced content")
+            return False
+
+        print("βœ… Placeholder replacement works correctly")
+        print(f"βœ… Space URL: {test_space_url}")
+
+        return True
+
+    except Exception as e:
+        print(f"❌ README template test failed: {e}")
+        return False
+
+def test_deployment_readme():
+    """Test that deployment script can use README template"""
+    print("\nπŸ” Testing deployment script README usage...")
+
+    try:
+        from scripts.trackio_tonic.deploy_trackio_space import TrackioSpaceDeployer
+
+        # Create a mock deployer
+        deployer = TrackioSpaceDeployer("test-space", "test-user", "test-token")
+
+        # Test that the README template exists (reuse module-level project_root)
+        readme_template_path = project_root / "templates" / "spaces" / "README.md"
+
+        if readme_template_path.exists():
+            print(f"βœ… README template exists: {readme_template_path}")
+
+            # Test reading template
+            with open(readme_template_path, 'r', encoding='utf-8') as f:
+                content = f.read()
+                if "{SPACE_URL}" in content:
+                    print("βœ… Template contains placeholder")
+                else:
+                    print("⚠️ Template missing placeholder")
+
+            return True
+        else:
+            print(f"❌ README template missing: {readme_template_path}")
+            return False
+
+    except Exception as e:
+        print(f"❌ Deployment README test failed: {e}")
+        return False
+
+def main():
+    """Run all tests"""
+    print("πŸš€ Testing README Template System")
+    print("=" * 50)
+
+    tests = [
+        test_readme_template,
+        test_deployment_readme
+    ]
+
+    passed = 0
+    total = len(tests)
+
+    for test in tests:
+        if test():
+            passed += 1
+        else:
+            print(f"❌ Test failed: {test.__name__}")
+
+    print(f"\n{'='*50}")
+    print(f"πŸ“Š Test Results: {passed}/{total} tests passed")
+
+    if passed == total:
+        print("πŸŽ‰ All tests passed! README template system is working correctly.")
+        print("\nπŸš€ Template workflow:")
+        print("1. README template is read from templates/spaces/README.md")
+        print("2. {SPACE_URL} placeholder is replaced with actual space URL")
+        print("3. Customized README is written to the space")
+        return 0
+    else:
+        print("❌ Some tests failed. Please fix the issues before deployment.")
+        return 1
+
+if __name__ == "__main__":
+    exit(main())
tests/test_simple_pipeline.py ADDED
@@ -0,0 +1,130 @@
+#!/usr/bin/env python3
+"""
+Simple test script for the simplified pipeline approach
+"""
+
+import os
+import sys
+from pathlib import Path
+
+# Add project root to path (tests/ sits one level below the repo root)
+project_root = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(project_root))
+
+def test_simple_training_script():
+    """Test the simplified training script"""
+    print("πŸ” Testing simplified training script...")
+
+    try:
+        # Test that the training script can be imported
+        from scripts.training.train import main as train_main
+        print("βœ… Training script imported successfully")
+
+        # Test config loading
+        from config.train_smollm3_h100_lightweight import config as h100_config
+        print("βœ… H100 lightweight config loaded successfully")
+        print(f"   Model: {h100_config.model_name}")
+        print(f"   Batch size: {h100_config.batch_size}")
+        print(f"   Sample size: {h100_config.sample_size}")
+
+        return True
+
+    except Exception as e:
+        print(f"❌ Training script test failed: {e}")
+        return False
+
+def test_config_files():
+    """Test that all required config files exist"""
+    print("\nπŸ” Testing config files...")
+
+    config_files = [
+        "config/train_smollm3_h100_lightweight.py",
+        "config/train_smollm3_openhermes_fr_a100_large.py",
+        "config/train_smollm3_openhermes_fr_a100_multiple_passes.py"
+    ]
+
+    for config_file in config_files:
+        if os.path.exists(config_file):
+            print(f"βœ… {config_file}")
+        else:
+            print(f"❌ {config_file} not found")
+            return False
+
+    return True
+
+def test_scripts():
+    """Test that all required scripts exist"""
+    print("\nπŸ” Testing scripts...")
+
+    script_files = [
+        "scripts/training/train.py",
+        "scripts/trackio_tonic/deploy_trackio_space.py",
+        "scripts/trackio_tonic/configure_trackio.py",
+        "scripts/dataset_tonic/setup_hf_dataset.py",
+        "scripts/model_tonic/push_to_huggingface.py"
+    ]
+
+    for script_file in script_files:
+        if os.path.exists(script_file):
+            print(f"βœ… {script_file}")
+        else:
+            print(f"❌ {script_file} not found")
+            return False
+
+    return True
+
+def test_launch_script():
+    """Test that the launch script exists and is executable"""
+    print("\nπŸ” Testing launch script...")
+
+    launch_script = "launch.sh"
+    if os.path.exists(launch_script):
+        print(f"βœ… {launch_script} exists")
+
+        # Check if it's executable
+        if os.access(launch_script, os.X_OK):
+            print(f"βœ… {launch_script} is executable")
+        else:
+            print(f"⚠️ {launch_script} is not executable (run: chmod +x launch.sh)")
+
+        return True
+    else:
+        print(f"❌ {launch_script} not found")
+        return False
+
+def main():
+    """Run all tests"""
+    print("πŸš€ Testing Simplified SmolLM3 Pipeline")
+    print("=" * 50)
+
+    tests = [
+        test_simple_training_script,
+        test_config_files,
+        test_scripts,
+        test_launch_script
+    ]
+
+    passed = 0
+    total = len(tests)
+
+    for test in tests:
+        if test():
+            passed += 1
+        else:
+            print(f"❌ Test failed: {test.__name__}")
+
+    print(f"\n{'='*50}")
+    print(f"πŸ“Š Test Results: {passed}/{total} tests passed")
+
+    if passed == total:
+        print("πŸŽ‰ All tests passed! Simplified pipeline is ready to run.")
+        print("\nπŸš€ To run the pipeline:")
+        print("1. chmod +x launch.sh")
+        print("2. ./launch.sh")
+        return 0
+    else:
+        print("❌ Some tests failed. Please fix the issues before running the pipeline.")
+        return 1
+
+if __name__ == "__main__":
+    exit(main())