Spaces: Tonic/SmolFactory

Tonic committed on 27 days ago

Commit 39db0ca · verified · 1 Parent(s): 2df26a0

adds monkey patch for trackio monitoring in torch and readme creator improvements

Files changed (10)

docs/MODEL_CARD_USER_INPUT_ANALYSIS.md +233 -0
docs/TRACKIO_TRL_FIX.md +146 -0
launch.sh +14 -1
scripts/model_tonic/push_to_huggingface.py +12 -4
src/__init__.py +29 -0
src/trackio.py +199 -0
src/trainer.py +33 -0
setup_launch.py → tests/setup_launch.py +0 -0
tests/test_trackio_trl_fix.py +153 -0
trackio.py +30 -0

docs/MODEL_CARD_USER_INPUT_ANALYSIS.md ADDED

	@@ -0,0 +1,233 @@

+# Model Card User Input Analysis
+## Overview
+This document analyzes the interaction between the model card template (`templates/model_card.md`), the model card generator (`scripts/model_tonic/generate_model_card.py`), and the launch script (`launch.sh`) to identify variables that require user input and improve the user experience.
+## Template Variables Analysis
+### Variables in `templates/model_card.md`
+The model card template uses the following variables that can be populated with user input:
+#### Core Model Information
+- `{{model_name}}` - Display name of the model
+- `{{model_description}}` - Brief description of the model
+- `{{repo_name}}` - Hugging Face repository name
+- `{{base_model}}` - Base model used for fine-tuning
+#### Training Configuration
+- `{{training_config_type}}` - Type of training configuration used
+- `{{trainer_type}}` - Type of trainer (SFT, DPO, etc.)
+- `{{batch_size}}` - Training batch size
+- `{{gradient_accumulation_steps}}` - Gradient accumulation steps
+- `{{learning_rate}}` - Learning rate used
+- `{{max_epochs}}` - Maximum number of epochs
+- `{{max_seq_length}}` - Maximum sequence length
+#### Dataset Information
+- `{{dataset_name}}` - Name of the dataset used
+- `{{dataset_size}}` - Size of the dataset
+- `{{dataset_format}}` - Format of the dataset
+- `{{dataset_sample_size}}` - Sample size (for lightweight configs)
+#### Training Results
+- `{{training_loss}}` - Final training loss
+- `{{validation_loss}}` - Final validation loss
+- `{{perplexity}}` - Model perplexity
+#### Infrastructure
+- `{{hardware_info}}` - Hardware used for training
+- `{{experiment_name}}` - Name of the experiment
+- `{{trackio_url}}` - Trackio monitoring URL
+- `{{dataset_repo}}` - HF Dataset repository
+#### Author Information
+- `{{author_name}}` - Author name for citations and attribution
+- `{{model_name_slug}}` - URL-friendly model name
+#### Quantization
+- `{{quantized_models}}` - Boolean indicating if quantized models exist
+## User Input Requirements
+### Previously Missing User Inputs
+#### 1. **Author Name** (`author_name`)
+- **Purpose**: Used in model card metadata and citations
+- **Template Usage**: `{{#if author_name}}author: {{author_name}}{{/if}}`
+- **Citation Usage**: `author={{{author_name}}}`
+- **Default**: "Your Name" as the push script's final fallback; the launch prompt itself defaults to `$HF_USERNAME`
+- **User Input Added**: ✅ **IMPLEMENTED**
+#### 2. **Model Description** (`model_description`)
+- **Purpose**: Brief description of the model's capabilities
+- **Template Usage**: `{{model_description}}`
+- **Default**: "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities."
+- **User Input Added**: ✅ **IMPLEMENTED**
+### Variables That Don't Need User Input
+Most variables are automatically populated from:
+- **Training Configuration**: Batch size, learning rate, epochs, etc.
+- **System Detection**: Hardware info, model size, etc.
+- **Auto-Generation**: Repository names, experiment names, etc.
+- **Training Results**: Loss values, perplexity, etc.
+## Implementation Changes
+### 1. Launch Script Updates (`launch.sh`)
+#### Added User Input Prompts
+```bash
+# Step 8.2: Author Information for Model Card
+print_step "Step 8.2: Author Information"
+echo "================================="
+print_info "This information will be used in the model card and citation."
+get_input "Author name for model card" "$HF_USERNAME" AUTHOR_NAME
+print_info "Model description will be used in the model card and repository."
+get_input "Model description" "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities." MODEL_DESCRIPTION
+```
+#### Updated Configuration Summary
+```bash
+echo "  Author: $AUTHOR_NAME"
+```
+#### Updated Model Push Call
+```bash
+python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME" \
+    --token "$HF_TOKEN" \
+    --trackio-url "$TRACKIO_URL" \
+    --experiment-name "$EXPERIMENT_NAME" \
+    --dataset-repo "$TRACKIO_DATASET_REPO" \
+    --author-name "$AUTHOR_NAME" \
+    --model-description "$MODEL_DESCRIPTION"
+```
+### 2. Push Script Updates (`scripts/model_tonic/push_to_huggingface.py`)
+#### Added Command Line Arguments
+```python
+parser.add_argument('--author-name', type=str, default=None, help='Author name for model card')
+parser.add_argument('--model-description', type=str, default=None, help='Model description for model card')
+```
+#### Updated Class Constructor
+```python
+def __init__(
+    self,
+    model_path: str,
+    repo_name: str,
+    token: Optional[str] = None,
+    private: bool = False,
+    trackio_url: Optional[str] = None,
+    experiment_name: Optional[str] = None,
+    dataset_repo: Optional[str] = None,
+    hf_token: Optional[str] = None,
+    author_name: Optional[str] = None,
+    model_description: Optional[str] = None
+):
+```
+#### Updated Model Card Generation
+```python
+variables = {
+    "model_name": f"{self.repo_name.split('/')[-1]} - Fine-tuned SmolLM3",
+    "model_description": self.model_description or "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities.",
+    # ... other variables
+    "author_name": self.author_name or training_config.get('author_name', 'Your Name'),
+}
+```
+## User Experience Improvements
+### 1. **Interactive Prompts**
+- Users are now prompted for author name and model description
+- Default values are provided for convenience
+- Clear explanations of what each field is used for
+### 2. **Configuration Summary**
+- Author name is now displayed in the configuration summary
+- Users can review all settings before proceeding
+### 3. **Automatic Integration**
+- User inputs are automatically passed to the model card generation
+- No manual editing of scripts required
+## Template Variable Categories
+### Automatic Variables (No User Input Needed)
+- `repo_name` - Auto-generated from username and date
+- `base_model` - Always "HuggingFaceTB/SmolLM3-3B"
+- `training_config_type` - From user selection
+- `trainer_type` - From user selection
+- `batch_size`, `learning_rate`, `max_epochs` - From training config
+- `hardware_info` - Auto-detected
+- `experiment_name` - Auto-generated with timestamp
+- `trackio_url` - Auto-generated from space name
+- `dataset_repo` - Auto-generated
+- `training_loss`, `validation_loss`, `perplexity` - From training results
+### User Input Variables (Now Implemented)
+- `author_name` - ✅ **Added user prompt**
+- `model_description` - ✅ **Added user prompt**
+### Conditional Variables
+- `quantized_models` - Set automatically based on quantization choices
+- `dataset_sample_size` - Set based on training configuration type
+## Benefits of These Changes
+### 1. **Better Attribution**
+- Author names are properly captured and used in citations
+- Model cards include proper attribution
+### 2. **Customizable Descriptions**
+- Users can provide custom model descriptions
+- Better model documentation and discoverability
+### 3. **Improved User Experience**
+- No need to manually edit scripts
+- Interactive prompts with helpful defaults
+- Clear feedback on what information is being collected
+### 4. **Consistent Documentation**
+- All model cards will have proper author information
+- Standardized model descriptions
+- Better integration with Hugging Face Hub
+## Future Enhancements
+### Potential Additional User Inputs
+1. **License Selection** - Allow users to choose model license
+2. **Model Tags** - Custom tags for better discoverability
+3. **Usage Examples** - Custom usage examples for specific use cases
+4. **Limitations Description** - Custom limitations based on training data
+### Template Improvements
+1. **Dynamic License** - Support for different license types
+2. **Custom Tags** - User-defined model tags
+3. **Usage Scenarios** - Template sections for different use cases
+## Testing
+The changes have been tested to ensure:
+- ✅ Author name is properly passed to model card generation
+- ✅ Model description is properly passed to model card generation
+- ✅ Default values work correctly
+- ✅ Configuration summary displays new fields
+- ✅ Model push script accepts new parameters
+## Conclusion
+The analysis identified that the model card template had two key variables (`author_name` and `model_description`) that would benefit from user input. These have been successfully implemented with:
+1. **Interactive prompts** in the launch script
+2. **Command line arguments** in the push script
+3. **Proper integration** with the model card generator
+4. **User-friendly defaults** and clear explanations
+This improves the overall user experience and ensures that model cards have proper attribution and descriptions.
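The fallback chain and `{{variable}}` placeholders described in this document can be illustrated with a small sketch. It is not the project's generator: `resolve_author` and `render` are hypothetical names, and the renderer handles only plain `{{name}}` placeholders, not the `{{#if ...}}` blocks the real template uses.

```python
import re
from typing import Any, Dict, Optional


def resolve_author(cli_value: Optional[str], training_config: Dict[str, Any]) -> str:
    """Mirror the push script's precedence: CLI value, then config, then placeholder."""
    # An empty string from the shell is falsy, so it also falls through to the config.
    return cli_value or training_config.get("author_name", "Your Name")


def render(template: str, variables: Dict[str, Any]) -> str:
    """Replace plain {{name}} placeholders; unknown names are left untouched."""

    def substitute(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


if __name__ == "__main__":
    author = resolve_author(None, {"author_name": "Tonic"})
    print(render("author={{{author_name}}}", {"author_name": author}))
```

In the citation line the outer braces survive the substitution, so `author={{{author_name}}}` becomes a BibTeX-style `author={...}` field.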

docs/TRACKIO_TRL_FIX.md ADDED

	@@ -0,0 +1,146 @@

+# Trackio TRL Compatibility Fix
+## Problem Description
+The training was failing with the error:
+```
+ERROR:trainer:Training failed: module 'trackio' has no attribute 'init'
+```
+This error occurred because the TRL library (specifically SFTTrainer) expects a `trackio` module with specific functions:
+- `init()` - Initialize experiment
+- `log()` - Log metrics
+- `finish()` - Finish experiment
+However, our custom monitoring implementation didn't provide this interface.
+## Solution Implementation
+### 1. Created Trackio Module Interface (`src/trackio.py`)
+Created a trackio module that provides the exact interface expected by TRL:
+```python
+def init(project_name: str, experiment_name: Optional[str] = None, **kwargs) -> str:
+    """Initialize trackio experiment (TRL interface)"""
+def log(metrics: Dict[str, Any], step: Optional[int] = None, **kwargs):
+    """Log metrics to trackio (TRL interface)"""
+def finish():
+    """Finish trackio experiment (TRL interface)"""
+```
+### 2. Global Trackio Module (`trackio.py`)
+Created a root-level `trackio.py` file that imports from our custom implementation:
+```python
+from src.trackio import (
+    init, log, finish, log_config, log_checkpoint,
+    log_evaluation_results, get_experiment_url, is_available, get_monitor
+)
+```
+This makes the trackio module available globally for TRL to import.
+### 3. Updated Trainer Integration (`src/trainer.py`)
+Modified the trainer to properly initialize trackio before creating SFTTrainer:
+```python
+# Initialize trackio for TRL compatibility
+try:
+    import trackio
+    experiment_id = trackio.init(
+        project_name=self.config.experiment_name,
+        experiment_name=self.config.experiment_name,
+        trackio_url=getattr(self.config, 'trackio_url', None),
+        trackio_token=getattr(self.config, 'trackio_token', None),
+        hf_token=getattr(self.config, 'hf_token', None),
+        dataset_repo=getattr(self.config, 'dataset_repo', None)
+    )
+    logger.info(f"Trackio initialized with experiment ID: {experiment_id}")
+except Exception as e:
+    logger.warning(f"Failed to initialize trackio: {e}")
+    logger.info("Continuing without trackio integration")
+```
+### 4. Proper Cleanup
+Added `trackio.finish()` calls in both success and error scenarios:
+```python
+# Finish trackio experiment
+try:
+    import trackio
+    trackio.finish()
+    logger.info("Trackio experiment finished")
+except Exception as e:
+    logger.warning(f"Failed to finish trackio experiment: {e}")
+```
+## Integration with Custom Monitoring
+The trackio module is a thin wrapper around our existing monitoring system:
+- Uses `SmolLM3Monitor` for actual monitoring functionality
+- Provides TRL-compatible interface on top
+- Maintains all existing features (HF Datasets, Trackio Space, etc.)
+- Graceful fallback when Trackio Space is not accessible
+## Testing
+Created a test suite (`tests/test_trackio_trl_fix.py`) that verifies:
+1. **Interface Compatibility**: All required functions exist
+2. **TRL Compatibility**: The required functions are present and their signatures can be inspected
+3. **Monitoring Integration**: Works with our custom monitoring system
+Test results:
+```
+✅ Successfully imported trackio module
+✅ Found required function: init
+✅ Found required function: log
+✅ Found required function: finish
+✅ Trackio initialization successful
+✅ Trackio logging successful
+✅ Trackio finish successful
+✅ TRL compatibility test passed
+✅ Monitor integration working
+```
+## Benefits
+1. **Resolves Training Error**: Fixes the "module 'trackio' has no attribute 'init'" error
+2. **Maintains Functionality**: All existing monitoring features continue to work
+3. **TRL Compatibility**: SFTTrainer can now use trackio for logging
+4. **Graceful Fallback**: Continues training even if trackio initialization fails
+5. **Future-Proof**: Easy to extend with additional TRL-compatible functions
+## Usage
+The fix is transparent to users. Training will now work with SFTTrainer and automatically:
+1. Initialize trackio before SFTTrainer is created
+2. Log metrics during training
+3. Finish the experiment when training completes
+4. Fall back gracefully if trackio is not available
+## Files Modified
+- `src/trackio.py` - New trackio module interface
+- `trackio.py` - Global trackio module for TRL
+- `src/trainer.py` - Updated trainer integration
+- `src/__init__.py` - Package exports
+- `tests/test_trackio_trl_fix.py` - Test suite
+## Verification
+To verify the fix works:
+```bash
+python tests/test_trackio_trl_fix.py
+```
+This should show all tests passing and confirm that the trackio module provides the interface expected by the TRL library.

launch.sh CHANGED

@@ -493,6 +493,7 @@ echo "  Epochs: $MAX_EPOCHS"
 echo "  Batch Size: $BATCH_SIZE"
 echo "  Learning Rate: $LEARNING_RATE"
 echo "  Model Repo: $REPO_NAME (auto-generated)"
+echo "  Author: $AUTHOR_NAME"
 echo "  Trackio Space: $TRACKIO_URL"
 echo "  HF Dataset: $TRACKIO_DATASET_REPO"
 echo ""
@@ -609,6 +610,16 @@ else
     exit 1
 fi
+# Step 8.2: Author Information for Model Card
+print_step "Step 8.2: Author Information"
+echo "================================="
+print_info "This information will be used in the model card and citation."
+get_input "Author name for model card" "$HF_USERNAME" AUTHOR_NAME
+print_info "Model description will be used in the model card and repository."
+get_input "Model description" "A fine-tuned version of SmolLM3-3B for improved french language text generation and conversation capabilities." MODEL_DESCRIPTION
 # Step 9: Deploy Trackio Space (automated)
 print_step "Step 9: Deploying Trackio Space"
 echo "==================================="
@@ -729,7 +740,9 @@ python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME
     --token "$HF_TOKEN" \
     --trackio-url "$TRACKIO_URL" \
     --experiment-name "$EXPERIMENT_NAME" \
-    --dataset-repo "$TRACKIO_DATASET_REPO"
+    --dataset-repo "$TRACKIO_DATASET_REPO" \
+    --author-name "$AUTHOR_NAME" \
+    --model-description "$MODEL_DESCRIPTION"
 # Step 16.5: Quantization Options
 print_step "Step 16.5: Model Quantization Options"

scripts/model_tonic/push_to_huggingface.py CHANGED

@@ -46,7 +46,9 @@ class HuggingFacePusher:
         trackio_url: Optional[str] = None,
         experiment_name: Optional[str] = None,
         dataset_repo: Optional[str] = None,
-        hf_token: Optional[str] = None
+        hf_token: Optional[str] = None,
+        author_name: Optional[str] = None,
+        model_description: Optional[str] = None
     ):
         self.model_path = Path(model_path)
         self.repo_name = repo_name
@@ -54,6 +56,8 @@ class HuggingFacePusher:
         self.private = private
         self.trackio_url = trackio_url
         self.experiment_name = experiment_name
+        self.author_name = author_name
+        self.model_description = model_description
         # HF Datasets configuration
         self.dataset_repo = dataset_repo or os.getenv('TRACKIO_DATASET_REPO', 'tonic/trackio-experiments')
@@ -131,7 +135,7 @@ class HuggingFacePusher:
             # Create variables for the template
             variables = {
                 "model_name": f"{self.repo_name.split('/')[-1]} - Fine-tuned SmolLM3",
-                "model_description": "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities.",
+                "model_description": self.model_description or "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities.",
                 "repo_name": self.repo_name,
                 "base_model": "HuggingFaceTB/SmolLM3-3B",
                 "dataset_name": training_config.get('dataset_name', 'OpenHermes-FR'),
@@ -148,7 +152,7 @@ class HuggingFacePusher:
                 "dataset_repo": self.dataset_repo,
                 "dataset_size": training_config.get('dataset_size', '~80K samples'),
                 "dataset_format": training_config.get('dataset_format', 'Chat format'),
-                "author_name": training_config.get('author_name', 'Your Name'),
+                "author_name": self.author_name or training_config.get('author_name', 'Your Name'),
                 "model_name_slug": self.repo_name.split('/')[-1].lower().replace('-', '_'),
                 "quantized_models": False,  # Will be updated if quantized models are added
                 "dataset_sample_size": training_config.get('dataset_sample_size'),
@@ -522,6 +526,8 @@ def parse_args():
     parser.add_argument('--trackio-url', type=str, default=None, help='Trackio Space URL for logging')
     parser.add_argument('--experiment-name', type=str, default=None, help='Experiment name for Trackio')
     parser.add_argument('--dataset-repo', type=str, default=None, help='HF Dataset repository for experiment storage')
+    parser.add_argument('--author-name', type=str, default=None, help='Author name for model card')
+    parser.add_argument('--model-description', type=str, default=None, help='Model description for model card')
     return parser.parse_args()
@@ -547,7 +553,9 @@ def main():
             trackio_url=args.trackio_url,
             experiment_name=args.experiment_name,
             dataset_repo=args.dataset_repo,
-            hf_token=args.hf_token
+            hf_token=args.hf_token,
+            author_name=args.author_name,
+            model_description=args.model_description
         )
         # Push model
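To make the link between the new `launch.sh` flags and the constructor arguments concrete, here is a minimal sketch of how `argparse` maps `--author-name` and `--model-description` to attributes. Only the two new options are reproduced; `build_parser` and `parse` are hypothetical helper names, not functions from the push script.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Illustrative subset of the push script's command line."""
    parser = argparse.ArgumentParser(description="Subset of push_to_huggingface.py options")
    parser.add_argument("--author-name", type=str, default=None, help="Author name for model card")
    parser.add_argument("--model-description", type=str, default=None, help="Model description for model card")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse an explicit argument list (or sys.argv when argv is None)."""
    return build_parser().parse_args(argv)


# argparse converts the dashes in option names to underscores on the namespace.
args = parse(["--author-name", "Tonic", "--model-description", "A French chat model"])
print(args.author_name, "|", args.model_description)
```

When either flag is omitted its attribute is `None`, which is what lets the `self.author_name or ...` and `self.model_description or ...` fallbacks in the hunk above take effect.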

src/__init__.py ADDED

	@@ -0,0 +1,29 @@

+"""
+SmolLM3 Fine-tuning Pipeline
+Core training and monitoring modules
+"""
+from .config import SmolLM3Config
+from .data import SmolLM3Dataset
+from .model import SmolLM3Model
+from .monitoring import SmolLM3Monitor, create_monitor_from_config
+from .train import SmolLM3Trainer
+from .trainer import SmolLM3Trainer as Trainer
+from .trackio import init, log, finish, log_config, log_checkpoint, log_evaluation_results
+__all__ = [
+    'SmolLM3Config',
+    'SmolLM3Dataset',
+    'SmolLM3Model',
+    'SmolLM3Monitor',
+    'create_monitor_from_config',
+    'SmolLM3Trainer',
+    'Trainer',
+    # Trackio interface
+    'init',
+    'log',
+    'finish',
+    'log_config',
+    'log_checkpoint',
+    'log_evaluation_results'
+]

src/trackio.py ADDED

	@@ -0,0 +1,199 @@

+"""
+Trackio Module Interface for TRL Library
+Provides the interface expected by TRL library while integrating with our custom monitoring system
+"""
+import os
+import logging
+from typing import Dict, Any, Optional
+from datetime import datetime
+# Import our custom monitoring
+from monitoring import SmolLM3Monitor
+logger = logging.getLogger(__name__)
+# Global monitor instance
+_monitor = None
+def init(
+    project_name: str,
+    experiment_name: Optional[str] = None,
+    **kwargs
+) -> str:
+    """
+    Initialize trackio experiment (TRL interface)
+    Args:
+        project_name: Name of the project
+        experiment_name: Name of the experiment (optional)
+        **kwargs: Additional configuration parameters
+    Returns:
+        Experiment ID
+    """
+    global _monitor
+    try:
+        # Extract configuration from kwargs
+        trackio_url = kwargs.get('trackio_url') or os.environ.get('TRACKIO_URL')
+        trackio_token = kwargs.get('trackio_token') or os.environ.get('TRACKIO_TOKEN')
+        hf_token = kwargs.get('hf_token') or os.environ.get('HF_TOKEN')
+        dataset_repo = kwargs.get('dataset_repo') or os.environ.get('TRACKIO_DATASET_REPO', 'tonic/trackio-experiments')
+        # Use experiment_name if provided, otherwise use project_name
+        exp_name = experiment_name or project_name
+        # Create monitor instance
+        _monitor = SmolLM3Monitor(
+            experiment_name=exp_name,
+            trackio_url=trackio_url,
+            trackio_token=trackio_token,
+            enable_tracking=True,
+            log_artifacts=True,
+            log_metrics=True,
+            log_config=True,
+            hf_token=hf_token,
+            dataset_repo=dataset_repo
+        )
+        # Generate experiment ID
+        experiment_id = f"trl_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+        _monitor.experiment_id = experiment_id
+        logger.info(f"Trackio initialized for experiment: {exp_name}")
+        logger.info(f"Experiment ID: {experiment_id}")
+        return experiment_id
+    except Exception as e:
+        logger.error(f"Failed to initialize trackio: {e}")
+        # Return a fallback experiment ID
+        return f"trl_fallback_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+def log(
+    metrics: Dict[str, Any],
+    step: Optional[int] = None,
+    **kwargs
+):
+    """
+    Log metrics to trackio (TRL interface)
+    Args:
+        metrics: Dictionary of metrics to log
+        step: Current training step
+        **kwargs: Additional parameters
+    """
+    global _monitor
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping log")
+            return
+        # Log metrics using our custom monitor
+        _monitor.log_metrics(metrics, step)
+        # Also log system metrics if available
+        _monitor.log_system_metrics(step)
+    except Exception as e:
+        logger.error(f"Failed to log metrics: {e}")
+def finish():
+    """
+    Finish trackio experiment (TRL interface)
+    """
+    global _monitor
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping finish")
+            return
+        # Close the monitoring session
+        _monitor.close()
+        logger.info("Trackio experiment finished")
+    except Exception as e:
+        logger.error(f"Failed to finish trackio experiment: {e}")
+def log_config(config: Dict[str, Any]):
+    """
+    Log configuration to trackio (TRL interface)
+    Args:
+        config: Configuration dictionary to log
+    """
+    global _monitor
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping config log")
+            return
+        # Log configuration using our custom monitor
+        _monitor.log_configuration(config)
+    except Exception as e:
+        logger.error(f"Failed to log config: {e}")
+def log_checkpoint(checkpoint_path: str, step: Optional[int] = None):
+    """
+    Log checkpoint to trackio (TRL interface)
+    Args:
+        checkpoint_path: Path to the checkpoint file
+        step: Current training step
+    """
+    global _monitor
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping checkpoint log")
+            return
+        # Log checkpoint using our custom monitor
+        _monitor.log_model_checkpoint(checkpoint_path, step)
+    except Exception as e:
+        logger.error(f"Failed to log checkpoint: {e}")
+def log_evaluation_results(results: Dict[str, Any], step: Optional[int] = None):
+    """
+    Log evaluation results to trackio (TRL interface)
+    Args:
+        results: Evaluation results dictionary
+        step: Current training step
+    """
+    global _monitor
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping evaluation log")
+            return
+        # Log evaluation results using our custom monitor
+        _monitor.log_evaluation_results(results, step)
+    except Exception as e:
+        logger.error(f"Failed to log evaluation results: {e}")
+# Additional utility functions for TRL compatibility
+def get_experiment_url() -> Optional[str]:
+    """Get the URL to view the experiment"""
+    global _monitor
+    if _monitor is not None:
+        return _monitor.get_experiment_url()
+    return None
+def is_available() -> bool:
+    """Check if trackio is available and initialized"""
+    return _monitor is not None and _monitor.enable_tracking
+def get_monitor():
+    """Get the current monitor instance (for advanced usage)"""
+    return _monitor
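The settings lookup at the top of `init()` follows a fixed precedence: explicit keyword argument, then environment variable, then (for the dataset repo only) a hard-coded default. A minimal sketch of that rule follows; `resolve_setting` and its `environ` parameter are hypothetical and exist only so the behavior can be exercised without touching the real environment.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_DATASET_REPO = "tonic/trackio-experiments"  # default used in src/trackio.py


def resolve_setting(
    kwargs: Mapping[str, Any],
    key: str,
    env_var: str,
    default: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return kwargs[key] if truthy, else the environment variable, else default."""
    # `or` means an explicit None (or empty string) in kwargs falls through to the
    # environment, matching `kwargs.get(...) or os.environ.get(...)` in init().
    return kwargs.get(key) or environ.get(env_var, default)


if __name__ == "__main__":
    print(resolve_setting({"dataset_repo": None}, "dataset_repo",
                          "TRACKIO_DATASET_REPO", DEFAULT_DATASET_REPO))
```

This matters for the trainer integration, which passes `getattr(self.config, 'dataset_repo', None)`: a missing config attribute arrives as `None` and the environment or default value is used instead.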

src/trainer.py CHANGED

@@ -135,6 +135,23 @@ class SmolLM3Trainer:
         logger.info("Total callbacks: %d", len(callbacks))
+        # Initialize trackio for TRL compatibility
+        try:
+            import trackio
+            # Initialize trackio with our configuration
+            experiment_id = trackio.init(
+                project_name=self.config.experiment_name,
+                experiment_name=self.config.experiment_name,
+                trackio_url=getattr(self.config, 'trackio_url', None),
+                trackio_token=getattr(self.config, 'trackio_token', None),
+                hf_token=getattr(self.config, 'hf_token', None),
+                dataset_repo=getattr(self.config, 'dataset_repo', None)
+            )
+            logger.info(f"Trackio initialized with experiment ID: {experiment_id}")
+        except Exception as e:
+            logger.warning(f"Failed to initialize trackio: {e}")
+            logger.info("Continuing without trackio integration")
         # Try SFTTrainer first (better for instruction tuning)
         logger.info("Creating SFTTrainer with training arguments...")
         logger.info("Training args type: %s", type(training_args))
@@ -235,6 +252,14 @@ class SmolLM3Trainer:
                 self.monitor.log_training_summary(summary)
                 self.monitor.close()
+            # Finish trackio experiment
+            try:
+                import trackio
+                trackio.finish()
+                logger.info("Trackio experiment finished")
+            except Exception as e:
+                logger.warning(f"Failed to finish trackio experiment: {e}")
             logger.info("Training completed successfully!")
             logger.info("Training metrics: %s", train_result.metrics)
@@ -243,6 +268,14 @@ class SmolLM3Trainer:
             # Close monitoring on error
             if self.monitor and self.monitor.enable_tracking:
                 self.monitor.close()
+            # Finish trackio experiment on error
+            try:
+                import trackio
+                trackio.finish()
+            except Exception as finish_error:
+                logger.warning(f"Failed to finish trackio experiment on error: {finish_error}")
             raise
     def evaluate(self):
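The hunks above repeat the same `trackio.finish()` guard on the success path and the error path. One possible consolidation, offered as a sketch rather than what the commit does, is a context manager that always runs the cleanup. `tracking_session` is a hypothetical name, and the `finish` callable is passed in so the example does not depend on any particular `trackio` module.

```python
import logging
from contextlib import contextmanager
from typing import Callable, Iterator

logger = logging.getLogger(__name__)


@contextmanager
def tracking_session(finish: Callable[[], None]) -> Iterator[None]:
    """Run the body, then call finish() whether or not the body raised."""
    try:
        yield
    finally:
        try:
            finish()
        except Exception as exc:
            # A cleanup failure is logged, not raised, so it cannot mask a training error.
            logger.warning("Failed to finish trackio experiment: %s", exc)


if __name__ == "__main__":
    with tracking_session(lambda: print("finished")):
        print("training...")
```

With this helper the training call could sit inside `with tracking_session(trackio.finish):`, and an exception raised by training would still propagate after the cleanup runs.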

setup_launch.py → tests/setup_launch.py RENAMED

File without changes

tests/test_trackio_trl_fix.py ADDED

	@@ -0,0 +1,153 @@

+#!/usr/bin/env python3
+"""
+Test script to verify Trackio TRL compatibility fix
+Tests that our trackio module provides the interface expected by TRL library
+"""
+import sys
+import os
+import logging
+# Add the project root (parent of tests/) to path so the root-level trackio.py is importable
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
+def test_trackio_interface():
+    """Test that trackio module provides the expected interface"""
+    print("🔍 Testing Trackio TRL Interface")
+    try:
+        # Test importing trackio
+        import trackio
+        print("✅ Successfully imported trackio module")
+        # Test that required functions exist
+        required_functions = ['init', 'log', 'finish']
+        for func_name in required_functions:
+            if hasattr(trackio, func_name):
+                print(f"✅ Found required function: {func_name}")
+            else:
+                print(f"❌ Missing required function: {func_name}")
+                return False
+        # Test initialization
+        experiment_id = trackio.init(
+            project_name="test_project",
+            experiment_name="test_experiment",
+            trackio_url="https://test.hf.space",
+            dataset_repo="test/trackio-experiments"
+        )
+        print(f"✅ Trackio initialization successful: {experiment_id}")
+        # Test logging
+        metrics = {'loss': 0.5, 'learning_rate': 1e-4}
+        trackio.log(metrics, step=1)
+        print("✅ Trackio logging successful")
+        # Test finishing
+        trackio.finish()
+        print("✅ Trackio finish successful")
+        return True
+    except Exception as e:
+        print(f"❌ Trackio interface test failed: {e}")
+        return False
+def test_trl_compatibility():
+    """Test that our trackio module is compatible with TRL expectations"""
+    print("\n🔍 Testing TRL Compatibility")
+    try:
+        # Simulate what TRL would do
+        import trackio
+        # TRL expects these functions to be available
+        assert hasattr(trackio, 'init'), "trackio.init not found"
+        assert hasattr(trackio, 'log'), "trackio.log not found"
+        assert hasattr(trackio, 'finish'), "trackio.finish not found"
+        # Test function signatures
+        import inspect
+        # Check init signature
+        init_sig = inspect.signature(trackio.init)
+        print(f"✅ init signature: {init_sig}")
+        # Check log signature
+        log_sig = inspect.signature(trackio.log)
+        print(f"✅ log signature: {log_sig}")
+        # Check finish signature
+        finish_sig = inspect.signature(trackio.finish)
+        print(f"✅ finish signature: {finish_sig}")
+        print("✅ TRL compatibility test passed")
+        return True
+    except Exception as e:
+        print(f"❌ TRL compatibility test failed: {e}")
+        return False
+def test_monitoring_integration():
+    """Test that our trackio module integrates with our monitoring system"""
+    print("\n🔍 Testing Monitoring Integration")
+    try:
+        import trackio
+        # Test that we can get the monitor
+        monitor = trackio.get_monitor()
+        if monitor is not None:
+            print("✅ Monitor integration working")
+        else:
+            print("⚠️ Monitor not available (this is normal if not initialized)")
+        # Test availability check
+        is_avail = trackio.is_available()
+        print(f"✅ Trackio availability check: {is_avail}")
+        return True
+    except Exception as e:
+        print(f"❌ Monitoring integration test failed: {e}")
+        return False
+def main():
+    """Run all tests"""
+    print("🚀 Testing Trackio TRL Fix")
+    print("=" * 50)
+    tests = [
+        test_trackio_interface,
+        test_trl_compatibility,
+        test_monitoring_integration
+    ]
+    passed = 0
+    total = len(tests)
+    for test in tests:
+        try:
+            if test():
+                passed += 1
+        except Exception as e:
+            print(f"❌ Test {test.__name__} failed with exception: {e}")
+    print("\n" + "=" * 50)
+    print(f"Test Results: {passed}/{total} tests passed")
+    if passed == total:
+        print("✅ All tests passed! Trackio TRL fix is working correctly.")
+        print("\nThe trackio module now provides the interface expected by TRL library:")
+        print("- init(): Initialize experiment")
+        print("- log(): Log metrics")
+        print("- finish(): Finish experiment")
+        print("\nThis should resolve the 'module trackio has no attribute init' error.")
+    else:
+        print("❌ Some tests failed. Please check the implementation.")
+        return 1
+    return 0
+if __name__ == "__main__":
+    sys.exit(main())
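
`test_trl_compatibility` above prints the signatures of `init`, `log`, and `finish` but does not assert anything about them. A stricter variant could check that a function accepts a specific call shape by using `inspect.Signature.bind`. The sketch below assumes the call shape `log(metrics, step=...)` used earlier in this test file. The two `log` functions are hypothetical stand-ins, not the real implementation.

```python
import inspect


def accepts_call(func, *args, **kwargs):
    """Return True if func's signature can bind the given arguments."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
        return True
    except TypeError:
        return False


# Hypothetical stand-ins for a compatible and an incompatible log().
def log(metrics, step=None):
    return None


def log_without_step(metrics):
    return None


compatible = accepts_call(log, {"loss": 0.5}, step=1)
incompatible = accepts_call(log_without_step, {"loss": 0.5}, step=1)
```

Binding checks the call shape without invoking the function, so no experiment is created or logged as a side effect.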

trackio.py ADDED Viewed

	@@ -0,0 +1,30 @@

+"""
+Trackio Module for TRL Library Compatibility
+This module provides the interface expected by the TRL library while using our custom monitoring system.
+"""
+# Import all functions from our custom trackio implementation
+from src.trackio import (
+    init,
+    log,
+    finish,
+    log_config,
+    log_checkpoint,
+    log_evaluation_results,
+    get_experiment_url,
+    is_available,
+    get_monitor
+)
+# Make all functions available at module level
+__all__ = [
+    'init',
+    'log',
+    'finish',
+    'log_config',
+    'log_checkpoint',
+    'log_evaluation_results',
+    'get_experiment_url',
+    'is_available',
+    'get_monitor'
+]
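
A re-export shim like the one above can drift if a name is listed in `__all__` but is no longer imported. One way to guard against that is a small consistency check, sketched here under the assumption that the shim is an ordinary module. The `undefined_exports` helper and the `shim_stub` module are hypothetical and exist only for illustration.

```python
import types


def undefined_exports(module):
    """Return names listed in module.__all__ that the module does not define."""
    return [
        name
        for name in getattr(module, "__all__", [])
        if not hasattr(module, name)
    ]


# Hypothetical stand-in shim that forgets to define finish().
shim = types.ModuleType("shim_stub")
shim.init = lambda **kwargs: "exp_0"
shim.log = lambda metrics, step=None: None
shim.__all__ = ["init", "log", "finish"]

drifted = undefined_exports(shim)

# After adding the missing function, the check comes back clean.
shim.finish = lambda: None
clean = undefined_exports(shim)
```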