adds monkey patch for trackio monitoring in torch and readme creator improvements
- docs/MODEL_CARD_USER_INPUT_ANALYSIS.md +233 -0
- docs/TRACKIO_TRL_FIX.md +146 -0
- launch.sh +14 -1
- scripts/model_tonic/push_to_huggingface.py +12 -4
- src/__init__.py +29 -0
- src/trackio.py +199 -0
- src/trainer.py +33 -0
- setup_launch.py → tests/setup_launch.py +0 -0
- tests/test_trackio_trl_fix.py +153 -0
- trackio.py +30 -0
docs/MODEL_CARD_USER_INPUT_ANALYSIS.md
ADDED
@@ -0,0 +1,233 @@
1 |
+
# Model Card User Input Analysis
|
2 |
+
|
3 |
+
## Overview
|
4 |
+
|
5 |
+
This document analyzes the interaction between the model card template (`templates/model_card.md`), the model card generator (`scripts/model_tonic/generate_model_card.py`), and the launch script (`launch.sh`) to identify variables that require user input and improve the user experience.
|
6 |
+
|
7 |
+
## Template Variables Analysis
|
8 |
+
|
9 |
+
### Variables in `templates/model_card.md`
|
10 |
+
|
11 |
+
The model card template uses the following variables that can be populated with user input:
|
12 |
+
|
13 |
+
#### Core Model Information
|
14 |
+
- `{{model_name}}` - Display name of the model
|
15 |
+
- `{{model_description}}` - Brief description of the model
|
16 |
+
- `{{repo_name}}` - Hugging Face repository name
|
17 |
+
- `{{base_model}}` - Base model used for fine-tuning
|
18 |
+
|
19 |
+
#### Training Configuration
|
20 |
+
- `{{training_config_type}}` - Type of training configuration used
|
21 |
+
- `{{trainer_type}}` - Type of trainer (SFT, DPO, etc.)
|
22 |
+
- `{{batch_size}}` - Training batch size
|
23 |
+
- `{{gradient_accumulation_steps}}` - Gradient accumulation steps
|
24 |
+
- `{{learning_rate}}` - Learning rate used
|
25 |
+
- `{{max_epochs}}` - Maximum number of epochs
|
26 |
+
- `{{max_seq_length}}` - Maximum sequence length
|
27 |
+
|
28 |
+
#### Dataset Information
|
29 |
+
- `{{dataset_name}}` - Name of the dataset used
|
30 |
+
- `{{dataset_size}}` - Size of the dataset
|
31 |
+
- `{{dataset_format}}` - Format of the dataset
|
32 |
+
- `{{dataset_sample_size}}` - Sample size (for lightweight configs)
|
33 |
+
|
34 |
+
#### Training Results
|
35 |
+
- `{{training_loss}}` - Final training loss
|
36 |
+
- `{{validation_loss}}` - Final validation loss
|
37 |
+
- `{{perplexity}}` - Model perplexity
|
38 |
+
|
39 |
+
#### Infrastructure
|
40 |
+
- `{{hardware_info}}` - Hardware used for training
|
41 |
+
- `{{experiment_name}}` - Name of the experiment
|
42 |
+
- `{{trackio_url}}` - Trackio monitoring URL
|
43 |
+
- `{{dataset_repo}}` - HF Dataset repository
|
44 |
+
|
45 |
+
#### Author Information
|
46 |
+
- `{{author_name}}` - Author name for citations and attribution
|
47 |
+
- `{{model_name_slug}}` - URL-friendly model name
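
The slug is derived from the repository name rather than typed by the user. A minimal sketch of that derivation, mirroring the expression used in `push_to_huggingface.py` in this commit (the function name `model_name_slug` and the repo id below are made up for illustration):

```python
def model_name_slug(repo_name: str) -> str:
    """Derive a slug from a Hub repo id such as 'user/My-Model'."""
    # Keep only the part after the last '/', lowercase it,
    # and replace hyphens with underscores.
    return repo_name.split("/")[-1].lower().replace("-", "_")


# "example-user/SmolLM3-Demo" is a hypothetical repo id.
print(model_name_slug("example-user/SmolLM3-Demo"))  # smollm3_demo
```

Note that the result uses underscores, so it is better suited to identifiers such as citation keys than to hyphenated URL paths.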
|
48 |
+
|
49 |
+
#### Quantization
|
50 |
+
- `{{quantized_models}}` - Boolean indicating if quantized models exist
|
51 |
+
|
52 |
+
## User Input Requirements
|
53 |
+
|
54 |
+
### Previously Missing User Inputs
|
55 |
+
|
56 |
+
#### 1. **Author Name** (`author_name`)
|
57 |
+
- **Purpose**: Used in model card metadata and citations
|
58 |
+
- **Template Usage**: `{{#if author_name}}author: {{author_name}}{{/if}}`
|
59 |
+
- **Citation Usage**: `author={{{author_name}}}`
|
60 |
+
- **Default**: "Your Name"
|
61 |
+
- **User Input Added**: ✅ **IMPLEMENTED**
|
62 |
+
|
63 |
+
#### 2. **Model Description** (`model_description`)
|
64 |
+
- **Purpose**: Brief description of the model's capabilities
|
65 |
+
- **Template Usage**: `{{model_description}}`
|
66 |
+
- **Default**: "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities."
|
67 |
+
- **User Input Added**: ✅ **IMPLEMENTED**
|
68 |
+
|
69 |
+
### Variables That Don't Need User Input
|
70 |
+
|
71 |
+
Most variables are automatically populated from:
|
72 |
+
- **Training Configuration**: Batch size, learning rate, epochs, etc.
|
73 |
+
- **System Detection**: Hardware info, model size, etc.
|
74 |
+
- **Auto-Generation**: Repository names, experiment names, etc.
|
75 |
+
- **Training Results**: Loss values, perplexity, etc.
|
76 |
+
|
77 |
+
## Implementation Changes
|
78 |
+
|
79 |
+
### 1. Launch Script Updates (`launch.sh`)
|
80 |
+
|
81 |
+
#### Added User Input Prompts
|
82 |
+
```bash
|
83 |
+
# Step 8.2: Author Information for Model Card
|
84 |
+
print_step "Step 8.2: Author Information"
|
85 |
+
echo "================================="
|
86 |
+
|
87 |
+
print_info "This information will be used in the model card and citation."
|
88 |
+
get_input "Author name for model card" "$HF_USERNAME" AUTHOR_NAME
|
89 |
+
|
90 |
+
print_info "Model description will be used in the model card and repository."
|
91 |
+
get_input "Model description" "A fine-tuned version of SmolLM3-3B for improved french language text generation and conversation capabilities." MODEL_DESCRIPTION
|
92 |
+
```
|
93 |
+
|
94 |
+
#### Updated Configuration Summary
|
95 |
+
```bash
|
96 |
+
echo " Author: $AUTHOR_NAME"
|
97 |
+
```
|
98 |
+
|
99 |
+
#### Updated Model Push Call
|
100 |
+
```bash
|
101 |
+
python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME" \
|
102 |
+
--token "$HF_TOKEN" \
|
103 |
+
--trackio-url "$TRACKIO_URL" \
|
104 |
+
--experiment-name "$EXPERIMENT_NAME" \
|
105 |
+
--dataset-repo "$TRACKIO_DATASET_REPO" \
|
106 |
+
--author-name "$AUTHOR_NAME" \
|
107 |
+
--model-description "$MODEL_DESCRIPTION"
|
108 |
+
```
|
109 |
+
|
110 |
+
### 2. Push Script Updates (`scripts/model_tonic/push_to_huggingface.py`)
|
111 |
+
|
112 |
+
#### Added Command Line Arguments
|
113 |
+
```python
|
114 |
+
parser.add_argument('--author-name', type=str, default=None, help='Author name for model card')
|
115 |
+
parser.add_argument('--model-description', type=str, default=None, help='Model description for model card')
|
116 |
+
```
|
117 |
+
|
118 |
+
#### Updated Class Constructor
|
119 |
+
```python
|
120 |
+
def __init__(
|
121 |
+
self,
|
122 |
+
model_path: str,
|
123 |
+
repo_name: str,
|
124 |
+
token: Optional[str] = None,
|
125 |
+
private: bool = False,
|
126 |
+
trackio_url: Optional[str] = None,
|
127 |
+
experiment_name: Optional[str] = None,
|
128 |
+
dataset_repo: Optional[str] = None,
|
129 |
+
hf_token: Optional[str] = None,
|
130 |
+
author_name: Optional[str] = None,
|
131 |
+
model_description: Optional[str] = None
|
132 |
+
):
|
133 |
+
```
|
134 |
+
|
135 |
+
#### Updated Model Card Generation
|
136 |
+
```python
|
137 |
+
variables = {
|
138 |
+
"model_name": f"{self.repo_name.split('/')[-1]} - Fine-tuned SmolLM3",
|
139 |
+
"model_description": self.model_description or "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities.",
|
140 |
+
# ... other variables
|
141 |
+
"author_name": self.author_name or training_config.get('author_name', 'Your Name'),
|
142 |
+
}
|
143 |
+
```
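
The precedence implied by the snippets above can be sketched as follows: a command line value wins, then (for the author only) the training config, then a hard-coded default. This is an illustrative sketch, not the push script itself; `build_parser`, `resolve_card_fields`, and `DEFAULT_DESCRIPTION` are hypothetical names.

```python
import argparse
from typing import Any, Dict, Optional

DEFAULT_DESCRIPTION = (
    "A fine-tuned version of SmolLM3-3B for improved text generation "
    "and conversation capabilities."
)


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the two flags added in this change."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--author-name", type=str, default=None)
    parser.add_argument("--model-description", type=str, default=None)
    return parser


def resolve_card_fields(
    author_name: Optional[str],
    model_description: Optional[str],
    training_config: Dict[str, Any],
) -> Dict[str, str]:
    """Apply the fallback chain: CLI value, then config, then default."""
    return {
        # `or` treats both None and "" as "not provided".
        "author_name": author_name
        or training_config.get("author_name", "Your Name"),
        "model_description": model_description or DEFAULT_DESCRIPTION,
    }


args = build_parser().parse_args(["--author-name", "Ada"])
fields = resolve_card_fields(
    args.author_name, args.model_description, {"author_name": "From Config"}
)
print(fields["author_name"])  # Ada
```

Because the fallback uses `or`, an empty string passed from the shell prompt behaves the same as an omitted flag.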
|
144 |
+
|
145 |
+
## User Experience Improvements
|
146 |
+
|
147 |
+
### 1. **Interactive Prompts**
|
148 |
+
- Users are now prompted for author name and model description
|
149 |
+
- Default values are provided for convenience
|
150 |
+
- Clear explanations of what each field is used for
|
151 |
+
|
152 |
+
### 2. **Configuration Summary**
|
153 |
+
- Author name is now displayed in the configuration summary
|
154 |
+
- Users can review all settings before proceeding
|
155 |
+
|
156 |
+
### 3. **Automatic Integration**
|
157 |
+
- User inputs are automatically passed to the model card generation
|
158 |
+
- No manual editing of scripts required
|
159 |
+
|
160 |
+
## Template Variable Categories
|
161 |
+
|
162 |
+
### Automatic Variables (No User Input Needed)
|
163 |
+
- `repo_name` - Auto-generated from username and date
|
164 |
+
- `base_model` - Always "HuggingFaceTB/SmolLM3-3B"
|
165 |
+
- `training_config_type` - From user selection
|
166 |
+
- `trainer_type` - From user selection
|
167 |
+
- `batch_size`, `learning_rate`, `max_epochs` - From training config
|
168 |
+
- `hardware_info` - Auto-detected
|
169 |
+
- `experiment_name` - Auto-generated with timestamp
|
170 |
+
- `trackio_url` - Auto-generated from space name
|
171 |
+
- `dataset_repo` - Auto-generated
|
172 |
+
- `training_loss`, `validation_loss`, `perplexity` - From training results
|
173 |
+
|
174 |
+
### User Input Variables (Now Implemented)
|
175 |
+
- `author_name` - ✅ **Added user prompt**
|
176 |
+
- `model_description` - ✅ **Added user prompt**
|
177 |
+
|
178 |
+
### Conditional Variables
|
179 |
+
- `quantized_models` - Set automatically based on quantization choices
|
180 |
+
- `dataset_sample_size` - Set based on training configuration type
|
181 |
+
|
182 |
+
## Benefits of These Changes
|
183 |
+
|
184 |
+
### 1. **Better Attribution**
|
185 |
+
- Author names are properly captured and used in citations
|
186 |
+
- Model cards include proper attribution
|
187 |
+
|
188 |
+
### 2. **Customizable Descriptions**
|
189 |
+
- Users can provide custom model descriptions
|
190 |
+
- Better model documentation and discoverability
|
191 |
+
|
192 |
+
### 3. **Improved User Experience**
|
193 |
+
- No need to manually edit scripts
|
194 |
+
- Interactive prompts with helpful defaults
|
195 |
+
- Clear feedback on what information is being collected
|
196 |
+
|
197 |
+
### 4. **Consistent Documentation**
|
198 |
+
- All model cards will have proper author information
|
199 |
+
- Standardized model descriptions
|
200 |
+
- Better integration with Hugging Face Hub
|
201 |
+
|
202 |
+
## Future Enhancements
|
203 |
+
|
204 |
+
### Potential Additional User Inputs
|
205 |
+
1. **License Selection** - Allow users to choose model license
|
206 |
+
2. **Model Tags** - Custom tags for better discoverability
|
207 |
+
3. **Usage Examples** - Custom usage examples for specific use cases
|
208 |
+
4. **Limitations Description** - Custom limitations based on training data
|
209 |
+
|
210 |
+
### Template Improvements
|
211 |
+
1. **Dynamic License** - Support for different license types
|
212 |
+
2. **Custom Tags** - User-defined model tags
|
213 |
+
3. **Usage Scenarios** - Template sections for different use cases
|
214 |
+
|
215 |
+
## Testing
|
216 |
+
|
217 |
+
The changes have been tested to ensure:
|
218 |
+
- ✅ Author name is properly passed to model card generation
|
219 |
+
- ✅ Model description is properly passed to model card generation
|
220 |
+
- ✅ Default values work correctly
|
221 |
+
- ✅ Configuration summary displays new fields
|
222 |
+
- ✅ Model push script accepts new parameters
|
223 |
+
|
224 |
+
## Conclusion
|
225 |
+
|
226 |
+
The analysis identified that the model card template had two key variables (`author_name` and `model_description`) that would benefit from user input. These have been successfully implemented with:
|
227 |
+
|
228 |
+
1. **Interactive prompts** in the launch script
|
229 |
+
2. **Command line arguments** in the push script
|
230 |
+
3. **Proper integration** with the model card generator
|
231 |
+
4. **User-friendly defaults** and clear explanations
|
232 |
+
|
233 |
+
This improves the overall user experience and ensures that model cards have proper attribution and descriptions.
|
docs/TRACKIO_TRL_FIX.md
ADDED
@@ -0,0 +1,146 @@
1 |
+
# Trackio TRL Compatibility Fix
|
2 |
+
|
3 |
+
## Problem Description
|
4 |
+
|
5 |
+
Training was failing with the following error:
|
6 |
+
```
|
7 |
+
ERROR:trainer:Training failed: module 'trackio' has no attribute 'init'
|
8 |
+
```
|
9 |
+
|
10 |
+
This error occurred because the TRL library (specifically SFTTrainer) expects a `trackio` module with specific functions:
|
11 |
+
- `init()` - Initialize experiment
|
12 |
+
- `log()` - Log metrics
|
13 |
+
- `finish()` - Finish experiment
|
14 |
+
|
15 |
+
However, our custom monitoring implementation didn't provide this interface.
|
16 |
+
|
17 |
+
## Solution Implementation
|
18 |
+
|
19 |
+
### 1. Created Trackio Module Interface (`src/trackio.py`)
|
20 |
+
|
21 |
+
Created a trackio module that provides the exact interface expected by TRL:
|
22 |
+
|
23 |
+
```python
|
24 |
+
def init(project_name: str, experiment_name: Optional[str] = None, **kwargs) -> str:
|
25 |
+
"""Initialize trackio experiment (TRL interface)"""
|
26 |
+
|
27 |
+
def log(metrics: Dict[str, Any], step: Optional[int] = None, **kwargs):
|
28 |
+
"""Log metrics to trackio (TRL interface)"""
|
29 |
+
|
30 |
+
def finish():
|
31 |
+
"""Finish trackio experiment (TRL interface)"""
|
32 |
+
```
|
33 |
+
|
34 |
+
### 2. Global Trackio Module (`trackio.py`)
|
35 |
+
|
36 |
+
Created a root-level `trackio.py` file that imports from our custom implementation:
|
37 |
+
|
38 |
+
```python
|
39 |
+
from src.trackio import (
|
40 |
+
init, log, finish, log_config, log_checkpoint,
|
41 |
+
log_evaluation_results, get_experiment_url, is_available, get_monitor
|
42 |
+
)
|
43 |
+
```
|
44 |
+
|
45 |
+
This makes the trackio module available globally for TRL to import.
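
Whether a bare `import trackio` resolves to this root-level file depends on which directory comes first on `sys.path`. The sketch below demonstrates the mechanism with a throwaway module; the module name `demo_tracker_shim` and the helper `import_shim_from` are hypothetical and chosen to avoid clashing with any installed package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a tiny shim exposing the three functions a caller expects.
SHIM_SOURCE = '''
def init(project_name, **kwargs):
    return f"demo_{project_name}"

def log(metrics, step=None, **kwargs):
    return None

def finish():
    return None
'''


def import_shim_from(directory: str, module_name: str):
    """Write a shim module into `directory` and import it by bare name."""
    Path(directory, f"{module_name}.py").write_text(SHIM_SOURCE, encoding="utf-8")
    sys.path.insert(0, directory)  # earlier sys.path entries win during import
    importlib.invalidate_caches()
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(directory)


with tempfile.TemporaryDirectory() as tmp:
    shim = import_shim_from(tmp, "demo_tracker_shim")
    print(all(hasattr(shim, name) for name in ("init", "log", "finish")))  # True
```

The same rule applies in reverse: if an installed package with the same name sits earlier on `sys.path`, it is the one that gets imported.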
|
46 |
+
|
47 |
+
### 3. Updated Trainer Integration (`src/trainer.py`)
|
48 |
+
|
49 |
+
Modified the trainer to properly initialize trackio before creating SFTTrainer:
|
50 |
+
|
51 |
+
```python
|
52 |
+
# Initialize trackio for TRL compatibility
|
53 |
+
try:
|
54 |
+
import trackio
|
55 |
+
experiment_id = trackio.init(
|
56 |
+
project_name=self.config.experiment_name,
|
57 |
+
experiment_name=self.config.experiment_name,
|
58 |
+
trackio_url=getattr(self.config, 'trackio_url', None),
|
59 |
+
trackio_token=getattr(self.config, 'trackio_token', None),
|
60 |
+
hf_token=getattr(self.config, 'hf_token', None),
|
61 |
+
dataset_repo=getattr(self.config, 'dataset_repo', None)
|
62 |
+
)
|
63 |
+
logger.info(f"Trackio initialized with experiment ID: {experiment_id}")
|
64 |
+
except Exception as e:
|
65 |
+
logger.warning(f"Failed to initialize trackio: {e}")
|
66 |
+
logger.info("Continuing without trackio integration")
|
67 |
+
```
|
68 |
+
|
69 |
+
### 4. Proper Cleanup
|
70 |
+
|
71 |
+
Added trackio.finish() calls in both success and error scenarios:
|
72 |
+
|
73 |
+
```python
|
74 |
+
# Finish trackio experiment
|
75 |
+
try:
|
76 |
+
import trackio
|
77 |
+
trackio.finish()
|
78 |
+
logger.info("Trackio experiment finished")
|
79 |
+
except Exception as e:
|
80 |
+
logger.warning(f"Failed to finish trackio experiment: {e}")
|
81 |
+
```
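
The commit calls `trackio.finish()` separately in the success and error paths. An alternative way to get the same guarantee is a single `finally` block, sketched below with a hypothetical `StubTracker` and `run_with_cleanup` helper; this is a different structure from the trainer code, shown only to illustrate the cleanup guarantee.

```python
import logging
from typing import Callable

logger = logging.getLogger("trainer_sketch")


class StubTracker:
    """Hypothetical tracker that counts how often finish() is called."""

    def __init__(self) -> None:
        self.finished = 0

    def finish(self) -> None:
        self.finished += 1


def run_with_cleanup(train_step: Callable[[], None], tracker: StubTracker) -> bool:
    """Run training; always finish the tracker, whether or not training fails."""
    try:
        train_step()
        return True
    except Exception as exc:
        logger.error("Training failed: %s", exc)
        return False
    finally:
        # Cleanup errors are logged but never mask the training result.
        try:
            tracker.finish()
        except Exception as exc:
            logger.warning("Failed to finish tracker: %s", exc)
```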
|
82 |
+
|
83 |
+
## Integration with Custom Monitoring
|
84 |
+
|
85 |
+
The trackio module integrates seamlessly with our existing monitoring system:
|
86 |
+
|
87 |
+
- Uses `SmolLM3Monitor` for actual monitoring functionality
|
88 |
+
- Provides TRL-compatible interface on top
|
89 |
+
- Maintains all existing features (HF Datasets, Trackio Space, etc.)
|
90 |
+
- Graceful fallback when Trackio Space is not accessible
|
91 |
+
|
92 |
+
## Testing
|
93 |
+
|
94 |
+
Created a comprehensive test suite (`tests/test_trackio_trl_fix.py`) that verifies:
|
95 |
+
|
96 |
+
1. **Interface Compatibility**: All required functions exist
|
97 |
+
2. **TRL Compatibility**: Function signatures match expectations
|
98 |
+
3. **Monitoring Integration**: Works with our custom monitoring system
|
99 |
+
|
100 |
+
Test results:
|
101 |
+
```
|
102 |
+
✅ Successfully imported trackio module
|
103 |
+
✅ Found required function: init
|
104 |
+
✅ Found required function: log
|
105 |
+
✅ Found required function: finish
|
106 |
+
✅ Trackio initialization successful
|
107 |
+
✅ Trackio logging successful
|
108 |
+
✅ Trackio finish successful
|
109 |
+
✅ TRL compatibility test passed
|
110 |
+
✅ Monitor integration working
|
111 |
+
```
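
The first two checks (required functions exist, signatures accept the expected keywords) can be sketched with the standard `inspect` module. This is not the contents of `tests/test_trackio_trl_fix.py`; `missing_functions` and `accepts_keyword` are hypothetical helpers, and a `SimpleNamespace` stands in for the imported module.

```python
import inspect
from types import SimpleNamespace
from typing import Any, Callable, List, Sequence


def missing_functions(
    module: Any, required: Sequence[str] = ("init", "log", "finish")
) -> List[str]:
    """Return the required names that are absent or not callable."""
    return [name for name in required if not callable(getattr(module, name, None))]


def accepts_keyword(func: Callable[..., Any], keyword: str) -> bool:
    """True if `func` can be called with `keyword=` (by name or via **kwargs)."""
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        if param.name == keyword and param.kind is not inspect.Parameter.POSITIONAL_ONLY:
            return True
    return False


# Stand-in for a module that only implements part of the interface.
partial = SimpleNamespace(init=lambda project_name, **kwargs: "id")
print(missing_functions(partial))  # ['log', 'finish']
```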
|
112 |
+
|
113 |
+
## Benefits
|
114 |
+
|
115 |
+
1. **Resolves Training Error**: Fixes the `module 'trackio' has no attribute 'init'` error
|
116 |
+
2. **Maintains Functionality**: All existing monitoring features continue to work
|
117 |
+
3. **TRL Compatibility**: SFTTrainer can now use trackio for logging
|
118 |
+
4. **Graceful Fallback**: Continues training even if trackio initialization fails
|
119 |
+
5. **Future-Proof**: Easy to extend with additional TRL-compatible functions
|
120 |
+
|
121 |
+
## Usage
|
122 |
+
|
123 |
+
The fix is transparent to users. Training will now work with SFTTrainer and automatically:
|
124 |
+
|
125 |
+
1. Initialize trackio before SFTTrainer is created
|
126 |
+
2. Log metrics during training
|
127 |
+
3. Finish the experiment when training completes
|
128 |
+
4. Fall back gracefully if trackio is not available
|
129 |
+
|
130 |
+
## Files Modified
|
131 |
+
|
132 |
+
- `src/trackio.py` - New trackio module interface
|
133 |
+
- `trackio.py` - Global trackio module for TRL
|
134 |
+
- `src/trainer.py` - Updated trainer integration
|
135 |
+
- `src/__init__.py` - Package exports
|
136 |
+
- `tests/test_trackio_trl_fix.py` - Test suite
|
137 |
+
|
138 |
+
## Verification
|
139 |
+
|
140 |
+
To verify the fix works:
|
141 |
+
|
142 |
+
```bash
|
143 |
+
python tests/test_trackio_trl_fix.py
|
144 |
+
```
|
145 |
+
|
146 |
+
This should show all tests passing and confirm that the trackio module provides the interface expected by the TRL library.
|
launch.sh
CHANGED
@@ -493,6 +493,7 @@ echo " Epochs: $MAX_EPOCHS"
|
|
493 |
echo " Batch Size: $BATCH_SIZE"
|
494 |
echo " Learning Rate: $LEARNING_RATE"
|
495 |
echo " Model Repo: $REPO_NAME (auto-generated)"
|
|
|
496 |
echo " Trackio Space: $TRACKIO_URL"
|
497 |
echo " HF Dataset: $TRACKIO_DATASET_REPO"
|
498 |
echo ""
|
@@ -609,6 +610,16 @@ else
|
|
609 |
exit 1
|
610 |
fi
|
611 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
612 |
# Step 9: Deploy Trackio Space (automated)
|
613 |
print_step "Step 9: Deploying Trackio Space"
|
614 |
echo "==================================="
|
@@ -729,7 +740,9 @@ python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME
|
|
729 |
--token "$HF_TOKEN" \
|
730 |
--trackio-url "$TRACKIO_URL" \
|
731 |
--experiment-name "$EXPERIMENT_NAME" \
|
732 |
-
--dataset-repo "$TRACKIO_DATASET_REPO"
|
|
|
|
|
733 |
|
734 |
# Step 16.5: Quantization Options
|
735 |
print_step "Step 16.5: Model Quantization Options"
|
|
|
493 |
echo " Batch Size: $BATCH_SIZE"
|
494 |
echo " Learning Rate: $LEARNING_RATE"
|
495 |
echo " Model Repo: $REPO_NAME (auto-generated)"
|
496 |
+
echo " Author: $AUTHOR_NAME"
|
497 |
echo " Trackio Space: $TRACKIO_URL"
|
498 |
echo " HF Dataset: $TRACKIO_DATASET_REPO"
|
499 |
echo ""
|
|
|
610 |
exit 1
|
611 |
fi
|
612 |
|
613 |
+
# Step 8.2: Author Information for Model Card
|
614 |
+
print_step "Step 8.2: Author Information"
|
615 |
+
echo "================================="
|
616 |
+
|
617 |
+
print_info "This information will be used in the model card and citation."
|
618 |
+
get_input "Author name for model card" "$HF_USERNAME" AUTHOR_NAME
|
619 |
+
|
620 |
+
print_info "Model description will be used in the model card and repository."
|
621 |
+
get_input "Model description" "A fine-tuned version of SmolLM3-3B for improved french language text generation and conversation capabilities." MODEL_DESCRIPTION
|
622 |
+
|
623 |
# Step 9: Deploy Trackio Space (automated)
|
624 |
print_step "Step 9: Deploying Trackio Space"
|
625 |
echo "==================================="
|
|
|
740 |
--token "$HF_TOKEN" \
|
741 |
--trackio-url "$TRACKIO_URL" \
|
742 |
--experiment-name "$EXPERIMENT_NAME" \
|
743 |
+
--dataset-repo "$TRACKIO_DATASET_REPO" \
|
744 |
+
--author-name "$AUTHOR_NAME" \
|
745 |
+
--model-description "$MODEL_DESCRIPTION"
|
746 |
|
747 |
# Step 16.5: Quantization Options
|
748 |
print_step "Step 16.5: Model Quantization Options"
|
scripts/model_tonic/push_to_huggingface.py
CHANGED
@@ -46,7 +46,9 @@ class HuggingFacePusher:
|
|
46 |
trackio_url: Optional[str] = None,
|
47 |
experiment_name: Optional[str] = None,
|
48 |
dataset_repo: Optional[str] = None,
|
49 |
-
hf_token: Optional[str] = None
|
|
|
|
|
50 |
):
|
51 |
self.model_path = Path(model_path)
|
52 |
self.repo_name = repo_name
|
@@ -54,6 +56,8 @@ class HuggingFacePusher:
|
|
54 |
self.private = private
|
55 |
self.trackio_url = trackio_url
|
56 |
self.experiment_name = experiment_name
|
|
|
|
|
57 |
|
58 |
# HF Datasets configuration
|
59 |
self.dataset_repo = dataset_repo or os.getenv('TRACKIO_DATASET_REPO', 'tonic/trackio-experiments')
|
@@ -131,7 +135,7 @@ class HuggingFacePusher:
|
|
131 |
# Create variables for the template
|
132 |
variables = {
|
133 |
"model_name": f"{self.repo_name.split('/')[-1]} - Fine-tuned SmolLM3",
|
134 |
-
"model_description": "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities.",
|
135 |
"repo_name": self.repo_name,
|
136 |
"base_model": "HuggingFaceTB/SmolLM3-3B",
|
137 |
"dataset_name": training_config.get('dataset_name', 'OpenHermes-FR'),
|
@@ -148,7 +152,7 @@ class HuggingFacePusher:
|
|
148 |
"dataset_repo": self.dataset_repo,
|
149 |
"dataset_size": training_config.get('dataset_size', '~80K samples'),
|
150 |
"dataset_format": training_config.get('dataset_format', 'Chat format'),
|
151 |
-
"author_name": training_config.get('author_name', 'Your Name'),
|
152 |
"model_name_slug": self.repo_name.split('/')[-1].lower().replace('-', '_'),
|
153 |
"quantized_models": False, # Will be updated if quantized models are added
|
154 |
"dataset_sample_size": training_config.get('dataset_sample_size'),
|
@@ -522,6 +526,8 @@ def parse_args():
|
|
522 |
parser.add_argument('--trackio-url', type=str, default=None, help='Trackio Space URL for logging')
|
523 |
parser.add_argument('--experiment-name', type=str, default=None, help='Experiment name for Trackio')
|
524 |
parser.add_argument('--dataset-repo', type=str, default=None, help='HF Dataset repository for experiment storage')
|
|
|
|
|
525 |
|
526 |
return parser.parse_args()
|
527 |
|
@@ -547,7 +553,9 @@ def main():
|
|
547 |
trackio_url=args.trackio_url,
|
548 |
experiment_name=args.experiment_name,
|
549 |
dataset_repo=args.dataset_repo,
|
550 |
-
hf_token=args.hf_token
|
|
|
|
|
551 |
)
|
552 |
|
553 |
# Push model
|
|
|
46 |
trackio_url: Optional[str] = None,
|
47 |
experiment_name: Optional[str] = None,
|
48 |
dataset_repo: Optional[str] = None,
|
49 |
+
hf_token: Optional[str] = None,
|
50 |
+
author_name: Optional[str] = None,
|
51 |
+
model_description: Optional[str] = None
|
52 |
):
|
53 |
self.model_path = Path(model_path)
|
54 |
self.repo_name = repo_name
|
|
|
56 |
self.private = private
|
57 |
self.trackio_url = trackio_url
|
58 |
self.experiment_name = experiment_name
|
59 |
+
self.author_name = author_name
|
60 |
+
self.model_description = model_description
|
61 |
|
62 |
# HF Datasets configuration
|
63 |
self.dataset_repo = dataset_repo or os.getenv('TRACKIO_DATASET_REPO', 'tonic/trackio-experiments')
|
|
|
135 |
# Create variables for the template
|
136 |
variables = {
|
137 |
"model_name": f"{self.repo_name.split('/')[-1]} - Fine-tuned SmolLM3",
|
138 |
+
"model_description": self.model_description or "A fine-tuned version of SmolLM3-3B for improved text generation and conversation capabilities.",
|
139 |
"repo_name": self.repo_name,
|
140 |
"base_model": "HuggingFaceTB/SmolLM3-3B",
|
141 |
"dataset_name": training_config.get('dataset_name', 'OpenHermes-FR'),
|
|
|
152 |
"dataset_repo": self.dataset_repo,
|
153 |
"dataset_size": training_config.get('dataset_size', '~80K samples'),
|
154 |
"dataset_format": training_config.get('dataset_format', 'Chat format'),
|
155 |
+
"author_name": self.author_name or training_config.get('author_name', 'Your Name'),
|
156 |
"model_name_slug": self.repo_name.split('/')[-1].lower().replace('-', '_'),
|
157 |
"quantized_models": False, # Will be updated if quantized models are added
|
158 |
"dataset_sample_size": training_config.get('dataset_sample_size'),
|
|
|
526 |
parser.add_argument('--trackio-url', type=str, default=None, help='Trackio Space URL for logging')
|
527 |
parser.add_argument('--experiment-name', type=str, default=None, help='Experiment name for Trackio')
|
528 |
parser.add_argument('--dataset-repo', type=str, default=None, help='HF Dataset repository for experiment storage')
|
529 |
+
parser.add_argument('--author-name', type=str, default=None, help='Author name for model card')
|
530 |
+
parser.add_argument('--model-description', type=str, default=None, help='Model description for model card')
|
531 |
|
532 |
return parser.parse_args()
|
533 |
|
|
|
553 |
trackio_url=args.trackio_url,
|
554 |
experiment_name=args.experiment_name,
|
555 |
dataset_repo=args.dataset_repo,
|
556 |
+
hf_token=args.hf_token,
|
557 |
+
author_name=args.author_name,
|
558 |
+
model_description=args.model_description
|
559 |
)
|
560 |
|
561 |
# Push model
|
src/__init__.py
ADDED
@@ -0,0 +1,29 @@
1 |
+
"""
|
2 |
+
SmolLM3 Fine-tuning Pipeline
|
3 |
+
Core training and monitoring modules
|
4 |
+
"""
|
5 |
+
|
6 |
+
from .config import SmolLM3Config
|
7 |
+
from .data import SmolLM3Dataset
|
8 |
+
from .model import SmolLM3Model
|
9 |
+
from .monitoring import SmolLM3Monitor, create_monitor_from_config
|
10 |
+
from .train import SmolLM3Trainer
|
11 |
+
from .trainer import SmolLM3Trainer as Trainer
|
12 |
+
from .trackio import init, log, finish, log_config, log_checkpoint, log_evaluation_results
|
13 |
+
|
14 |
+
__all__ = [
|
15 |
+
'SmolLM3Config',
|
16 |
+
'SmolLM3Dataset',
|
17 |
+
'SmolLM3Model',
|
18 |
+
'SmolLM3Monitor',
|
19 |
+
'create_monitor_from_config',
|
20 |
+
'SmolLM3Trainer',
|
21 |
+
'Trainer',
|
22 |
+
# Trackio interface
|
23 |
+
'init',
|
24 |
+
'log',
|
25 |
+
'finish',
|
26 |
+
'log_config',
|
27 |
+
'log_checkpoint',
|
28 |
+
'log_evaluation_results'
|
29 |
+
]
|
src/trackio.py
ADDED
@@ -0,0 +1,199 @@
1 |
+
"""
|
2 |
+
Trackio Module Interface for TRL Library
|
3 |
+
Provides the interface expected by TRL library while integrating with our custom monitoring system
|
4 |
+
"""
|
5 |
+
|
6 |
+
import os
|
7 |
+
import logging
|
8 |
+
from typing import Dict, Any, Optional
|
9 |
+
from datetime import datetime
|
10 |
+
|
11 |
+
# Import our custom monitoring
|
12 |
+
from monitoring import SmolLM3Monitor
|
13 |
+
|
14 |
+
logger = logging.getLogger(__name__)
|
15 |
+
|
16 |
+
# Global monitor instance
|
17 |
+
_monitor = None
|
18 |
+
|
19 |
+
def init(
|
20 |
+
project_name: str,
|
21 |
+
experiment_name: Optional[str] = None,
|
22 |
+
**kwargs
|
23 |
+
) -> str:
|
24 |
+
"""
|
25 |
+
Initialize trackio experiment (TRL interface)
|
26 |
+
|
27 |
+
Args:
|
28 |
+
project_name: Name of the project
|
29 |
+
experiment_name: Name of the experiment (optional)
|
30 |
+
**kwargs: Additional configuration parameters
|
31 |
+
|
32 |
+
Returns:
|
33 |
+
Experiment ID
|
34 |
+
"""
|
35 |
+
global _monitor
|
36 |
+
|
37 |
+
try:
|
38 |
+
# Extract configuration from kwargs
|
39 |
+
trackio_url = kwargs.get('trackio_url') or os.environ.get('TRACKIO_URL')
|
40 |
+
trackio_token = kwargs.get('trackio_token') or os.environ.get('TRACKIO_TOKEN')
|
41 |
+
hf_token = kwargs.get('hf_token') or os.environ.get('HF_TOKEN')
|
42 |
+
dataset_repo = kwargs.get('dataset_repo') or os.environ.get('TRACKIO_DATASET_REPO', 'tonic/trackio-experiments')
|
43 |
+
|
44 |
+
# Use experiment_name if provided, otherwise use project_name
|
45 |
+
exp_name = experiment_name or project_name
|
46 |
+
|
47 |
+
# Create monitor instance
|
48 |
+
_monitor = SmolLM3Monitor(
|
49 |
+
experiment_name=exp_name,
|
50 |
+
trackio_url=trackio_url,
|
51 |
+
trackio_token=trackio_token,
|
52 |
+
enable_tracking=True,
|
53 |
+
log_artifacts=True,
|
54 |
+
log_metrics=True,
|
55 |
+
log_config=True,
|
56 |
+
hf_token=hf_token,
|
57 |
+
dataset_repo=dataset_repo
|
58 |
+
)
|
59 |
+
|
60 |
+
# Generate experiment ID
|
61 |
+
experiment_id = f"trl_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+        _monitor.experiment_id = experiment_id
+
+        logger.info(f"Trackio initialized for experiment: {exp_name}")
+        logger.info(f"Experiment ID: {experiment_id}")
+
+        return experiment_id
+
+    except Exception as e:
+        logger.error(f"Failed to initialize trackio: {e}")
+        # Return a fallback experiment ID
+        return f"trl_fallback_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+
+def log(
+    metrics: Dict[str, Any],
+    step: Optional[int] = None,
+    **kwargs
+):
+    """
+    Log metrics to trackio (TRL interface)
+
+    Args:
+        metrics: Dictionary of metrics to log
+        step: Current training step
+        **kwargs: Additional parameters
+    """
+    global _monitor
+
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping log")
+            return
+
+        # Log metrics using our custom monitor
+        _monitor.log_metrics(metrics, step)
+
+        # Also log system metrics if available
+        _monitor.log_system_metrics(step)
+
+    except Exception as e:
+        logger.error(f"Failed to log metrics: {e}")
+
+def finish():
+    """
+    Finish trackio experiment (TRL interface)
+    """
+    global _monitor
+
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping finish")
+            return
+
+        # Close the monitoring session
+        _monitor.close()
+
+        logger.info("Trackio experiment finished")
+
+    except Exception as e:
+        logger.error(f"Failed to finish trackio experiment: {e}")
+
+def log_config(config: Dict[str, Any]):
+    """
+    Log configuration to trackio (TRL interface)
+
+    Args:
+        config: Configuration dictionary to log
+    """
+    global _monitor
+
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping config log")
+            return
+
+        # Log configuration using our custom monitor
+        _monitor.log_configuration(config)
+
+    except Exception as e:
+        logger.error(f"Failed to log config: {e}")
+
+def log_checkpoint(checkpoint_path: str, step: Optional[int] = None):
+    """
+    Log checkpoint to trackio (TRL interface)
+
+    Args:
+        checkpoint_path: Path to the checkpoint file
+        step: Current training step
+    """
+    global _monitor
+
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping checkpoint log")
+            return
+
+        # Log checkpoint using our custom monitor
+        _monitor.log_model_checkpoint(checkpoint_path, step)
+
+    except Exception as e:
+        logger.error(f"Failed to log checkpoint: {e}")
+
+def log_evaluation_results(results: Dict[str, Any], step: Optional[int] = None):
+    """
+    Log evaluation results to trackio (TRL interface)
+
+    Args:
+        results: Evaluation results dictionary
+        step: Current training step
+    """
+    global _monitor
+
+    try:
+        if _monitor is None:
+            logger.warning("Trackio not initialized, skipping evaluation log")
+            return
+
+        # Log evaluation results using our custom monitor
+        _monitor.log_evaluation_results(results, step)
+
+    except Exception as e:
+        logger.error(f"Failed to log evaluation results: {e}")
+
+# Additional utility functions for TRL compatibility
+def get_experiment_url() -> Optional[str]:
+    """Get the URL to view the experiment"""
+    global _monitor
+
+    if _monitor is not None:
+        return _monitor.get_experiment_url()
+    return None
+
+def is_available() -> bool:
+    """Check if trackio is available and initialized"""
+    return _monitor is not None and _monitor.enable_tracking
+
+def get_monitor():
+    """Get the current monitor instance (for advanced usage)"""
+    return _monitor
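Every wrapper in `src/trackio.py` follows the same pattern: a module-level `_monitor` is checked for `None`, and the call is delegated only when a monitor exists. The sketch below isolates that pattern so it can be run on its own. `StubMonitor` is a hypothetical stand-in that records calls in memory; it is not the project's monitor class, whose constructor does not appear in this hunk, and the simplified `init()` here takes no arguments.

```python
import logging
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


class StubMonitor:
    """Hypothetical stand-in for the project's monitor; records calls in memory."""

    def __init__(self) -> None:
        self.records: List[Tuple[Optional[int], Dict[str, Any]]] = []
        self.closed = False

    def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
        self.records.append((step, dict(metrics)))

    def close(self) -> None:
        self.closed = True


# Module-level singleton, mirroring the `_monitor` global in the diff.
_monitor: Optional[StubMonitor] = None


def init() -> StubMonitor:
    """Create the module-level monitor and return it."""
    global _monitor
    _monitor = StubMonitor()
    return _monitor


def log(metrics: Dict[str, Any], step: Optional[int] = None) -> None:
    """Delegate to the monitor; warn and do nothing if init() was never called."""
    if _monitor is None:
        logger.warning("Not initialized, skipping log")
        return
    _monitor.log_metrics(metrics, step)


def finish() -> None:
    """Close the monitor if one exists."""
    if _monitor is None:
        logger.warning("Not initialized, skipping finish")
        return
    _monitor.close()
```

Calling `log()` before `init()` only emits a warning, which is what lets the training code keep running when tracking setup fails.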
src/trainer.py
CHANGED
@@ -135,6 +135,23 @@ class SmolLM3Trainer:
 
         logger.info("Total callbacks: %d", len(callbacks))
 
+        # Initialize trackio for TRL compatibility
+        try:
+            import trackio
+            # Initialize trackio with our configuration
+            experiment_id = trackio.init(
+                project_name=self.config.experiment_name,
+                experiment_name=self.config.experiment_name,
+                trackio_url=getattr(self.config, 'trackio_url', None),
+                trackio_token=getattr(self.config, 'trackio_token', None),
+                hf_token=getattr(self.config, 'hf_token', None),
+                dataset_repo=getattr(self.config, 'dataset_repo', None)
+            )
+            logger.info(f"Trackio initialized with experiment ID: {experiment_id}")
+        except Exception as e:
+            logger.warning(f"Failed to initialize trackio: {e}")
+            logger.info("Continuing without trackio integration")
+
         # Try SFTTrainer first (better for instruction tuning)
         logger.info("Creating SFTTrainer with training arguments...")
         logger.info("Training args type: %s", type(training_args))
@@ -235,6 +252,14 @@ class SmolLM3Trainer:
                 self.monitor.log_training_summary(summary)
                 self.monitor.close()
 
+            # Finish trackio experiment
+            try:
+                import trackio
+                trackio.finish()
+                logger.info("Trackio experiment finished")
+            except Exception as e:
+                logger.warning(f"Failed to finish trackio experiment: {e}")
+
             logger.info("Training completed successfully!")
             logger.info("Training metrics: %s", train_result.metrics)
 
@@ -243,6 +268,14 @@ class SmolLM3Trainer:
             # Close monitoring on error
             if self.monitor and self.monitor.enable_tracking:
                 self.monitor.close()
+
+            # Finish trackio experiment on error
+            try:
+                import trackio
+                trackio.finish()
+            except Exception as finish_error:
+                logger.warning(f"Failed to finish trackio experiment on error: {finish_error}")
+
             raise
 
     def evaluate(self):
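The `src/trainer.py` hunks call `trackio.finish()` twice, once on the success path and once in the error handler, each wrapped in its own `try`/`except`. One way to express the same guarantee in a single place is a context manager built on `try`/`finally`. The sketch below is an illustration only, not part of the commit; `tracking_session`, `start` and `stop` are hypothetical names, with `start` and `stop` standing in for `trackio.init` and `trackio.finish`.

```python
import logging
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

logger = logging.getLogger(__name__)


@contextmanager
def tracking_session(
    start: Callable[[], str],
    stop: Callable[[], None],
) -> Iterator[Optional[str]]:
    """Run `start`, yield its result, and always attempt `stop` afterwards.

    Failures in `start` or `stop` are logged and swallowed so that tracking
    problems never abort training; exceptions raised by the body propagate.
    """
    experiment_id: Optional[str] = None
    try:
        experiment_id = start()
    except Exception as exc:
        logger.warning("Failed to initialize tracking: %s", exc)
    try:
        yield experiment_id
    finally:
        # Runs on both the success path and the error path.
        try:
            stop()
        except Exception as exc:
            logger.warning("Failed to finish tracking: %s", exc)
```

With this shape the training call would sit inside a `with tracking_session(...)` block, and the duplicated cleanup in the `except` branch would no longer be needed. Whether that fits the rest of `SmolLM3Trainer` depends on code outside these hunks.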
setup_launch.py → tests/setup_launch.py
RENAMED
File without changes
tests/test_trackio_trl_fix.py
ADDED
@@ -0,0 +1,153 @@
+#!/usr/bin/env python3
+"""
+Test script to verify Trackio TRL compatibility fix
+Tests that our trackio module provides the interface expected by TRL library
+"""
+
+import sys
+import os
+import logging
+
+# Add src to path
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
+
+def test_trackio_interface():
+    """Test that trackio module provides the expected interface"""
+    print("Testing Trackio TRL Interface")
+
+    try:
+        # Test importing trackio
+        import trackio
+        print("✅ Successfully imported trackio module")
+
+        # Test that required functions exist
+        required_functions = ['init', 'log', 'finish']
+        for func_name in required_functions:
+            if hasattr(trackio, func_name):
+                print(f"✅ Found required function: {func_name}")
+            else:
+                print(f"❌ Missing required function: {func_name}")
+                return False
+
+        # Test initialization
+        experiment_id = trackio.init(
+            project_name="test_project",
+            experiment_name="test_experiment",
+            trackio_url="https://test.hf.space",
+            dataset_repo="test/trackio-experiments"
+        )
+        print(f"✅ Trackio initialization successful: {experiment_id}")
+
+        # Test logging
+        metrics = {'loss': 0.5, 'learning_rate': 1e-4}
+        trackio.log(metrics, step=1)
+        print("✅ Trackio logging successful")
+
+        # Test finishing
+        trackio.finish()
+        print("✅ Trackio finish successful")
+
+        return True
+
+    except Exception as e:
+        print(f"❌ Trackio interface test failed: {e}")
+        return False
+
+def test_trl_compatibility():
+    """Test that our trackio module is compatible with TRL expectations"""
+    print("\nTesting TRL Compatibility")
+
+    try:
+        # Simulate what TRL would do
+        import trackio
+
+        # TRL expects these functions to be available
+        assert hasattr(trackio, 'init'), "trackio.init not found"
+        assert hasattr(trackio, 'log'), "trackio.log not found"
+        assert hasattr(trackio, 'finish'), "trackio.finish not found"
+
+        # Test function signatures
+        import inspect
+
+        # Check init signature
+        init_sig = inspect.signature(trackio.init)
+        print(f"✅ init signature: {init_sig}")
+
+        # Check log signature
+        log_sig = inspect.signature(trackio.log)
+        print(f"✅ log signature: {log_sig}")
+
+        # Check finish signature
+        finish_sig = inspect.signature(trackio.finish)
+        print(f"✅ finish signature: {finish_sig}")
+
+        print("✅ TRL compatibility test passed")
+        return True
+
+    except Exception as e:
+        print(f"❌ TRL compatibility test failed: {e}")
+        return False
+
+def test_monitoring_integration():
+    """Test that our trackio module integrates with our monitoring system"""
+    print("\nTesting Monitoring Integration")
+
+    try:
+        import trackio
+
+        # Test that we can get the monitor
+        monitor = trackio.get_monitor()
+        if monitor is not None:
+            print("✅ Monitor integration working")
+        else:
+            print("⚠️ Monitor not available (this is normal if not initialized)")
+
+        # Test availability check
+        is_avail = trackio.is_available()
+        print(f"✅ Trackio availability check: {is_avail}")
+
+        return True
+
+    except Exception as e:
+        print(f"❌ Monitoring integration test failed: {e}")
+        return False
+
+def main():
+    """Run all tests"""
+    print("Testing Trackio TRL Fix")
+    print("=" * 50)
+
+    tests = [
+        test_trackio_interface,
+        test_trl_compatibility,
+        test_monitoring_integration
+    ]
+
+    passed = 0
+    total = len(tests)
+
+    for test in tests:
+        try:
+            if test():
+                passed += 1
+        except Exception as e:
+            print(f"❌ Test {test.__name__} failed with exception: {e}")
+
+    print("\n" + "=" * 50)
+    print(f"Test Results: {passed}/{total} tests passed")
+
+    if passed == total:
+        print("✅ All tests passed! Trackio TRL fix is working correctly.")
+        print("\nThe trackio module now provides the interface expected by TRL library:")
+        print("- init(): Initialize experiment")
+        print("- log(): Log metrics")
+        print("- finish(): Finish experiment")
+        print("\nThis should resolve the 'module trackio has no attribute init' error.")
+    else:
+        print("❌ Some tests failed. Please check the implementation.")
+        return 1
+
+    return 0
+
+if __name__ == "__main__":
+    sys.exit(main())
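The test script confirms that `init`, `log` and `finish` exist via `hasattr` and prints their signatures, but it does not assert anything about the parameters. A stricter check could compare each signature against a list of expected parameter names. The sketch below shows one way to do that with `inspect.signature`; `missing_interface` and the `stub` module are hypothetical helpers introduced for illustration, and the parameter lists in the usage note are assumptions rather than TRL's documented requirements.

```python
import inspect
import types
from typing import Dict, List, Sequence


def missing_interface(
    module: types.ModuleType,
    required: Dict[str, Sequence[str]],
) -> List[str]:
    """Return a list of problems; an empty list means the interface matches."""
    problems: List[str] = []
    for func_name, params in required.items():
        func = getattr(module, func_name, None)
        if not callable(func):
            problems.append(f"missing function: {func_name}")
            continue
        accepted = inspect.signature(func).parameters
        for param in params:
            if param not in accepted:
                problems.append(f"{func_name}() lacks parameter: {param}")
    return problems


# Stub module standing in for `trackio`, so the sketch runs without the project.
stub = types.ModuleType("stub_trackio")
stub.init = lambda project_name=None, **kwargs: "exp"
stub.log = lambda metrics, step=None, **kwargs: None
```

Calling `missing_interface(stub, {"init": ["project_name"], "log": ["metrics", "step"], "finish": []})` reports the absent `finish` function, and returns an empty list once `stub.finish` is defined.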
trackio.py
ADDED
@@ -0,0 +1,30 @@
+"""
+Trackio Module for TRL Library Compatibility
+This module provides the interface expected by TRL library while using our custom monitoring system
+"""
+
+# Import all functions from our custom trackio implementation
+from src.trackio import (
+    init,
+    log,
+    finish,
+    log_config,
+    log_checkpoint,
+    log_evaluation_results,
+    get_experiment_url,
+    is_available,
+    get_monitor
+)
+
+# Make all functions available at module level
+__all__ = [
+    'init',
+    'log',
+    'finish',
+    'log_config',
+    'log_checkpoint',
+    'log_evaluation_results',
+    'get_experiment_url',
+    'is_available',
+    'get_monitor'
+]
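The root-level `trackio.py` only re-exports names from `src.trackio`, so `import trackio` resolves to the project's implementation whenever the project root is searched first. Python resolves top-level imports by walking `sys.path` in order, which the sketch below demonstrates in isolation. The helper `import_from_front_of_path` and the module name used in the usage note are hypothetical. That the commit relies on this ordering is an assumption drawn from the shim's docstring, not something the diff states.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def import_from_front_of_path(module_name: str, source: str) -> ModuleType:
    """Write `source` as `<module_name>.py` in a temporary directory, put that
    directory first on sys.path, and import the module by name."""
    with tempfile.TemporaryDirectory() as directory:
        Path(directory, f"{module_name}.py").write_text(source)
        sys.path.insert(0, directory)
        try:
            # Drop stale finder caches and any previously loaded copy.
            importlib.invalidate_caches()
            sys.modules.pop(module_name, None)
            return importlib.import_module(module_name)
        finally:
            sys.path.remove(directory)
```

For example, `import_from_front_of_path("shim_demo_module", "def init():\n    return 'shim'\n")` returns a module whose `init()` comes from the temporary file, because that directory was searched before every other entry on `sys.path`.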