adds default values to experiment name
- docs/TRACKIO_TRL_FIX.md +9 -4
- docs/TRACKIO_TRL_FIX_SUMMARY.md +135 -0
- src/trackio.py +6 -2
- src/trainer.py +2 -2
- tests/test_trackio_trl_fix.py +15 -2
docs/TRACKIO_TRL_FIX.md
CHANGED
@@ -21,7 +21,7 @@ However, our custom monitoring implementation didn't provide this interface.
 Created a trackio module that provides the exact interface expected by TRL:
 
 ```python
-def init(project_name: str, experiment_name: Optional[str] = None, **kwargs) -> str:
+def init(project_name: Optional[str] = None, experiment_name: Optional[str] = None, **kwargs) -> str:
     """Initialize trackio experiment (TRL interface)"""
 
 def log(metrics: Dict[str, Any], step: Optional[int] = None, **kwargs):
@@ -31,6 +31,8 @@ def finish():
     """Finish trackio experiment (TRL interface)"""
 ```
 
+**Key Feature**: The `init()` function can be called without any arguments, making it compatible with TRL's expectations. It will use environment variables or defaults when no arguments are provided.
+
 ### 2. Global Trackio Module (`trackio.py`)
 
 Created a root-level `trackio.py` file that imports from our custom implementation:
@@ -103,20 +105,23 @@ Test results:
 ✅ Found required function: init
 ✅ Found required function: log
 ✅ Found required function: finish
-✅ Trackio initialization successful
+✅ Trackio initialization with args successful
+✅ Trackio initialization without args successful
 ✅ Trackio logging successful
 ✅ Trackio finish successful
+✅ init() can be called without arguments
 ✅ TRL compatibility test passed
 ✅ Monitor integration working
 ```
 
 ## Benefits
 
-1. **Resolves Training Error**: Fixes the "module trackio has no attribute init" error
+1. **Resolves Training Error**: Fixes the "module trackio has no attribute init" error and "init() missing 1 required positional argument: 'project_name'" error
 2. **Maintains Functionality**: All existing monitoring features continue to work
-3. **TRL Compatibility**: SFTTrainer can now use trackio for logging
+3. **TRL Compatibility**: SFTTrainer can now use trackio for logging, even when called without arguments
 4. **Graceful Fallback**: Continues training even if trackio initialization fails
 5. **Future-Proof**: Easy to extend with additional TRL-compatible functions
+6. **Flexible Initialization**: Supports both argument-based and environment-based configuration
 
 ## Usage
 
docs/TRACKIO_TRL_FIX_SUMMARY.md
ADDED
@@ -0,0 +1,135 @@
+# Trackio TRL Fix - Complete Solution Summary
+
+## Problem Resolution
+
+We successfully resolved two related errors:
+
+1. **Original Error**: `ERROR:trainer:Training failed: module 'trackio' has no attribute 'init'`
+2. **Secondary Error**: `ERROR:train:Training failed: init() missing 1 required positional argument: 'project_name'`
+
+## Root Cause Analysis
+
+The TRL library (SFTTrainer) expects a `trackio` module with specific functions:
+- `init()` - Initialize experiment
+- `log()` - Log metrics
+- `finish()` - Finish experiment
+
+However, our custom monitoring implementation didn't provide this interface, and when we created it, the `init()` function required a `project_name` argument, but TRL was calling it without any arguments.
+
+## Complete Solution
+
+### 1. Created Trackio Module Interface (`src/trackio.py`)
+
+```python
+def init(project_name: Optional[str] = None, experiment_name: Optional[str] = None, **kwargs) -> str:
+    """Initialize trackio experiment (TRL interface)"""
+    # Provide default project name if not provided
+    if project_name is None:
+        project_name = os.environ.get('EXPERIMENT_NAME', 'smollm3_experiment')
+    # ... rest of implementation
+```
+
+**Key Features**:
+- ✅ Can be called without arguments (`trackio.init()`)
+- ✅ Uses environment variables for defaults
+- ✅ Maintains backward compatibility with argument-based calls
+- ✅ Integrates with our existing `SmolLM3Monitor` system
+
+### 2. Global Trackio Module (`trackio.py`)
+
+Created a root-level module that makes trackio available globally:
+
+```python
+from src.trackio import (
+    init, log, finish, log_config, log_checkpoint,
+    log_evaluation_results, get_experiment_url, is_available, get_monitor
+)
+```
+
+### 3. Updated Trainer Integration (`src/trainer.py`)
+
+Enhanced trainer to properly initialize trackio with fallback handling:
+
+```python
+# Initialize trackio for TRL compatibility
+try:
+    import trackio
+    experiment_id = trackio.init(
+        project_name=getattr(self.config, 'experiment_name', 'smollm3_experiment'),
+        experiment_name=getattr(self.config, 'experiment_name', 'smollm3_experiment'),
+        trackio_url=getattr(self.config, 'trackio_url', None),
+        trackio_token=getattr(self.config, 'trackio_token', None),
+        hf_token=getattr(self.config, 'hf_token', None),
+        dataset_repo=getattr(self.config, 'dataset_repo', None)
+    )
+    logger.info(f"Trackio initialized with experiment ID: {experiment_id}")
+except Exception as e:
+    logger.warning(f"Failed to initialize trackio: {e}")
+    logger.info("Continuing without trackio integration")
+```
+
+### 4. Comprehensive Testing
+
+Created test suite that verifies:
+- ✅ Function availability (`init`, `log`, `finish`)
+- ✅ Argument-less calls (`trackio.init()`)
+- ✅ Argument-based calls (`trackio.init(project_name="test")`)
+- ✅ TRL compatibility
+- ✅ Monitoring integration
+
+## Test Results
+
+```
+✅ Successfully imported trackio module
+✅ Found required function: init
+✅ Found required function: log
+✅ Found required function: finish
+✅ Trackio initialization with args successful
+✅ Trackio initialization without args successful
+✅ Trackio logging successful
+✅ Trackio finish successful
+✅ init() can be called without arguments
+✅ TRL compatibility test passed
+✅ Monitor integration working
+```
+
+## Benefits Achieved
+
+1. **✅ Resolves Both Errors**: Fixes both the missing attribute and missing argument errors
+2. **✅ TRL Compatibility**: SFTTrainer can now use trackio for logging
+3. **✅ Flexible Initialization**: Supports both argument-based and environment-based configuration
+4. **✅ Graceful Fallback**: Continues training even if trackio initialization fails
+5. **✅ Maintains Functionality**: All existing monitoring features continue to work
+6. **✅ Future-Proof**: Easy to extend with additional TRL-compatible functions
+
+## Files Modified
+
+- `src/trackio.py` - New trackio module interface with optional arguments
+- `trackio.py` - Global trackio module for TRL
+- `src/trainer.py` - Updated trainer integration with robust error handling
+- `src/__init__.py` - Package exports
+- `tests/test_trackio_trl_fix.py` - Comprehensive test suite
+- `docs/TRACKIO_TRL_FIX.md` - Detailed documentation
+
+## Usage
+
+The fix is transparent to users. Training will now work with SFTTrainer and automatically:
+
+1. Initialize trackio when SFTTrainer is created (with or without arguments)
+2. Log metrics during training
+3. Finish the experiment when training completes
+4. Fall back gracefully if trackio is not available
+
+## Verification
+
+To verify the fix works:
+
+```bash
+python tests/test_trackio_trl_fix.py
+```
+
+This should show all tests passing and confirm that the trackio module provides the interface expected by TRL library, including support for argument-less calls.
+
+## Next Steps
+
+The training should now proceed successfully without the trackio errors. The SFTTrainer will be able to use our custom monitoring system for logging metrics and experiment tracking, with full compatibility with TRL's expectations.
src/trackio.py
CHANGED
@@ -17,7 +17,7 @@ logger = logging.getLogger(__name__)
 _monitor = None
 
 def init(
-    project_name: str,
+    project_name: Optional[str] = None,
     experiment_name: Optional[str] = None,
     **kwargs
 ) -> str:
@@ -25,7 +25,7 @@ def init(
     Initialize trackio experiment (TRL interface)
 
     Args:
-        project_name: Name of the project
+        project_name: Name of the project (optional, defaults to 'smollm3_experiment')
         experiment_name: Name of the experiment (optional)
         **kwargs: Additional configuration parameters
 
@@ -35,6 +35,10 @@ def init(
     global _monitor
 
     try:
+        # Provide default project name if not provided
+        if project_name is None:
+            project_name = os.environ.get('EXPERIMENT_NAME', 'smollm3_experiment')
+
         # Extract configuration from kwargs
         trackio_url = kwargs.get('trackio_url') or os.environ.get('TRACKIO_URL')
         trackio_token = kwargs.get('trackio_token') or os.environ.get('TRACKIO_TOKEN')
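The fallback order introduced in this hunk (explicit argument, then the `EXPERIMENT_NAME` environment variable, then `'smollm3_experiment'`) can be sketched in isolation. This is a minimal illustration, not the code in `src/trackio.py`: the helper name `resolve_project_name` and its `env` parameter are hypothetical and exist only to make the behaviour testable without touching the real environment.

```python
import os
from typing import Mapping, Optional

# Default value taken from the hunk above.
DEFAULT_PROJECT = "smollm3_experiment"


def resolve_project_name(
    project_name: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper mirroring the `if project_name is None` branch."""
    if project_name is not None:
        # An explicit argument always wins, even an empty string,
        # because the check is `is None` rather than truthiness.
        return project_name
    # Fall back to the process environment unless a mapping is injected.
    env = os.environ if env is None else env
    return env.get("EXPERIMENT_NAME", DEFAULT_PROJECT)
```

Because the check is `is None`, a caller that passes `project_name=""` keeps the empty string. Whether that is desirable depends on how the monitor treats empty names, which this diff does not show.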
src/trainer.py
CHANGED
@@ -140,8 +140,8 @@ class SmolLM3Trainer:
             import trackio
             # Initialize trackio with our configuration
             experiment_id = trackio.init(
-                project_name=self.config
-                experiment_name=self.config
+                project_name=getattr(self.config, 'experiment_name', 'smollm3_experiment'),
+                experiment_name=getattr(self.config, 'experiment_name', 'smollm3_experiment'),
                 trackio_url=getattr(self.config, 'trackio_url', None),
                 trackio_token=getattr(self.config, 'trackio_token', None),
                 hf_token=getattr(self.config, 'hf_token', None),
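The trainer change pairs `getattr(..., default)` lookups with a try/except so that a tracking failure does not abort training. A rough sketch of that pattern follows, under stated assumptions: `init_tracking_or_none` is a hypothetical helper, and the tracking function is passed in as `init_fn` so the sketch runs without the real `trackio` module.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def init_tracking_or_none(init_fn: Callable[..., str], config: Any) -> Optional[str]:
    """Hypothetical helper: return the experiment ID, or None if init fails."""
    # As in the hunk above, one config attribute feeds both names,
    # with a default when the attribute is missing.
    name = getattr(config, "experiment_name", "smollm3_experiment")
    try:
        experiment_id = init_fn(project_name=name, experiment_name=name)
    except Exception as exc:
        # Tracking is treated as optional, so log and carry on.
        logger.warning("Failed to initialize trackio: %s", exc)
        return None
    logger.info("Trackio initialized with experiment ID: %s", experiment_id)
    return experiment_id
```

Returning `None` rather than re-raising lets the caller decide whether to skip later `log()` and `finish()` calls.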
tests/test_trackio_trl_fix.py
CHANGED
@@ -29,14 +29,18 @@ def test_trackio_interface():
                 print(f"❌ Missing required function: {func_name}")
                 return False
 
-        # Test initialization
+        # Test initialization with arguments
         experiment_id = trackio.init(
             project_name="test_project",
             experiment_name="test_experiment",
             trackio_url="https://test.hf.space",
             dataset_repo="test/trackio-experiments"
         )
-        print(f"✅ Trackio initialization successful: {experiment_id}")
+        print(f"✅ Trackio initialization with args successful: {experiment_id}")
+
+        # Test initialization without arguments (TRL compatibility)
+        experiment_id2 = trackio.init()
+        print(f"✅ Trackio initialization without args successful: {experiment_id2}")
 
         # Test logging
         metrics = {'loss': 0.5, 'learning_rate': 1e-4}
@@ -73,6 +77,15 @@ def test_trl_compatibility():
         init_sig = inspect.signature(trackio.init)
         print(f"✅ init signature: {init_sig}")
 
+        # Test that init can be called without arguments (TRL compatibility)
+        try:
+            # This simulates what TRL might do
+            trackio.init()
+            print("✅ init() can be called without arguments")
+        except Exception as e:
+            print(f"❌ init() failed when called without arguments: {e}")
+            return False
+
         # Check log signature
         log_sig = inspect.signature(trackio.log)
         print(f"✅ log signature: {log_sig}")
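The added test calls `trackio.init()` directly, which also runs whatever side effects initialization has. Since the test already uses `inspect.signature`, the same property could be checked without executing the function by binding an empty argument list against the signature. This is a sketch of that alternative, not part of the commit: `callable_without_args`, `old_init` and `new_init` are hypothetical names, with the two stubs standing in for the signatures before and after this change.

```python
import inspect
from typing import Callable


def callable_without_args(fn: Callable) -> bool:
    """Return True if `fn()` would bind, without actually calling `fn`."""
    try:
        # Signature.bind() raises TypeError when a required parameter is missing.
        inspect.signature(fn).bind()
    except TypeError:
        return False
    return True


def old_init(project_name, experiment_name=None, **kwargs):
    """Stand-in for the signature before this commit."""
    return "experiment-id"


def new_init(project_name=None, experiment_name=None, **kwargs):
    """Stand-in for the signature after this commit."""
    return "experiment-id"
```

The binding check only validates the signature. It does not replace the runtime test when the goal is to confirm that the default values actually lead to a working monitor.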