Tonic committed on
Commit 0b9efb9 · 1 Parent(s): 93ed7a1

Initial Space setup

scripts/trackio_tonic/README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ title: Trackio Tonic
+ emoji: 🐠
+ colorFrom: indigo
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: 5.38.0
+ app_file: app.py
+ pinned: true
+ license: mit
+ short_description: trackio for training monitoring
+ ---
+
+ # Trackio Experiment Tracking
+
+ A Gradio interface for experiment tracking and monitoring.
+
+ ## Features
+
+ - Create and manage experiments
+ - Log training metrics and parameters
+ - View experiment details and results
+ - Update experiment status
+
+ ## Usage
+
+ 1. Create a new experiment using the "Create Experiment" tab
+ 2. Log metrics during training using the "Log Metrics" tab
+ 3. View experiment details using the "View Experiments" tab
+ 4. Update experiment status using the "Update Status" tab
+
+ ## Integration
+
+ To connect your training script to this Trackio Space:
+
+ ```python
+ from monitoring import SmolLM3Monitor
+
+ monitor = SmolLM3Monitor(
+     experiment_name="my_experiment",
+     trackio_url="https://huggingface.co/spaces/Tonic/trackio_test_2",
+     enable_tracking=True
+ )
+ ```
+
+ Visit: https://huggingface.co/spaces/Tonic/trackio_test_2
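The "Log Metrics" tab described above takes the metrics as a JSON string plus an optional step number. A minimal, stdlib-only sketch of preparing such a payload (the metric names here are illustrative, not required by the Space):

```python
import json

def build_metrics_payload(step: int, metrics: dict) -> tuple:
    """Serialize metrics the way the Log Metrics tab expects: a step string and a JSON string."""
    return str(step), json.dumps(metrics)

# Example payload; the Space parses the JSON back into a dict before storing it
step, payload = build_metrics_payload(25, {"loss": 1.1659, "learning_rate": 7e-08})
restored = json.loads(payload)
```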
scripts/trackio_tonic/app.py ADDED
@@ -0,0 +1,1211 @@
+ """
+ Trackio Deployment on Hugging Face Spaces
+ A Gradio interface for experiment tracking and monitoring
+ """
+
+ import gradio as gr
+ import os
+ import json
+ import logging
+ from datetime import datetime
+ from typing import Dict, Any, Optional
+ import requests
+ import plotly.graph_objects as go
+ import plotly.express as px
+ import pandas as pd
+ import numpy as np
+
+ # Setup logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ class TrackioSpace:
+     """Trackio deployment for Hugging Face Spaces using HF Datasets"""
+
+     def __init__(self, hf_token: Optional[str] = None, dataset_repo: Optional[str] = None):
+         self.experiments = {}
+         self.current_experiment = None
+
+         # Get dataset repository and HF token from parameters or environment variables
+         self.dataset_repo = dataset_repo or os.environ.get('TRACKIO_DATASET_REPO', 'tonic/trackio-experiments')
+         self.hf_token = hf_token or os.environ.get('HF_TOKEN')
+
+         logger.info(f"🔧 Using dataset repository: {self.dataset_repo}")
+
+         if not self.hf_token:
+             logger.warning("⚠️ HF_TOKEN not found. Some features may not work.")
+
+         self._load_experiments()
+
+     def _load_experiments(self):
+         """Load experiments from HF Dataset"""
+         try:
+             if self.hf_token:
+                 from datasets import load_dataset
+
+                 # Try to load the dataset
+                 try:
+                     dataset = load_dataset(self.dataset_repo, token=self.hf_token)
+                     logger.info(f"✅ Loaded experiments from {self.dataset_repo}")
+
+                     # Convert dataset to experiments dict
+                     self.experiments = {}
+                     if 'train' in dataset:
+                         for row in dataset['train']:
+                             exp_id = row.get('experiment_id')
+                             if exp_id:
+                                 self.experiments[exp_id] = {
+                                     'id': exp_id,
+                                     'name': row.get('name', ''),
+                                     'description': row.get('description', ''),
+                                     'created_at': row.get('created_at', ''),
+                                     'status': row.get('status', 'running'),
+                                     'metrics': json.loads(row.get('metrics', '[]')),
+                                     'parameters': json.loads(row.get('parameters', '{}')),
+                                     'artifacts': json.loads(row.get('artifacts', '[]')),
+                                     'logs': json.loads(row.get('logs', '[]'))
+                                 }
+
+                     logger.info(f"📊 Loaded {len(self.experiments)} experiments from dataset")
+
+                 except Exception as e:
+                     logger.warning(f"Failed to load from dataset: {e}")
+                     # Fall back to backup data
+                     self._load_backup_experiments()
+             else:
+                 # No HF token, use backup data
+                 self._load_backup_experiments()
+
+         except Exception as e:
+             logger.error(f"Failed to load experiments: {e}")
+             self._load_backup_experiments()
+
+     def _load_backup_experiments(self):
+         """Load backup experiments when dataset is not available"""
+         logger.info("🔄 Loading backup experiments...")
+
+         backup_experiments = {
+             'exp_20250720_130853': {
+                 'id': 'exp_20250720_130853',
+                 'name': 'petite-elle-l-aime-3',
+                 'description': 'SmolLM3 fine-tuning experiment',
+                 'created_at': '2025-07-20T11:20:01.780908',
+                 'status': 'running',
+                 'metrics': [
+                     {
+                         'timestamp': '2025-07-20T11:20:01.780908',
+                         'step': 25,
+                         'metrics': {
+                             'loss': 1.1659,
+                             'grad_norm': 10.3125,
+                             'learning_rate': 7e-08,
+                             'num_tokens': 1642080.0,
+                             'mean_token_accuracy': 0.75923578992486,
+                             'epoch': 0.004851130919895701
+                         }
+                     },
+                     {
+                         'timestamp': '2025-07-20T11:26:39.042155',
+                         'step': 50,
+                         'metrics': {
+                             'loss': 1.165,
+                             'grad_norm': 10.75,
+                             'learning_rate': 1.4291666666666667e-07,
+                             'num_tokens': 3324682.0,
+                             'mean_token_accuracy': 0.7577659255266189,
+                             'epoch': 0.009702261839791402
+                         }
+                     },
+                     {
+                         'timestamp': '2025-07-20T11:33:16.203045',
+                         'step': 75,
+                         'metrics': {
+                             'loss': 1.1639,
+                             'grad_norm': 10.6875,
+                             'learning_rate': 2.1583333333333334e-07,
+                             'num_tokens': 4987941.0,
+                             'mean_token_accuracy': 0.7581205774843692,
+                             'epoch': 0.014553392759687101
+                         }
+                     },
+                     {
+                         'timestamp': '2025-07-20T11:39:53.453917',
+                         'step': 100,
+                         'metrics': {
+                             'loss': 1.1528,
+                             'grad_norm': 10.75,
+                             'learning_rate': 2.8875e-07,
+                             'num_tokens': 6630190.0,
+                             'mean_token_accuracy': 0.7614579878747463,
+                             'epoch': 0.019404523679582803
+                         }
+                     }
+                 ],
+                 'parameters': {
+                     'model_name': 'HuggingFaceTB/SmolLM3-3B',
+                     'max_seq_length': 12288,
+                     'use_flash_attention': True,
+                     'use_gradient_checkpointing': False,
+                     'batch_size': 8,
+                     'gradient_accumulation_steps': 16,
+                     'learning_rate': 3.5e-06,
+                     'weight_decay': 0.01,
+                     'warmup_steps': 1200,
+                     'max_iters': 18000,
+                     'eval_interval': 1000,
+                     'log_interval': 25,
+                     'save_interval': 2000,
+                     'optimizer': 'adamw_torch',
+                     'beta1': 0.9,
+                     'beta2': 0.999,
+                     'eps': 1e-08,
+                     'scheduler': 'cosine',
+                     'min_lr': 3.5e-07,
+                     'fp16': False,
+                     'bf16': True,
+                     'ddp_backend': 'nccl',
+                     'ddp_find_unused_parameters': False,
+                     'save_steps': 2000,
+                     'eval_steps': 1000,
+                     'logging_steps': 25,
+                     'save_total_limit': 5,
+                     'eval_strategy': 'steps',
+                     'metric_for_best_model': 'eval_loss',
+                     'greater_is_better': False,
+                     'load_best_model_at_end': True,
+                     'data_dir': None,
+                     'train_file': None,
+                     'validation_file': None,
+                     'test_file': None,
+                     'use_chat_template': True,
+                     'chat_template_kwargs': {'add_generation_prompt': True, 'no_think_system_message': True},
+                     'enable_tracking': True,
+                     'trackio_url': 'https://tonic-test-trackio-test.hf.space',
+                     'trackio_token': None,
+                     'log_artifacts': True,
+                     'log_metrics': True,
+                     'log_config': True,
+                     'experiment_name': 'petite-elle-l-aime-3',
+                     'dataset_name': 'legmlai/openhermes-fr',
+                     'dataset_split': 'train',
+                     'input_field': 'prompt',
+                     'target_field': 'accepted_completion',
+                     'filter_bad_entries': True,
+                     'bad_entry_field': 'bad_entry',
+                     'packing': False,
+                     'max_prompt_length': 12288,
+                     'max_completion_length': 8192,
+                     'truncation': True,
+                     'dataloader_num_workers': 10,
+                     'dataloader_pin_memory': True,
+                     'dataloader_prefetch_factor': 3,
+                     'max_grad_norm': 1.0,
+                     'group_by_length': True
+                 },
+                 'artifacts': [],
+                 'logs': []
+             },
+             'exp_20250720_134319': {
+                 'id': 'exp_20250720_134319',
+                 'name': 'petite-elle-l-aime-3-1',
+                 'description': 'SmolLM3 fine-tuning experiment',
+                 'created_at': '2025-07-20T11:54:31.993219',
+                 'status': 'running',
+                 'metrics': [
+                     {
+                         'timestamp': '2025-07-20T11:54:31.993219',
+                         'step': 25,
+                         'metrics': {
+                             'loss': 1.166,
+                             'grad_norm': 10.375,
+                             'learning_rate': 7e-08,
+                             'num_tokens': 1642080.0,
+                             'mean_token_accuracy': 0.7590958896279335,
+                             'epoch': 0.004851130919895701
+                         }
+                     },
+                     {
+                         'timestamp': '2025-07-20T11:54:33.589487',
+                         'step': 25,
+                         'metrics': {
+                             'gpu_0_memory_allocated': 17.202261447906494,
+                             'gpu_0_memory_reserved': 75.474609375,
+                             'gpu_0_utilization': 0,
+                             'cpu_percent': 2.7,
+                             'memory_percent': 10.1
+                         }
+                     }
+                 ],
+                 'parameters': {
+                     'model_name': 'HuggingFaceTB/SmolLM3-3B',
+                     'max_seq_length': 12288,
+                     'use_flash_attention': True,
+                     'use_gradient_checkpointing': False,
+                     'batch_size': 8,
+                     'gradient_accumulation_steps': 16,
+                     'learning_rate': 3.5e-06,
+                     'weight_decay': 0.01,
+                     'warmup_steps': 1200,
+                     'max_iters': 18000,
+                     'eval_interval': 1000,
+                     'log_interval': 25,
+                     'save_interval': 2000,
+                     'optimizer': 'adamw_torch',
+                     'beta1': 0.9,
+                     'beta2': 0.999,
+                     'eps': 1e-08,
+                     'scheduler': 'cosine',
+                     'min_lr': 3.5e-07,
+                     'fp16': False,
+                     'bf16': True,
+                     'ddp_backend': 'nccl',
+                     'ddp_find_unused_parameters': False,
+                     'save_steps': 2000,
+                     'eval_steps': 1000,
+                     'logging_steps': 25,
+                     'save_total_limit': 5,
+                     'eval_strategy': 'steps',
+                     'metric_for_best_model': 'eval_loss',
+                     'greater_is_better': False,
+                     'load_best_model_at_end': True,
+                     'data_dir': None,
+                     'train_file': None,
+                     'validation_file': None,
+                     'test_file': None,
+                     'use_chat_template': True,
+                     'chat_template_kwargs': {'add_generation_prompt': True, 'no_think_system_message': True},
+                     'enable_tracking': True,
+                     'trackio_url': 'https://tonic-test-trackio-test.hf.space',
+                     'trackio_token': None,
+                     'log_artifacts': True,
+                     'log_metrics': True,
+                     'log_config': True,
+                     'experiment_name': 'petite-elle-l-aime-3-1',
+                     'dataset_name': 'legmlai/openhermes-fr',
+                     'dataset_split': 'train',
+                     'input_field': 'prompt',
+                     'target_field': 'accepted_completion',
+                     'filter_bad_entries': True,
+                     'bad_entry_field': 'bad_entry',
+                     'packing': False,
+                     'max_prompt_length': 12288,
+                     'max_completion_length': 8192,
+                     'truncation': True,
+                     'dataloader_num_workers': 10,
+                     'dataloader_pin_memory': True,
+                     'dataloader_prefetch_factor': 3,
+                     'max_grad_norm': 1.0,
+                     'group_by_length': True
+                 },
+                 'artifacts': [],
+                 'logs': []
+             }
+         }
+
+         self.experiments = backup_experiments
+         self.current_experiment = 'exp_20250720_134319'
+         logger.info(f"✅ Loaded {len(backup_experiments)} backup experiments")
+
+     def _save_experiments(self):
+         """Save experiments to HF Dataset"""
+         try:
+             if self.hf_token:
+                 from datasets import Dataset
+                 from huggingface_hub import HfApi
+
+                 # Convert experiments to dataset format
+                 dataset_data = []
+                 for exp_id, exp_data in self.experiments.items():
+                     dataset_data.append({
+                         'experiment_id': exp_id,
+                         'name': exp_data.get('name', ''),
+                         'description': exp_data.get('description', ''),
+                         'created_at': exp_data.get('created_at', ''),
+                         'status': exp_data.get('status', 'running'),
+                         'metrics': json.dumps(exp_data.get('metrics', [])),
+                         'parameters': json.dumps(exp_data.get('parameters', {})),
+                         'artifacts': json.dumps(exp_data.get('artifacts', [])),
+                         'logs': json.dumps(exp_data.get('logs', [])),
+                         'last_updated': datetime.now().isoformat()
+                     })
+
+                 # Create dataset
+                 dataset = Dataset.from_list(dataset_data)
+
+                 # Push to HF Hub
+                 api = HfApi(token=self.hf_token)
+                 dataset.push_to_hub(
+                     self.dataset_repo,
+                     token=self.hf_token,
+                     private=True  # Make it private for security
+                 )
+
+                 logger.info(f"✅ Saved {len(dataset_data)} experiments to {self.dataset_repo}")
+
+             else:
+                 logger.warning("⚠️ No HF_TOKEN available, experiments not saved to dataset")
+
+         except Exception as e:
+             logger.error(f"Failed to save experiments to dataset: {e}")
+             # Fall back to local file for backup
+             try:
+                 data = {
+                     'experiments': self.experiments,
+                     'current_experiment': self.current_experiment,
+                     'last_updated': datetime.now().isoformat()
+                 }
+                 with open("trackio_experiments_backup.json", 'w') as f:
+                     json.dump(data, f, indent=2, default=str)
+                 logger.info("✅ Saved backup to local file")
+             except Exception as backup_e:
+                 logger.error(f"Failed to save backup: {backup_e}")
+
+     def create_experiment(self, name: str, description: str = "") -> Dict[str, Any]:
+         """Create a new experiment"""
+         experiment_id = f"exp_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+
+         experiment = {
+             'id': experiment_id,
+             'name': name,
+             'description': description,
+             'created_at': datetime.now().isoformat(),
+             'status': 'running',
+             'metrics': [],
+             'parameters': {},
+             'artifacts': [],
+             'logs': []
+         }
+
+         self.experiments[experiment_id] = experiment
+         self.current_experiment = experiment_id
+         self._save_experiments()
+
+         logger.info(f"Created experiment: {experiment_id} - {name}")
+         return experiment
+
+     def log_metrics(self, experiment_id: str, metrics: Dict[str, Any], step: Optional[int] = None):
+         """Log metrics for an experiment"""
+         if experiment_id not in self.experiments:
+             raise ValueError(f"Experiment {experiment_id} not found")
+
+         metric_entry = {
+             'timestamp': datetime.now().isoformat(),
+             'step': step,
+             'metrics': metrics
+         }
+
+         self.experiments[experiment_id]['metrics'].append(metric_entry)
+         self._save_experiments()
+         logger.info(f"Logged metrics for experiment {experiment_id}: {metrics}")
+
+     def log_parameters(self, experiment_id: str, parameters: Dict[str, Any]):
+         """Log parameters for an experiment"""
+         if experiment_id not in self.experiments:
+             raise ValueError(f"Experiment {experiment_id} not found")
+
+         self.experiments[experiment_id]['parameters'].update(parameters)
+         self._save_experiments()
+         logger.info(f"Logged parameters for experiment {experiment_id}: {parameters}")
+
+     def log_artifact(self, experiment_id: str, artifact_name: str, artifact_data: str):
+         """Log an artifact for an experiment"""
+         if experiment_id not in self.experiments:
+             raise ValueError(f"Experiment {experiment_id} not found")
+
+         artifact_entry = {
+             'name': artifact_name,
+             'timestamp': datetime.now().isoformat(),
+             'data': artifact_data
+         }
+
+         self.experiments[experiment_id]['artifacts'].append(artifact_entry)
+         self._save_experiments()
+         logger.info(f"Logged artifact for experiment {experiment_id}: {artifact_name}")
+
+     def get_experiment(self, experiment_id: str) -> Optional[Dict[str, Any]]:
+         """Get experiment details"""
+         return self.experiments.get(experiment_id)
+
+     def list_experiments(self) -> Dict[str, Any]:
+         """List all experiments"""
+         return {
+             'experiments': list(self.experiments.keys()),
+             'current_experiment': self.current_experiment,
+             'total_experiments': len(self.experiments)
+         }
+
+     def update_experiment_status(self, experiment_id: str, status: str):
+         """Update experiment status"""
+         if experiment_id in self.experiments:
+             self.experiments[experiment_id]['status'] = status
+             self._save_experiments()
+             logger.info(f"Updated experiment {experiment_id} status to {status}")
+
+     def get_metrics_dataframe(self, experiment_id: str) -> pd.DataFrame:
+         """Get metrics as a pandas DataFrame for plotting"""
+         if experiment_id not in self.experiments:
+             return pd.DataFrame()
+
+         experiment = self.experiments[experiment_id]
+         if not experiment['metrics']:
+             return pd.DataFrame()
+
+         # Convert metrics to DataFrame
+         data = []
+         for metric_entry in experiment['metrics']:
+             step = metric_entry.get('step', 0)
+             timestamp = metric_entry.get('timestamp', '')
+             metrics = metric_entry.get('metrics', {})
+
+             row = {'step': step, 'timestamp': timestamp}
+             row.update(metrics)
+             data.append(row)
+
+         return pd.DataFrame(data)
+
+ # Global instance
+ trackio_space = TrackioSpace()
+
+ def update_trackio_config(hf_token: str, dataset_repo: str) -> str:
+     """Update TrackioSpace configuration with new HF token and dataset repository"""
+     global trackio_space
+
+     try:
+         # Create new instance with updated configuration
+         trackio_space = TrackioSpace(hf_token=hf_token if hf_token.strip() else None,
+                                      dataset_repo=dataset_repo if dataset_repo.strip() else None)
+
+         # Reload experiments with new configuration
+         trackio_space._load_experiments()
+
+         return f"✅ Configuration updated successfully!\n📊 Dataset: {trackio_space.dataset_repo}\n🔑 HF Token: {'Set' if trackio_space.hf_token else 'Not set'}\n📈 Loaded {len(trackio_space.experiments)} experiments"
+
+     except Exception as e:
+         return f"❌ Failed to update configuration: {str(e)}"
+
+ def test_dataset_connection(hf_token: str, dataset_repo: str) -> str:
+     """Test connection to HF Dataset repository"""
+     try:
+         if not hf_token.strip():
+             return "❌ Please provide a Hugging Face token"
+
+         if not dataset_repo.strip():
+             return "❌ Please provide a dataset repository"
+
+         from datasets import load_dataset
+
+         # Test loading the dataset
+         dataset = load_dataset(dataset_repo, token=hf_token)
+
+         # Count experiments
+         experiment_count = len(dataset['train']) if 'train' in dataset else 0
+
+         return f"✅ Connection successful!\n📊 Dataset: {dataset_repo}\n📈 Found {experiment_count} experiments\n🔗 Dataset URL: https://huggingface.co/datasets/{dataset_repo}"
+
+     except Exception as e:
+         return f"❌ Connection failed: {str(e)}\n\n💡 Troubleshooting:\n1. Check your HF token is correct\n2. Verify the dataset repository exists\n3. Ensure your token has read access to the dataset"
+
+ def create_dataset_repository(hf_token: str, dataset_repo: str) -> str:
+     """Create HF Dataset repository if it doesn't exist"""
+     try:
+         if not hf_token.strip():
+             return "❌ Please provide a Hugging Face token"
+
+         if not dataset_repo.strip():
+             return "❌ Please provide a dataset repository"
+
+         from datasets import Dataset
+         from huggingface_hub import HfApi
+
+         # Parse username and dataset name
+         if '/' not in dataset_repo:
+             return "❌ Dataset repository must be in format: username/dataset-name"
+
+         username, dataset_name = dataset_repo.split('/', 1)
+
+         # Create API client
+         api = HfApi(token=hf_token)
+
+         # Check if dataset exists
+         try:
+             api.dataset_info(dataset_repo)
+             return f"✅ Dataset {dataset_repo} already exists!"
+         except Exception:
+             # Dataset doesn't exist, create it
+             pass
+
+         # Create empty dataset
+         empty_dataset = Dataset.from_dict({
+             'experiment_id': [],
+             'name': [],
+             'description': [],
+             'created_at': [],
+             'status': [],
+             'metrics': [],
+             'parameters': [],
+             'artifacts': [],
+             'logs': [],
+             'last_updated': []
+         })
+
+         # Push to hub
+         empty_dataset.push_to_hub(
+             dataset_repo,
+             token=hf_token,
+             private=True
+         )
+
+         return f"✅ Dataset {dataset_repo} created successfully!\n🔗 View at: https://huggingface.co/datasets/{dataset_repo}\n📊 Ready to store experiments"
+
+     except Exception as e:
+         return f"❌ Failed to create dataset: {str(e)}\n\n💡 Troubleshooting:\n1. Check your HF token has write permissions\n2. Verify the username in the repository name\n3. Ensure the dataset name is valid"
+
+ # Initialize API client for remote data
+ api_client = None
+ try:
+     from trackio_api_client import TrackioAPIClient
+     api_client = TrackioAPIClient("https://tonic-test-trackio-test.hf.space")
+     logger.info("✅ API client initialized for remote data access")
+ except ImportError:
+     logger.warning("⚠️ API client not available, using local data only")
+
+ # Add Hugging Face Spaces compatibility
+ def is_huggingface_spaces():
+     """Check if running on Hugging Face Spaces"""
+     return os.environ.get('SPACE_ID') is not None
+
+ def get_persistent_data_path():
+     """Get a persistent data path for Hugging Face Spaces"""
+     if is_huggingface_spaces():
+         # Use a path that might persist better on HF Spaces
+         return "/tmp/trackio_experiments.json"
+     else:
+         return "trackio_experiments.json"
+
+ # Override the data file path for HF Spaces
+ if is_huggingface_spaces():
+     logger.info("🚀 Running on Hugging Face Spaces - using persistent storage")
+     trackio_space.data_file = get_persistent_data_path()
+
+ def get_remote_experiment_data(experiment_id: str) -> Dict[str, Any]:
+     """Get experiment data from remote API"""
+     if api_client is None:
+         return None
+
+     try:
+         # Get experiment details from API
+         details_result = api_client.get_experiment_details(experiment_id)
+         if "success" in details_result:
+             return {"remote": True, "data": details_result["data"]}
+         else:
+             logger.warning(f"Failed to get remote data for {experiment_id}: {details_result}")
+             return None
+     except Exception as e:
+         logger.error(f"Error getting remote data: {e}")
+         return None
+
+ def parse_remote_metrics_data(experiment_details: str) -> pd.DataFrame:
+     """Parse metrics data from remote experiment details"""
+     try:
+         # Look for metrics in the experiment details
+         lines = experiment_details.split('\n')
+         metrics_data = []
+
+         for line in lines:
+             if 'Step:' in line and 'Metrics:' in line:
+                 # Extract step and metrics from the line
+                 try:
+                     # Parse step number
+                     step_part = line.split('Step:')[1].split('Metrics:')[0].strip()
+                     step = int(step_part)
+
+                     # Parse metrics JSON
+                     metrics_part = line.split('Metrics:')[1].strip()
+                     metrics = json.loads(metrics_part)
+
+                     # Add timestamp
+                     row = {'step': step, 'timestamp': datetime.now().isoformat()}
+                     row.update(metrics)
+                     metrics_data.append(row)
+
+                 except (ValueError, json.JSONDecodeError) as e:
+                     logger.warning(f"Failed to parse metrics line: {line} - {e}")
+                     continue
+
+         if metrics_data:
+             return pd.DataFrame(metrics_data)
+         else:
+             return pd.DataFrame()
+
+     except Exception as e:
+         logger.error(f"Error parsing remote metrics: {e}")
+         return pd.DataFrame()
+
+ def get_metrics_dataframe(experiment_id: str) -> pd.DataFrame:
+     """Get metrics as a pandas DataFrame for plotting - tries remote first, then local"""
+     # Try to get remote data first
+     remote_data = get_remote_experiment_data(experiment_id)
+     if remote_data:
+         logger.info(f"Using remote data for {experiment_id}")
+         # Parse the remote experiment details to extract metrics
+         df = parse_remote_metrics_data(remote_data["data"])
+         if not df.empty:
+             logger.info(f"Found {len(df)} metrics entries from remote data")
+             return df
+         else:
+             logger.warning(f"No metrics found in remote data for {experiment_id}")
+
+     # Fall back to local data
+     logger.info(f"Using local data for {experiment_id}")
+     return trackio_space.get_metrics_dataframe(experiment_id)
+
+ def create_experiment_interface(name: str, description: str) -> str:
+     """Create a new experiment"""
+     try:
+         experiment = trackio_space.create_experiment(name, description)
+         return f"✅ Experiment created successfully!\nID: {experiment['id']}\nName: {experiment['name']}\nStatus: {experiment['status']}"
+     except Exception as e:
+         return f"❌ Error creating experiment: {str(e)}"
+
+ def log_metrics_interface(experiment_id: str, metrics_json: str, step: str) -> str:
+     """Log metrics for an experiment"""
+     try:
+         metrics = json.loads(metrics_json)
+         step_int = int(step) if step else None
+         trackio_space.log_metrics(experiment_id, metrics, step_int)
+         return f"✅ Metrics logged successfully for experiment {experiment_id}\nStep: {step_int}\nMetrics: {json.dumps(metrics, indent=2)}"
+     except Exception as e:
+         return f"❌ Error logging metrics: {str(e)}"
+
+ def log_parameters_interface(experiment_id: str, parameters_json: str) -> str:
+     """Log parameters for an experiment"""
+     try:
+         parameters = json.loads(parameters_json)
+         trackio_space.log_parameters(experiment_id, parameters)
+         return f"✅ Parameters logged successfully for experiment {experiment_id}\nParameters: {json.dumps(parameters, indent=2)}"
+     except Exception as e:
+         return f"❌ Error logging parameters: {str(e)}"
+
+ def get_experiment_details(experiment_id: str) -> str:
+     """Get experiment details"""
+     try:
+         experiment = trackio_space.get_experiment(experiment_id)
+         if experiment:
+             # Format the output nicely
+             details = f"""
+ 📊 EXPERIMENT DETAILS
+ ====================
+ ID: {experiment['id']}
+ Name: {experiment['name']}
+ Description: {experiment['description']}
+ Status: {experiment['status']}
+ Created: {experiment['created_at']}
+
+ 📈 METRICS COUNT: {len(experiment['metrics'])}
+ 📋 PARAMETERS COUNT: {len(experiment['parameters'])}
+ 📦 ARTIFACTS COUNT: {len(experiment['artifacts'])}
+
+ 🔧 PARAMETERS:
+ {json.dumps(experiment['parameters'], indent=2)}
+
+ 📊 LATEST METRICS:
+ """
+             if experiment['metrics']:
+                 latest_metrics = experiment['metrics'][-1]
+                 details += f"Step: {latest_metrics.get('step', 'N/A')}\n"
+                 details += f"Timestamp: {latest_metrics.get('timestamp', 'N/A')}\n"
+                 details += f"Metrics: {json.dumps(latest_metrics.get('metrics', {}), indent=2)}"
+             else:
+                 details += "No metrics logged yet."
+
+             return details
+         else:
+             return f"❌ Experiment {experiment_id} not found"
+     except Exception as e:
+         return f"❌ Error getting experiment details: {str(e)}"
+
+ def list_experiments_interface() -> str:
+     """List all experiments with details"""
+     try:
+         experiments_info = trackio_space.list_experiments()
+         experiments = trackio_space.experiments
+
+         if not experiments:
+             return "📭 No experiments found. Create one first!"
+
+         result = f"📋 EXPERIMENTS OVERVIEW\n{'='*50}\n"
+         result += f"Total Experiments: {len(experiments)}\n"
+         result += f"Current Experiment: {experiments_info['current_experiment']}\n\n"
+
+         for exp_id, exp_data in experiments.items():
+             status_emoji = {
+                 'running': '🟢',
+                 'completed': '✅',
+                 'failed': '❌',
+                 'paused': '⏸️'
+             }.get(exp_data['status'], '❓')
+
+             result += f"{status_emoji} {exp_id}\n"
+             result += f"   Name: {exp_data['name']}\n"
+             result += f"   Status: {exp_data['status']}\n"
+             result += f"   Created: {exp_data['created_at']}\n"
+             result += f"   Metrics: {len(exp_data['metrics'])} entries\n"
+             result += f"   Parameters: {len(exp_data['parameters'])} entries\n"
+             result += f"   Artifacts: {len(exp_data['artifacts'])} entries\n\n"
+
+         return result
+     except Exception as e:
+         return f"❌ Error listing experiments: {str(e)}"
+
+ def update_experiment_status_interface(experiment_id: str, status: str) -> str:
+     """Update experiment status"""
+     try:
+         trackio_space.update_experiment_status(experiment_id, status)
+         return f"✅ Experiment {experiment_id} status updated to {status}"
+     except Exception as e:
+         return f"❌ Error updating experiment status: {str(e)}"
+
768
+ def create_metrics_plot(experiment_id: str, metric_name: str = "loss") -> go.Figure:
+     """Create a plot for a specific metric"""
+     try:
+         df = get_metrics_dataframe(experiment_id)
+         if df.empty:
+             # Return an empty plot with an explanatory annotation
+             fig = go.Figure()
+             fig.add_annotation(
+                 text="No metrics data available",
+                 xref="paper", yref="paper",
+                 x=0.5, y=0.5, showarrow=False
+             )
+             return fig
+
+         if metric_name not in df.columns:
+             # Show the metrics that are available for this experiment
+             available_metrics = [col for col in df.columns if col not in ['step', 'timestamp']]
+             fig = go.Figure()
+             fig.add_annotation(
+                 text=f"Available metrics: {', '.join(available_metrics)}",
+                 xref="paper", yref="paper",
+                 x=0.5, y=0.5, showarrow=False
+             )
+             return fig
+
+         fig = px.line(df, x='step', y=metric_name, title=f'{metric_name} over time')
+         fig.update_layout(
+             xaxis_title="Training Step",
+             yaxis_title=metric_name.title(),
+             hovermode='x unified'
+         )
+         return fig
+
+     except Exception as e:
+         fig = go.Figure()
+         fig.add_annotation(
+             text=f"Error creating plot: {str(e)}",
+             xref="paper", yref="paper",
+             x=0.5, y=0.5, showarrow=False
+         )
+         return fig
+
+ def create_experiment_comparison(experiment_ids: str) -> go.Figure:
+     """Compare multiple experiments"""
+     try:
+         exp_ids = [exp_id.strip() for exp_id in experiment_ids.split(',')]
+
+         fig = go.Figure()
+
+         for exp_id in exp_ids:
+             df = get_metrics_dataframe(exp_id)
+             if not df.empty and 'loss' in df.columns:
+                 fig.add_trace(go.Scatter(
+                     x=df['step'],
+                     y=df['loss'],
+                     mode='lines+markers',
+                     name=f"{exp_id} - Loss",
+                     line=dict(width=2)
+                 ))
+
+         fig.update_layout(
+             title="Experiment Comparison - Loss",
+             xaxis_title="Training Step",
+             yaxis_title="Loss",
+             hovermode='x unified'
+         )
+
+         return fig
+
+     except Exception as e:
+         fig = go.Figure()
+         fig.add_annotation(
+             text=f"Error creating comparison: {str(e)}",
+             xref="paper", yref="paper",
+             x=0.5, y=0.5, showarrow=False
+         )
+         return fig
+
+ def simulate_training_data(experiment_id: str):
+     """Simulate training data for demonstration"""
+     try:
+         # Simulate some realistic training metrics
+         for step in range(0, 1000, 50):
+             # Loss decays exponentially with noise; accuracy rises toward a plateau
+             loss = 2.0 * np.exp(-step / 500) + 0.1 * np.random.random()
+             accuracy = 0.3 + 0.6 * (1 - np.exp(-step / 300)) + 0.05 * np.random.random()
+             lr = 3.5e-6 * (0.9 ** (step // 200))
+
+             metrics = {
+                 "loss": round(loss, 4),
+                 "accuracy": round(accuracy, 4),
+                 "learning_rate": round(lr, 8),
+                 "gpu_memory": round(20 + 5 * np.random.random(), 2),
+                 "training_time": round(0.5 + 0.2 * np.random.random(), 3)
+             }
+
+             trackio_space.log_metrics(experiment_id, metrics, step)
+
+         return f"✅ Simulated training data for experiment {experiment_id}\nAdded 20 metric entries (steps 0-950)"
+     except Exception as e:
+         return f"❌ Error simulating data: {str(e)}"
+
+ def create_demo_experiment():
+     """Create a demo experiment with training data"""
+     try:
+         # Create demo experiment
+         experiment = trackio_space.create_experiment(
+             "demo_smollm3_training",
+             "Demo experiment with simulated training data"
+         )
+
+         experiment_id = experiment['id']
+
+         # Add some demo parameters
+         parameters = {
+             "model_name": "HuggingFaceTB/SmolLM3-3B",
+             "batch_size": 8,
+             "learning_rate": 3.5e-6,
+             "max_iters": 18000,
+             "mixed_precision": "bf16",
+             "dataset": "legmlai/openhermes-fr"
+         }
+         trackio_space.log_parameters(experiment_id, parameters)
+
+         # Add demo training data
+         simulate_training_data(experiment_id)
+
+         return f"✅ Demo experiment created: {experiment_id}\nYou can now test the visualization with this experiment!"
+     except Exception as e:
+         return f"❌ Error creating demo experiment: {str(e)}"
+
+ # Create Gradio interface
+ with gr.Blocks(title="Trackio - Experiment Tracking", theme=gr.themes.Soft()) as demo:
+     gr.Markdown("# 🚀 Trackio Experiment Tracking & Monitoring")
+     gr.Markdown("Monitor and track your ML experiments with real-time visualization!")
+
+     with gr.Tabs():
+         # Configuration Tab
+         with gr.Tab("⚙️ Configuration"):
+             gr.Markdown("### Configure HF Datasets Connection")
+             gr.Markdown("Set your Hugging Face token and dataset repository for persistent experiment storage.")
+
+             with gr.Row():
+                 with gr.Column():
+                     hf_token_input = gr.Textbox(
+                         label="Hugging Face Token",
+                         placeholder="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
+                         type="password",
+                         info="Your HF token for dataset access (optional - will use environment variable if not set)"
+                     )
+                     dataset_repo_input = gr.Textbox(
+                         label="Dataset Repository",
+                         placeholder="your-username/your-dataset-name",
+                         value="tonic/trackio-experiments",
+                         info="HF Dataset repository for experiment storage"
+                     )
+
+                     with gr.Row():
+                         update_config_btn = gr.Button("Update Configuration", variant="primary")
+                         test_connection_btn = gr.Button("Test Connection", variant="secondary")
+                         create_repo_btn = gr.Button("Create Dataset", variant="success")
+
+                     gr.Markdown("### Current Configuration")
+                     current_config_output = gr.Textbox(
+                         label="Status",
+                         lines=8,
+                         interactive=False,
+                         value=f"📊 Dataset: {trackio_space.dataset_repo}\n🔑 HF Token: {'Set' if trackio_space.hf_token else 'Not set'}\n📈 Experiments: {len(trackio_space.experiments)}"
+                     )
+
+                 with gr.Column():
+                     gr.Markdown("### Configuration Help")
+                     gr.Markdown("""
+                     **Getting Your HF Token:**
+                     1. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
+                     2. Click "New token"
+                     3. Give it a name (e.g., "Trackio Access")
+                     4. Select "Write" permissions
+                     5. Copy the token and paste it above
+
+                     **Dataset Repository:**
+                     - Format: `username/dataset-name`
+                     - Examples: `tonic/trackio-experiments`, `your-username/my-experiments`
+                     - Use the "Create Dataset" button to create a new repository
+
+                     **Environment Variables:**
+                     You can also set these as environment variables:
+                     - `HF_TOKEN`: Your Hugging Face token
+                     - `TRACKIO_DATASET_REPO`: Dataset repository
+
+                     **Actions:**
+                     - **Update Configuration**: Apply new settings and reload experiments
+                     - **Test Connection**: Verify access to the dataset repository
+                     - **Create Dataset**: Create a new dataset repository if it doesn't exist
+                     """)
+
+             update_config_btn.click(
+                 update_trackio_config,
+                 inputs=[hf_token_input, dataset_repo_input],
+                 outputs=current_config_output
+             )
+
+             test_connection_btn.click(
+                 test_dataset_connection,
+                 inputs=[hf_token_input, dataset_repo_input],
+                 outputs=current_config_output
+             )
+
+             create_repo_btn.click(
+                 create_dataset_repository,
+                 inputs=[hf_token_input, dataset_repo_input],
+                 outputs=current_config_output
+             )
+
+         # Create Experiment Tab
+         with gr.Tab("Create Experiment"):
+             gr.Markdown("### Create a New Experiment")
+             with gr.Row():
+                 with gr.Column():
+                     experiment_name = gr.Textbox(
+                         label="Experiment Name",
+                         placeholder="my_smollm3_finetune",
+                         value="smollm3_finetune"
+                     )
+                     experiment_description = gr.Textbox(
+                         label="Description",
+                         placeholder="Fine-tuning SmolLM3 model on custom dataset",
+                         value="SmolLM3 fine-tuning experiment"
+                     )
+                     create_btn = gr.Button("Create Experiment", variant="primary")
+
+                 with gr.Column():
+                     create_output = gr.Textbox(
+                         label="Result",
+                         lines=5,
+                         interactive=False
+                     )
+
+             create_btn.click(
+                 create_experiment_interface,
+                 inputs=[experiment_name, experiment_description],
+                 outputs=create_output
+             )
+
+         # Log Metrics Tab
+         with gr.Tab("Log Metrics"):
+             gr.Markdown("### Log Training Metrics")
+             with gr.Row():
+                 with gr.Column():
+                     metrics_exp_id = gr.Textbox(
+                         label="Experiment ID",
+                         placeholder="exp_20231201_143022"
+                     )
+                     metrics_json = gr.Textbox(
+                         label="Metrics (JSON)",
+                         placeholder='{"loss": 0.5, "accuracy": 0.85, "learning_rate": 2e-5}',
+                         value='{"loss": 0.5, "accuracy": 0.85, "learning_rate": 2e-5, "gpu_memory": 22.5}'
+                     )
+                     metrics_step = gr.Textbox(
+                         label="Step (optional)",
+                         placeholder="100"
+                     )
+                     log_metrics_btn = gr.Button("Log Metrics", variant="primary")
+
+                 with gr.Column():
+                     metrics_output = gr.Textbox(
+                         label="Result",
+                         lines=5,
+                         interactive=False
+                     )
+
+             log_metrics_btn.click(
+                 log_metrics_interface,
+                 inputs=[metrics_exp_id, metrics_json, metrics_step],
+                 outputs=metrics_output
+             )
+
+         # Log Parameters Tab
+         with gr.Tab("Log Parameters"):
+             gr.Markdown("### Log Experiment Parameters")
+             with gr.Row():
+                 with gr.Column():
+                     params_exp_id = gr.Textbox(
+                         label="Experiment ID",
+                         placeholder="exp_20231201_143022"
+                     )
+                     parameters_json = gr.Textbox(
+                         label="Parameters (JSON)",
+                         placeholder='{"learning_rate": 2e-5, "batch_size": 4}',
+                         value='{"learning_rate": 3.5e-6, "batch_size": 8, "model_name": "HuggingFaceTB/SmolLM3-3B", "max_iters": 18000, "mixed_precision": "bf16"}'
+                     )
+                     log_params_btn = gr.Button("Log Parameters", variant="primary")
+
+                 with gr.Column():
+                     params_output = gr.Textbox(
+                         label="Result",
+                         lines=5,
+                         interactive=False
+                     )
+
+             log_params_btn.click(
+                 log_parameters_interface,
+                 inputs=[params_exp_id, parameters_json],
+                 outputs=params_output
+             )
+
+         # View Experiments Tab
+         with gr.Tab("View Experiments"):
+             gr.Markdown("### View Experiment Details")
+             with gr.Row():
+                 with gr.Column():
+                     view_exp_id = gr.Textbox(
+                         label="Experiment ID",
+                         placeholder="exp_20231201_143022"
+                     )
+                     view_btn = gr.Button("View Experiment", variant="primary")
+                     list_btn = gr.Button("List All Experiments", variant="secondary")
+
+                 with gr.Column():
+                     view_output = gr.Textbox(
+                         label="Experiment Details",
+                         lines=20,
+                         interactive=False
+                     )
+
+             view_btn.click(
+                 get_experiment_details,
+                 inputs=[view_exp_id],
+                 outputs=view_output
+             )
+
+             list_btn.click(
+                 list_experiments_interface,
+                 inputs=[],
+                 outputs=view_output
+             )
+
+         # Visualization Tab
+         with gr.Tab("📊 Visualizations"):
+             gr.Markdown("### Training Metrics Visualization")
+             with gr.Row():
+                 with gr.Column():
+                     plot_exp_id = gr.Textbox(
+                         label="Experiment ID",
+                         placeholder="exp_20231201_143022"
+                     )
+                     metric_dropdown = gr.Dropdown(
+                         label="Metric to Plot",
+                         choices=["loss", "accuracy", "learning_rate", "gpu_memory", "training_time"],
+                         value="loss"
+                     )
+                     plot_btn = gr.Button("Create Plot", variant="primary")
+
+                 with gr.Column():
+                     plot_output = gr.Plot(label="Training Metrics")
+
+             plot_btn.click(
+                 create_metrics_plot,
+                 inputs=[plot_exp_id, metric_dropdown],
+                 outputs=plot_output
+             )
+
+             gr.Markdown("### Experiment Comparison")
+             with gr.Row():
+                 with gr.Column():
+                     comparison_exp_ids = gr.Textbox(
+                         label="Experiment IDs (comma-separated)",
+                         placeholder="exp_1,exp_2,exp_3"
+                     )
+                     comparison_btn = gr.Button("Compare Experiments", variant="primary")
+
+                 with gr.Column():
+                     comparison_plot = gr.Plot(label="Experiment Comparison")
+
+             comparison_btn.click(
+                 create_experiment_comparison,
+                 inputs=[comparison_exp_ids],
+                 outputs=comparison_plot
+             )
+
+         # Demo Data Tab
+         with gr.Tab("🎯 Demo Data"):
+             gr.Markdown("### Generate Demo Training Data")
+             gr.Markdown("Use this to simulate training data for testing the interface.")
+             with gr.Row():
+                 with gr.Column():
+                     demo_exp_id = gr.Textbox(
+                         label="Experiment ID",
+                         placeholder="exp_20231201_143022"
+                     )
+                     demo_btn = gr.Button("Generate Demo Data", variant="primary")
+                     create_demo_btn = gr.Button("Create Demo Experiment", variant="secondary")
+
+                 with gr.Column():
+                     demo_output = gr.Textbox(
+                         label="Result",
+                         lines=5,
+                         interactive=False
+                     )
+
+             demo_btn.click(
+                 simulate_training_data,
+                 inputs=[demo_exp_id],
+                 outputs=demo_output
+             )
+
+             create_demo_btn.click(
+                 create_demo_experiment,
+                 inputs=[],
+                 outputs=demo_output
+             )
+
+         # Update Status Tab
+         with gr.Tab("Update Status"):
+             gr.Markdown("### Update Experiment Status")
+             with gr.Row():
+                 with gr.Column():
+                     status_exp_id = gr.Textbox(
+                         label="Experiment ID",
+                         placeholder="exp_20231201_143022"
+                     )
+                     status_dropdown = gr.Dropdown(
+                         label="Status",
+                         choices=["running", "completed", "failed", "paused"],
+                         value="running"
+                     )
+                     update_status_btn = gr.Button("Update Status", variant="primary")
+
+                 with gr.Column():
+                     status_output = gr.Textbox(
+                         label="Result",
+                         lines=3,
+                         interactive=False
+                     )
+
+             update_status_btn.click(
+                 update_experiment_status_interface,
+                 inputs=[status_exp_id, status_dropdown],
+                 outputs=status_output
+             )
+
+ # Launch the app
+ if __name__ == "__main__":
+     demo.launch()
scripts/trackio_tonic/requirements.txt ADDED
@@ -0,0 +1,19 @@
+ # Gradio and web interface
+ gradio>=4.0.0
+ gradio-client>=0.10.0
+
+ # Core dependencies for Trackio Space
+ requests>=2.31.0
+ numpy>=1.24.0
+ pandas>=2.0.0
+
+ # JSON and data handling
+ jsonschema>=4.17.0
+
+ # Visualization and HF Datasets integration
+ plotly>=5.0.0
+ datasets>=2.14.0
+ huggingface-hub>=0.16.0
+
+ # Development and debugging
+ python-dotenv>=1.0.0