Tonic committed
Commit fcf2981 · 1 Parent(s): ce0d824

adds gpt-oss support

README.md CHANGED
@@ -10,7 +10,7 @@
 
 # 🤏🏻🏭SmolFactory
 
- SmolFactory helps you train , monitor and deploy your Smollm3 finetune , and more !
+ SmolFactory helps you train, monitor and deploy your SmolLM3 and GPT-OSS fine-tunes, and more!
 
 <table>
 <tr>
@@ -35,7 +35,7 @@ Train and deploy your model with one simple command !
 - **Trackio Monitoring Space**: Real-time training metrics, loss curves, and resource utilization
 - **Demo Spaces**: Instant web interfaces for model testing and demonstration
 - **Real-time Metrics**: Live training loss, learning rate, gradient norms, and GPU utilization
- - **Custom Dashboards**: Tailored visualizations for SmolLM3 fine-tuning
+ - **Custom Dashboards**: Tailored visualizations for SmolLM3 and GPT-OSS fine-tuning
 - **Artifact Logging**: Model checkpoints, configuration files, and training logs
 - **Experiment Comparison**: Side-by-side analysis of different training runs
 - **Alert System**: Notifications for training issues or completion
@@ -44,6 +44,7 @@ Train and deploy your model with one simple command !
 - **Reproducibility**: Complete experiment history with configuration snapshots
 - **Collaboration**: Easy sharing of training results and model comparisons
 - **Version Control**: Track dataset changes and model performance over time
+ - **GPT-OSS Support**: Specialized configurations for OpenAI's GPT-OSS-20B model with LoRA and multilingual reasoning
 
 ## 🚀 Quick Start
 
@@ -57,7 +58,7 @@ The easiest way to get started is using the interactive pipeline:
 
 This script will:
 1. **Authenticate** with Hugging Face (write + read tokens)
- 2. **Configure** training parameters interactively
+ 2. **Configure** training parameters interactively (SmolLM3 or GPT-OSS)
 3. **Deploy** Trackio Space for monitoring
 4. **Setup** HF Dataset for experiment tracking
 5. **Execute** training with your chosen configuration
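The interactive pipeline referenced above is the `launch.sh` script updated in this commit; a minimal way to start it from the repository root (it prompts for tokens and the training configuration):

```bash
# Run the interactive pipeline; it walks through auth, config,
# Trackio Space deployment, dataset setup, and training.
chmod +x launch.sh
./launch.sh
```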
config/train_gpt_oss_basic.py ADDED
@@ -0,0 +1,176 @@
+ """
+ GPT-OSS Basic Training Configuration
+ Based on OpenAI's GPT-OSS fine-tuning tutorial
+ Optimized for standard fine-tuning scenarios
+ """
+
+ import os
+ from dataclasses import dataclass
+ from typing import Optional
+
+ @dataclass
+ class GPTOSSBasicConfig:
+     """Basic configuration for GPT-OSS fine-tuning"""
+
+     # Trainer type selection
+     trainer_type: str = "sft"  # "sft" or "dpo"
+
+     # Model configuration - GPT-OSS specific
+     model_name: str = "openai/gpt-oss-20b"
+     max_seq_length: int = 2048  # GPT-OSS default
+     use_flash_attention: bool = True
+     use_gradient_checkpointing: bool = True
+
+     # Training configuration - optimized for GPT-OSS
+     batch_size: int = 4  # Conservative for 20B model
+     gradient_accumulation_steps: int = 4
+     learning_rate: float = 2e-4  # Higher LR as per tutorial
+     weight_decay: float = 0.01
+     warmup_steps: int = 100
+     max_iters: int = 1000
+     eval_interval: int = 100
+     log_interval: int = 10
+     save_interval: int = 500
+
+     # Optimizer configuration
+     optimizer: str = "adamw_torch"
+     beta1: float = 0.9
+     beta2: float = 0.95
+     eps: float = 1e-8
+
+     # Scheduler configuration
+     scheduler: str = "cosine_with_min_lr"
+     min_lr: float = 2e-5  # Higher min LR as per tutorial
+     lr_scheduler_kwargs: dict = None
+
+     # Mixed precision - GPT-OSS optimized
+     fp16: bool = False  # Use bf16 for GPT-OSS
+     bf16: bool = True
+
+     # DDP configuration
+     ddp_backend: str = "nccl"
+     ddp_find_unused_parameters: bool = False
+
+     # Logging and saving
+     save_steps: int = 500
+     eval_steps: int = 100
+     logging_steps: int = 10
+     save_total_limit: Optional[int] = 3
+
+     # Evaluation
+     eval_strategy: str = "steps"
+     metric_for_best_model: str = "eval_loss"
+     greater_is_better: bool = False
+     load_best_model_at_end: bool = True
+
+     # Data configuration
+     dataset_name: str = "HuggingFaceH4/Multilingual-Thinking"
+     dataset_split: str = "train"
+     input_field: str = "messages"  # GPT-OSS uses messages format
+     target_field: str = None  # Not used for messages format
+     filter_bad_entries: bool = False
+     bad_entry_field: str = "bad_entry"
+
+     # Chat template configuration - GPT-OSS specific
+     use_chat_template: bool = True
+     chat_template_kwargs: dict = None
+
+     # Trackio monitoring configuration
+     enable_tracking: bool = True
+     trackio_url: Optional[str] = None
+     trackio_token: Optional[str] = None
+     log_artifacts: bool = True
+     log_metrics: bool = True
+     log_config: bool = True
+     experiment_name: Optional[str] = None
+
+     # HF Datasets configuration
+     hf_token: Optional[str] = None
+     dataset_repo: Optional[str] = None
+
+     # GPT-OSS specific configurations
+     # LoRA configuration for GPT-OSS
+     use_lora: bool = True
+     lora_config: dict = None
+
+     # Quantization for GPT-OSS (MXFP4)
+     use_quantization: bool = True
+     quantization_config: dict = None
+
+     # GPT-OSS specific model kwargs
+     model_kwargs: dict = None
+
+     def __post_init__(self):
+         if self.chat_template_kwargs is None:
+             self.chat_template_kwargs = {
+                 "add_generation_prompt": True,
+                 "tokenize": False  # GPT-OSS specific
+             }
+
+         if self.lr_scheduler_kwargs is None:
+             self.lr_scheduler_kwargs = {
+                 "min_lr_rate": 0.1
+             }
+
+         if self.lora_config is None:
+             self.lora_config = {
+                 "r": 8,
+                 "lora_alpha": 16,
+                 "target_modules": "all-linear",
+                 "target_parameters": [
+                     "7.mlp.experts.gate_up_proj",
+                     "7.mlp.experts.down_proj",
+                     "15.mlp.experts.gate_up_proj",
+                     "15.mlp.experts.down_proj",
+                     "23.mlp.experts.gate_up_proj",
+                     "23.mlp.experts.down_proj",
+                 ]
+             }
+
+         if self.quantization_config is None:
+             self.quantization_config = {
+                 "dequantize": True
+             }
+
+         if self.model_kwargs is None:
+             self.model_kwargs = {
+                 "attn_implementation": "eager",
+                 "torch_dtype": "auto",
+                 "use_cache": False,
+                 "device_map": "auto"
+             }
+
+         # Validate configuration
+         if self.fp16 and self.bf16:
+             raise ValueError("Cannot use both fp16 and bf16")
+
+         if self.max_seq_length > 131072:  # 128k limit
+             raise ValueError("max_seq_length cannot exceed 131072")
+
+         # Set default experiment name if not provided
+         if self.experiment_name is None:
+             self.experiment_name = "gpt_oss_basic"
+
+ def get_config(config_path: str) -> GPTOSSBasicConfig:
+     """Load configuration from file or return default"""
+     if os.path.exists(config_path):
+         # Load from file if it exists
+         import importlib.util
+         spec = importlib.util.spec_from_file_location("config_module", config_path)
+         config_module = importlib.util.module_from_spec(spec)
+         spec.loader.exec_module(config_module)
+
+         if hasattr(config_module, 'config'):
+             return config_module.config
+         else:
+             # Try to find a config instance
+             for attr_name in dir(config_module):
+                 attr = getattr(config_module, attr_name)
+                 if isinstance(attr, GPTOSSBasicConfig):
+                     return attr
+
+     # Return default configuration
+     return GPTOSSBasicConfig()
+
+ # Default configuration instance
+ config = GPTOSSBasicConfig()
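Each of these config modules exposes a `get_config` helper plus a module-level `config` instance; a minimal usage sketch, assuming the repository root is on `PYTHONPATH` (the overridden values are illustrative):

```python
# Load the basic GPT-OSS config, then override a couple of fields.
from config.train_gpt_oss_basic import get_config

cfg = get_config("config/train_gpt_oss_basic.py")  # falls back to GPTOSSBasicConfig()
cfg.batch_size = 2                      # illustrative: shrink for a smaller GPU
cfg.experiment_name = "my_gpt_oss_run"  # illustrative experiment name
print(cfg.model_name, cfg.max_seq_length, cfg.lora_config["r"])
```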
config/train_gpt_oss_h100_optimized.py ADDED
@@ -0,0 +1,203 @@
+ """
+ GPT-OSS H100 Optimized Training Configuration
+ Based on OpenAI's GPT-OSS fine-tuning tutorial
+ Optimized for H100 GPU with maximum performance
+ """
+
+ import os
+ from dataclasses import dataclass
+ from typing import Optional
+
+ @dataclass
+ class GPTOSSH100OptimizedConfig:
+     """H100-optimized configuration for GPT-OSS fine-tuning"""
+
+     # Trainer type selection
+     trainer_type: str = "sft"  # "sft" or "dpo"
+
+     # Model configuration - GPT-OSS specific with H100 optimizations
+     model_name: str = "openai/gpt-oss-20b"
+     max_seq_length: int = 4096  # Increased for H100
+     use_flash_attention: bool = True
+     use_gradient_checkpointing: bool = True
+
+     # Training configuration - H100 optimized
+     batch_size: int = 8  # Larger batch size for H100
+     gradient_accumulation_steps: int = 2  # Reduced for faster updates
+     learning_rate: float = 3e-4  # Higher LR for H100
+     weight_decay: float = 0.01
+     warmup_steps: int = 50  # Reduced warmup for rapid training
+     max_iters: int = 2000  # More iterations for H100
+     eval_interval: int = 50  # More frequent evaluation
+     log_interval: int = 5  # More frequent logging
+     save_interval: int = 200  # More frequent saving
+
+     # Optimizer configuration - H100 optimized
+     optimizer: str = "adamw_torch"
+     beta1: float = 0.9
+     beta2: float = 0.95
+     eps: float = 1e-8
+
+     # Scheduler configuration - faster learning
+     scheduler: str = "cosine_with_min_lr"
+     min_lr: float = 3e-5  # Higher min LR for H100
+     lr_scheduler_kwargs: dict = None
+
+     # Mixed precision - H100 optimized
+     fp16: bool = False  # Use bf16 for H100
+     bf16: bool = True
+
+     # DDP configuration
+     ddp_backend: str = "nccl"
+     ddp_find_unused_parameters: bool = False
+
+     # Logging and saving - optimized for rapid training
+     save_steps: int = 200
+     eval_steps: int = 50
+     logging_steps: int = 5
+     save_total_limit: Optional[int] = 2  # Keep fewer checkpoints
+
+     # Evaluation
+     eval_strategy: str = "steps"
+     metric_for_best_model: str = "eval_loss"
+     greater_is_better: bool = False
+     load_best_model_at_end: bool = True
+
+     # Data configuration
+     dataset_name: str = "HuggingFaceH4/Multilingual-Thinking"
+     dataset_split: str = "train"
+     input_field: str = "messages"  # GPT-OSS uses messages format
+     target_field: str = None  # Not used for messages format
+     filter_bad_entries: bool = False
+     bad_entry_field: str = "bad_entry"
+
+     # Chat template configuration - GPT-OSS specific
+     use_chat_template: bool = True
+     chat_template_kwargs: dict = None
+
+     # Trackio monitoring configuration
+     enable_tracking: bool = True
+     trackio_url: Optional[str] = None
+     trackio_token: Optional[str] = None
+     log_artifacts: bool = True
+     log_metrics: bool = True
+     log_config: bool = True
+     experiment_name: Optional[str] = None
+
+     # HF Datasets configuration
+     hf_token: Optional[str] = None
+     dataset_repo: Optional[str] = None
+
+     # GPT-OSS specific configurations
+     # LoRA configuration for GPT-OSS - H100 optimized
+     use_lora: bool = True
+     lora_config: dict = None
+
+     # Quantization for GPT-OSS (MXFP4) - H100 optimized
+     use_quantization: bool = True
+     quantization_config: dict = None
+
+     # GPT-OSS specific model kwargs - H100 optimized
+     model_kwargs: dict = None
+
+     # H100-specific optimizations
+     dataloader_num_workers: int = 8  # More workers for H100
+     dataloader_pin_memory: bool = True
+     dataloader_prefetch_factor: int = 4  # Increased prefetch
+
+     # Memory optimizations for H100
+     max_grad_norm: float = 1.0
+     group_by_length: bool = True  # Group similar length sequences
+
+     def __post_init__(self):
+         if self.chat_template_kwargs is None:
+             self.chat_template_kwargs = {
+                 "add_generation_prompt": True,
+                 "tokenize": False  # GPT-OSS specific
+             }
+
+         if self.lr_scheduler_kwargs is None:
+             self.lr_scheduler_kwargs = {
+                 "min_lr_rate": 0.1
+             }
+
+         if self.lora_config is None:
+             self.lora_config = {
+                 "r": 16,  # Increased for H100
+                 "lora_alpha": 32,  # Increased for H100
+                 "target_modules": "all-linear",
+                 "target_parameters": [
+                     "7.mlp.experts.gate_up_proj",
+                     "7.mlp.experts.down_proj",
+                     "15.mlp.experts.gate_up_proj",
+                     "15.mlp.experts.down_proj",
+                     "23.mlp.experts.gate_up_proj",
+                     "23.mlp.experts.down_proj",
+                 ]
+             }
+
+         if self.quantization_config is None:
+             self.quantization_config = {
+                 "dequantize": True
+             }
+
+         if self.model_kwargs is None:
+             self.model_kwargs = {
+                 "attn_implementation": "eager",
+                 "torch_dtype": "auto",
+                 "use_cache": False,
+                 "device_map": "auto"
+             }
+
+         # Validate configuration
+         if self.fp16 and self.bf16:
+             raise ValueError("Cannot use both fp16 and bf16")
+
+         if self.max_seq_length > 131072:  # 128k limit
+             raise ValueError("max_seq_length cannot exceed 131072")
+
+         # Calculate training statistics for H100
+         effective_batch_size = self.batch_size * self.gradient_accumulation_steps
+         steps_per_epoch = 1000 // effective_batch_size  # Approximate for Multilingual-Thinking
+         epochs_for_max_iters = self.max_iters / steps_per_epoch
+
+         print("=== GPT-OSS H100 Optimized Configuration ===")
+         print(f"Effective batch size: {effective_batch_size}")
+         print(f"Steps per epoch: ~{steps_per_epoch}")
+         print(f"Training for ~{epochs_for_max_iters:.1f} epochs")
+         print(f"Total training steps: {self.max_iters}")
+         print(f"Learning rate: {self.learning_rate}")
+         print(f"Mixed precision: {'bf16' if self.bf16 else 'fp16'}")
+         print(f"Max sequence length: {self.max_seq_length}")
+         print(f"Gradient checkpointing: {self.use_gradient_checkpointing}")
+         print(f"LoRA rank: {self.lora_config['r']}")
+         print(f"Data loader workers: {self.dataloader_num_workers}")
+         print("=" * 50)
+
+         # Set default experiment name if not provided
+         if self.experiment_name is None:
+             self.experiment_name = "gpt_oss_h100_optimized"
+
+ def get_config(config_path: str) -> GPTOSSH100OptimizedConfig:
+     """Load configuration from file or return default"""
+     if os.path.exists(config_path):
+         # Load from file if it exists
+         import importlib.util
+         spec = importlib.util.spec_from_file_location("config_module", config_path)
+         config_module = importlib.util.module_from_spec(spec)
+         spec.loader.exec_module(config_module)
+
+         if hasattr(config_module, 'config'):
+             return config_module.config
+         else:
+             # Try to find a config instance
+             for attr_name in dir(config_module):
+                 attr = getattr(config_module, attr_name)
+                 if isinstance(attr, GPTOSSH100OptimizedConfig):
+                     return attr
+
+     # Return default configuration
+     return GPTOSSH100OptimizedConfig()
+
+ # Default configuration instance
+ config = GPTOSSH100OptimizedConfig()
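The statistics printed in `__post_init__` follow from simple arithmetic; a quick sketch with the H100 defaults above (the ~1000-example count for Multilingual-Thinking is taken from the comments in these configs):

```python
# Reproduce the training statistics printed by __post_init__ (H100 defaults).
batch_size = 8
gradient_accumulation_steps = 2
max_iters = 2000

effective_batch_size = batch_size * gradient_accumulation_steps   # 16
steps_per_epoch = 1000 // effective_batch_size                    # 62
print(f"Training for ~{max_iters / steps_per_epoch:.1f} epochs")  # ~32.3
```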
config/train_gpt_oss_multilingual_reasoning.py ADDED
@@ -0,0 +1,217 @@
+ """
+ GPT-OSS Multilingual Reasoning Training Configuration
+ Based on OpenAI's GPT-OSS fine-tuning tutorial
+ Specialized for multilingual reasoning tasks
+ """
+
+ import os
+ from dataclasses import dataclass
+ from typing import Optional
+
+ @dataclass
+ class GPTOSSMultilingualReasoningConfig:
+     """Multilingual reasoning configuration for GPT-OSS fine-tuning"""
+
+     # Trainer type selection
+     trainer_type: str = "sft"  # "sft" or "dpo"
+
+     # Model configuration - GPT-OSS specific for multilingual reasoning
+     model_name: str = "openai/gpt-oss-20b"
+     max_seq_length: int = 2048  # Standard for reasoning tasks
+     use_flash_attention: bool = True
+     use_gradient_checkpointing: bool = True
+
+     # Training configuration - optimized for multilingual reasoning
+     batch_size: int = 4  # Conservative for reasoning tasks
+     gradient_accumulation_steps: int = 4
+     learning_rate: float = 2e-4  # As per tutorial
+     weight_decay: float = 0.01
+     warmup_steps: int = 100
+     max_iters: int = 1000  # 1 epoch on Multilingual-Thinking
+     eval_interval: int = 100
+     log_interval: int = 10
+     save_interval: int = 500
+
+     # Optimizer configuration
+     optimizer: str = "adamw_torch"
+     beta1: float = 0.9
+     beta2: float = 0.95
+     eps: float = 1e-8
+
+     # Scheduler configuration - as per tutorial
+     scheduler: str = "cosine_with_min_lr"
+     min_lr: float = 2e-5  # As per tutorial
+     lr_scheduler_kwargs: dict = None
+
+     # Mixed precision - GPT-OSS optimized
+     fp16: bool = False  # Use bf16 for GPT-OSS
+     bf16: bool = True
+
+     # DDP configuration
+     ddp_backend: str = "nccl"
+     ddp_find_unused_parameters: bool = False
+
+     # Logging and saving
+     save_steps: int = 500
+     eval_steps: int = 100
+     logging_steps: int = 10
+     save_total_limit: Optional[int] = 3
+
+     # Evaluation
+     eval_strategy: str = "steps"
+     metric_for_best_model: str = "eval_loss"
+     greater_is_better: bool = False
+     load_best_model_at_end: bool = True
+
+     # Data configuration - Multilingual-Thinking specific
+     dataset_name: str = "HuggingFaceH4/Multilingual-Thinking"
+     dataset_split: str = "train"
+     input_field: str = "messages"  # GPT-OSS uses messages format
+     target_field: str = None  # Not used for messages format
+     filter_bad_entries: bool = False
+     bad_entry_field: str = "bad_entry"
+
+     # Chat template configuration - GPT-OSS specific
+     use_chat_template: bool = True
+     chat_template_kwargs: dict = None
+
+     # Trackio monitoring configuration
+     enable_tracking: bool = True
+     trackio_url: Optional[str] = None
+     trackio_token: Optional[str] = None
+     log_artifacts: bool = True
+     log_metrics: bool = True
+     log_config: bool = True
+     experiment_name: Optional[str] = None
+
+     # HF Datasets configuration
+     hf_token: Optional[str] = None
+     dataset_repo: Optional[str] = None
+
+     # GPT-OSS specific configurations
+     # LoRA configuration for GPT-OSS - as per tutorial
+     use_lora: bool = True
+     lora_config: dict = None
+
+     # Quantization for GPT-OSS (MXFP4) - as per tutorial
+     use_quantization: bool = True
+     quantization_config: dict = None
+
+     # GPT-OSS specific model kwargs - as per tutorial
+     model_kwargs: dict = None
+
+     # Multilingual reasoning specific configurations
+     # Generation parameters for multilingual reasoning
+     generation_config: dict = None
+
+     # Multilingual reasoning evaluation languages
+     reasoning_languages: list = None
+
+     def __post_init__(self):
+         if self.chat_template_kwargs is None:
+             self.chat_template_kwargs = {
+                 "add_generation_prompt": True,
+                 "tokenize": False  # GPT-OSS specific
+             }
+
+         if self.lr_scheduler_kwargs is None:
+             self.lr_scheduler_kwargs = {
+                 "min_lr_rate": 0.1
+             }
+
+         if self.lora_config is None:
+             self.lora_config = {
+                 "r": 8,
+                 "lora_alpha": 16,
+                 "target_modules": "all-linear",
+                 "target_parameters": [
+                     "7.mlp.experts.gate_up_proj",
+                     "7.mlp.experts.down_proj",
+                     "15.mlp.experts.gate_up_proj",
+                     "15.mlp.experts.down_proj",
+                     "23.mlp.experts.gate_up_proj",
+                     "23.mlp.experts.down_proj",
+                 ]
+             }
+
+         if self.quantization_config is None:
+             self.quantization_config = {
+                 "dequantize": True
+             }
+
+         if self.model_kwargs is None:
+             self.model_kwargs = {
+                 "attn_implementation": "eager",
+                 "torch_dtype": "auto",
+                 "use_cache": False,
+                 "device_map": "auto"
+             }
+
+         if self.generation_config is None:
+             self.generation_config = {
+                 "max_new_tokens": 512,
+                 "do_sample": True,
+                 "temperature": 0.6,
+                 "top_p": None,
+                 "top_k": None
+             }
+
+         if self.reasoning_languages is None:
+             self.reasoning_languages = [
+                 "English", "Spanish", "French", "Italian", "German",
+                 "Chinese", "Hindi", "Japanese", "Korean", "Arabic"
+             ]
+
+         # Validate configuration
+         if self.fp16 and self.bf16:
+             raise ValueError("Cannot use both fp16 and bf16")
+
+         if self.max_seq_length > 131072:  # 128k limit
+             raise ValueError("max_seq_length cannot exceed 131072")
+
+         # Calculate training statistics for Multilingual-Thinking
+         effective_batch_size = self.batch_size * self.gradient_accumulation_steps
+         steps_per_epoch = 1000 // effective_batch_size  # Multilingual-Thinking has 1000 examples
+         epochs_for_max_iters = self.max_iters / steps_per_epoch
+
+         print("=== GPT-OSS Multilingual Reasoning Configuration ===")
+         print(f"Dataset: {self.dataset_name}")
+         print(f"Effective batch size: {effective_batch_size}")
+         print(f"Steps per epoch: ~{steps_per_epoch}")
+         print(f"Training for ~{epochs_for_max_iters:.1f} epochs")
+         print(f"Total training steps: {self.max_iters}")
+         print(f"Learning rate: {self.learning_rate}")
+         print(f"Mixed precision: {'bf16' if self.bf16 else 'fp16'}")
+         print(f"Max sequence length: {self.max_seq_length}")
+         print(f"Gradient checkpointing: {self.use_gradient_checkpointing}")
+         print(f"LoRA rank: {self.lora_config['r']}")
+         print(f"Supported reasoning languages: {len(self.reasoning_languages)}")
+         print("=" * 50)
+
+         # Set default experiment name if not provided
+         if self.experiment_name is None:
+             self.experiment_name = "gpt_oss_multilingual_reasoning"
+
+ def get_config(config_path: str) -> GPTOSSMultilingualReasoningConfig:
+     """Load configuration from file or return default"""
+     if os.path.exists(config_path):
+         # Load from file if it exists
+         import importlib.util
+         spec = importlib.util.spec_from_file_location("config_module", config_path)
+         config_module = importlib.util.module_from_spec(spec)
+         spec.loader.exec_module(config_module)
+
+         if hasattr(config_module, 'config'):
+             return config_module.config
+         else:
+             # Try to find a config instance
+             for attr_name in dir(config_module):
+                 attr = getattr(config_module, attr_name)
+                 if isinstance(attr, GPTOSSMultilingualReasoningConfig):
+                     return attr
+
+     # Return default configuration
+     return GPTOSSMultilingualReasoningConfig()
+
+ # Default configuration instance
+ config = GPTOSSMultilingualReasoningConfig()
launch.sh CHANGED
@@ -164,6 +164,7 @@ show_training_configs() {
 print_header "Available Training Configurations"
 echo "======================================"
 echo ""
+ echo "=== SmolLM3 Configurations ==="
 echo "1. Basic Training (Default)"
 echo " - Model: SmolLM3-3B"
 echo " - Dataset: SmolTalk"
@@ -196,7 +197,35 @@ show_training_configs() {
 echo " - Learning Rate: 3e-6"
 echo " - Sequence Length: 8192"
 echo ""
- echo "5. Custom Configuration"
+ echo "=== GPT-OSS Configurations ==="
+ echo "5. GPT-OSS Basic Training"
+ echo " - Model: openai/gpt-oss-20b"
+ echo " - Dataset: Multilingual-Thinking"
+ echo " - Epochs: 1"
+ echo " - Batch Size: 4"
+ echo " - Learning Rate: 2e-4"
+ echo " - LoRA + MXFP4 Quantization"
+ echo " - Optimized for multilingual reasoning"
+ echo ""
+ echo "6. GPT-OSS H100 Optimized"
+ echo " - Model: openai/gpt-oss-20b"
+ echo " - Dataset: Multilingual-Thinking"
+ echo " - Epochs: 2"
+ echo " - Batch Size: 8"
+ echo " - Learning Rate: 3e-4"
+ echo " - Enhanced LoRA (rank 16)"
+ echo " - Optimized for H100 performance"
+ echo ""
+ echo "7. GPT-OSS Multilingual Reasoning"
+ echo " - Model: openai/gpt-oss-20b"
+ echo " - Dataset: Multilingual-Thinking"
+ echo " - Epochs: 1"
+ echo " - Batch Size: 4"
+ echo " - Learning Rate: 2e-4"
+ echo " - Specialized for reasoning tasks"
+ echo " - Supports 10+ languages"
+ echo ""
+ echo "8. Custom Configuration"
 echo " - User-defined parameters"
 echo ""
 }
@@ -247,6 +276,36 @@ get_training_config() {
 MAX_SEQ_LENGTH=8192
 CONFIG_FILE="config/train_smollm3_openhermes_fr_a100_multiple_passes.py"
 ;;
+ "GPT-OSS Basic Training")
+ MODEL_NAME="openai/gpt-oss-20b"
+ DATASET_NAME="HuggingFaceH4/Multilingual-Thinking"
+ MAX_EPOCHS=1
+ BATCH_SIZE=4
+ GRADIENT_ACCUMULATION_STEPS=4
+ LEARNING_RATE=2e-4
+ MAX_SEQ_LENGTH=2048
+ CONFIG_FILE="config/train_gpt_oss_basic.py"
+ ;;
+ "GPT-OSS H100 Optimized")
+ MODEL_NAME="openai/gpt-oss-20b"
+ DATASET_NAME="HuggingFaceH4/Multilingual-Thinking"
+ MAX_EPOCHS=2
+ BATCH_SIZE=8
+ GRADIENT_ACCUMULATION_STEPS=2
+ LEARNING_RATE=3e-4
+ MAX_SEQ_LENGTH=4096
+ CONFIG_FILE="config/train_gpt_oss_h100_optimized.py"
+ ;;
+ "GPT-OSS Multilingual Reasoning")
+ MODEL_NAME="openai/gpt-oss-20b"
+ DATASET_NAME="HuggingFaceH4/Multilingual-Thinking"
+ MAX_EPOCHS=1
+ BATCH_SIZE=4
+ GRADIENT_ACCUMULATION_STEPS=4
+ LEARNING_RATE=2e-4
+ MAX_SEQ_LENGTH=2048
+ CONFIG_FILE="config/train_gpt_oss_multilingual_reasoning.py"
+ ;;
 "Custom Configuration")
 get_custom_config
 ;;
@@ -419,7 +478,7 @@ print_step "Step 2: Training Configuration"
 echo "=================================="
 
 show_training_configs
- select_option "Select training configuration:" "Basic Training" "H100 Lightweight (Rapid)" "A100 Large Scale" "Multiple Passes" "Custom Configuration" TRAINING_CONFIG_TYPE
+ select_option "Select training configuration:" "Basic Training" "H100 Lightweight (Rapid)" "A100 Large Scale" "Multiple Passes" "GPT-OSS Basic Training" "GPT-OSS H100 Optimized" "GPT-OSS Multilingual Reasoning" "Custom Configuration" TRAINING_CONFIG_TYPE
 
 get_training_config "$TRAINING_CONFIG_TYPE"
 
@@ -783,13 +842,24 @@ export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
 export HF_USERNAME="$HF_USERNAME"
 export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
 
- # Run the simpler training script
- python scripts/training/train.py \
- --config "$CONFIG_FILE" \
- --experiment-name "$EXPERIMENT_NAME" \
- --output-dir /output-checkpoint \
- --trackio-url "$TRACKIO_URL" \
- --trainer-type "$TRAINER_TYPE_LOWER"
+ # Run the appropriate training script based on model type
+ if [[ "$MODEL_NAME" == *"gpt-oss"* ]]; then
+ print_info "Using GPT-OSS specialized training script..."
+ python scripts/training/train_gpt_oss.py \
+ --config "$CONFIG_FILE" \
+ --experiment-name "$EXPERIMENT_NAME" \
+ --output-dir /output-checkpoint \
+ --trackio-url "$TRACKIO_URL" \
+ --trainer-type "$TRAINER_TYPE_LOWER"
+ else
+ print_info "Using standard SmolLM3 training script..."
+ python scripts/training/train.py \
+ --config "$CONFIG_FILE" \
+ --experiment-name "$EXPERIMENT_NAME" \
+ --output-dir /output-checkpoint \
+ --trackio-url "$TRACKIO_URL" \
+ --trainer-type "$TRAINER_TYPE_LOWER"
+ fi
 
 # Step 16: Push model to Hugging Face Hub
 print_step "Step 16: Pushing Model to HF Hub"
@@ -806,14 +876,26 @@ export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
 export HF_USERNAME="$HF_USERNAME"
 export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
 
- # Run the push script
- python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME" \
- --token "$HF_TOKEN" \
- --trackio-url "$TRACKIO_URL" \
- --experiment-name "$EXPERIMENT_NAME" \
- --dataset-repo "$TRACKIO_DATASET_REPO" \
- --author-name "$AUTHOR_NAME" \
- --model-description "$MODEL_DESCRIPTION"
+ # Run the appropriate push script based on model type
+ if [[ "$MODEL_NAME" == *"gpt-oss"* ]]; then
+ print_info "Using GPT-OSS specialized push script..."
+ python scripts/model_tonic/push_gpt_oss_to_huggingface.py /output-checkpoint "$REPO_NAME" \
+ --token "$HF_TOKEN" \
+ --trackio-url "$TRACKIO_URL" \
+ --experiment-name "$EXPERIMENT_NAME" \
+ --dataset-repo "$TRACKIO_DATASET_REPO" \
+ --author-name "$AUTHOR_NAME" \
+ --model-description "$MODEL_DESCRIPTION"
+ else
+ print_info "Using standard SmolLM3 push script..."
+ python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME" \
+ --token "$HF_TOKEN" \
+ --trackio-url "$TRACKIO_URL" \
+ --experiment-name "$EXPERIMENT_NAME" \
+ --dataset-repo "$TRACKIO_DATASET_REPO" \
+ --author-name "$AUTHOR_NAME" \
+ --model-description "$MODEL_DESCRIPTION"
+ fi
 
 # Step 16.5: Switch Trackio Space to Read Token (Security)
 print_step "Step 16.5: Switching to Read Token for Security"
  print_step "Step 16.5: Switching to Read Token for Security"
requirements/requirements_core.txt CHANGED
@@ -1,10 +1,10 @@
- # Core dependencies for SmolLM3 fine-tuning
+ # Core dependencies for SmolLM3 and GPT-OSS fine-tuning
 torch>=2.0.0
- transformers>=4.53.0
+ transformers>=4.55.0  # Updated for GPT-OSS compatibility
 datasets>=2.14.0
 accelerate>=0.20.0
- peft>=0.4.0
- trl>=0.7.0
+ peft>=0.17.0  # Updated for GPT-OSS LoRA support
+ trl>=0.20.0  # Updated for GPT-OSS compatibility
 
 # Hugging Face Hub for model and space management
 huggingface_hub>=0.19.0
@@ -16,4 +16,8 @@ pandas>=2.0.0
 plotly>=5.0.0
 trackio>=0.1.0
 psutil>=5.9.0
- pynvml>=12.0.0
+ pynvml>=12.0.0
+
+ # GPT-OSS specific dependencies
+ # Note: GPT-OSS requires specific versions for optimal performance
+ # These are compatible with the tutorial requirements
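To install these pins, something like the following should work (the virtual environment is optional):

```bash
# Install the pinned core dependencies from the repo root.
python -m venv .venv && source .venv/bin/activate
pip install -r requirements/requirements_core.txt
```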
scripts/model_tonic/push_gpt_oss_to_huggingface.py ADDED
@@ -0,0 +1,317 @@
+ #!/usr/bin/env python3
+ """
+ GPT-OSS Model Push Script
+ Specialized script for pushing GPT-OSS models to Hugging Face Hub
+ Handles LoRA weight merging and model card generation
+ """
+
+ import os
+ import sys
+ import argparse
+ import shutil
+ from datetime import datetime
+
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import PeftModel
+
+ def merge_lora_weights(checkpoint_path, base_model_name, output_path):
+     """Merge LoRA weights with base model for inference"""
+
+     print(f"Loading base model: {base_model_name}")
+
+     # Load base model. device_map="auto" already dispatches the weights,
+     # so no explicit .cuda() call is needed (and it can break offloaded layers).
+     model_kwargs = {
+         "attn_implementation": "eager",
+         "torch_dtype": "auto",
+         "use_cache": True,
+         "device_map": "auto"
+     }
+     base_model = AutoModelForCausalLM.from_pretrained(base_model_name, **model_kwargs)
+
+     print(f"Loading LoRA weights from: {checkpoint_path}")
+
+     # Load and merge LoRA weights
+     model = PeftModel.from_pretrained(base_model, checkpoint_path)
+     model = model.merge_and_unload()
+
+     print(f"Saving merged model to: {output_path}")
+     model.save_pretrained(output_path)
+
+     # Save tokenizer
+     tokenizer = AutoTokenizer.from_pretrained(base_model_name)
+     tokenizer.save_pretrained(output_path)
+
+     return model, tokenizer
+
+ def create_gpt_oss_model_card(model_name, experiment_name, trackio_url, dataset_repo, author_name, model_description):
+     """Create a comprehensive model card for GPT-OSS models"""
+
+     card_content = f"""---
+ language:
+ - en
+ - es
+ - fr
+ - it
+ - de
+ - zh
+ - hi
+ - ja
+ - ko
+ - ar
+ license: mit
+ tags:
+ - gpt-oss
+ - multilingual
+ - reasoning
+ - chain-of-thought
+ - fine-tuned
+ ---
+
+ # {model_name}
+
+ ## Model Description
+
+ {model_description}
+
+ This model is a fine-tuned version of OpenAI's GPT-OSS-20B model, optimized for multilingual reasoning tasks. It has been trained on the Multilingual-Thinking dataset to generate chain-of-thought reasoning in multiple languages.
+
+ ## Training Details
+
+ - **Base Model**: openai/gpt-oss-20b
+ - **Training Dataset**: HuggingFaceH4/Multilingual-Thinking
+ - **Training Method**: LoRA (Low-Rank Adaptation)
+ - **Quantization**: MXFP4
+ - **Experiment**: {experiment_name}
+ - **Monitoring**: {trackio_url}
+
+ ## Usage
+
+ ### Basic Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("{model_name}")
+ model = AutoModelForCausalLM.from_pretrained("{model_name}")
+
+ # Example: Reasoning in Spanish
+ messages = [
+     {{"role": "system", "content": "reasoning language: Spanish"}},
+     {{"role": "user", "content": "What is the capital of Australia?"}}
+ ]
+
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     return_tensors="pt"
+ ).to(model.device)
+
+ output_ids = model.generate(input_ids, max_new_tokens=512)
+ response = tokenizer.batch_decode(output_ids)[0]
+ print(response)
+ ```
+
+ ### Multilingual Reasoning
+
+ The model supports reasoning in multiple languages:
+
+ - English
+ - Spanish (Español)
+ - French (Français)
+ - Italian (Italiano)
+ - German (Deutsch)
+ - Chinese (中文)
+ - Hindi (हिन्दी)
+ - Japanese (日本語)
+ - Korean (한국어)
+ - Arabic (العربية)
+
+ ### System Prompt Format
+
+ To control the reasoning language, use the system prompt:
+
+ ```
+ reasoning language: [LANGUAGE]
+ ```
+
+ Example:
+ ```
+ reasoning language: German
+ ```
+
+ ## Training Configuration
+
+ - **LoRA Rank**: 8
+ - **LoRA Alpha**: 16
+ - **Target Modules**: all-linear
+ - **Learning Rate**: 2e-4
+ - **Batch Size**: 4
+ - **Sequence Length**: 2048
+ - **Mixed Precision**: bf16
+
+ ## Dataset Information
+
+ The model was trained on the Multilingual-Thinking dataset, which contains 1,000 examples of chain-of-thought reasoning translated into multiple languages.
+
+ ## Limitations
+
+ - The model is designed for reasoning tasks and may not perform optimally on other tasks
+ - Reasoning quality may vary across languages
+ - The model inherits limitations from the base GPT-OSS-20B model
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{{{model_name.replace("/", "_").replace("-", "_")},
+   author = {{{author_name}}},
+   title = {{{model_name}}},
+   year = {{{datetime.now().year}}},
+   publisher = {{Hugging Face}},
+   journal = {{Hugging Face repository}},
+   howpublished = {{\\url{{https://huggingface.co/{model_name}}}}}
+ }}
+ ```
+
+ ## License
+
+ This model is licensed under the MIT License.
+
+ ## Training Resources
+
+ - **Training Dataset**: https://huggingface.co/datasets/{dataset_repo}
+ - **Training Monitoring**: {trackio_url}
+ - **Base Model**: https://huggingface.co/openai/gpt-oss-20b
+
+ ## Model Information
+
+ - **Architecture**: GPT-OSS-20B with LoRA adapters
+ - **Parameters**: 20B base + LoRA adapters
+ - **Context Length**: 2048 tokens
+ - **Languages**: 10+ languages supported
+ - **Task**: Multilingual reasoning and chain-of-thought generation
+ """
+
+     return card_content
+
+ def push_gpt_oss_model(checkpoint_path, repo_name, hf_token, trackio_url, experiment_name, dataset_repo, author_name, model_description):
+     """Push GPT-OSS model to Hugging Face Hub"""
+
+     print("=== GPT-OSS Model Push Pipeline ===")
+     print(f"Checkpoint: {checkpoint_path}")
+     print(f"Repository: {repo_name}")
+     print(f"Experiment: {experiment_name}")
+     print(f"Author: {author_name}")
+
+     # Validate checkpoint path
+     if not os.path.exists(checkpoint_path):
+         raise FileNotFoundError(f"Checkpoint path not found: {checkpoint_path}")
+
+     # Create temporary directory for merged model
+     temp_output = f"/tmp/gpt_oss_merged_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+     os.makedirs(temp_output, exist_ok=True)
+
+     try:
+         # Merge LoRA weights with base model
+         print("Merging LoRA weights with base model...")
+         model, tokenizer = merge_lora_weights(
+             checkpoint_path=checkpoint_path,
+             base_model_name="openai/gpt-oss-20b",
+             output_path=temp_output
+         )
+
+         # Create model card
+         print("Creating model card...")
+         model_card_content = create_gpt_oss_model_card(
+             model_name=repo_name,
+             experiment_name=experiment_name,
+             trackio_url=trackio_url,
+             dataset_repo=dataset_repo,
+             author_name=author_name,
+             model_description=model_description
+         )
+
+         # Save model card
+         model_card_path = os.path.join(temp_output, "README.md")
+         with open(model_card_path, "w", encoding="utf-8") as f:
+             f.write(model_card_content)
+
+         # Push to Hugging Face Hub
+         print(f"Pushing model to: {repo_name}")
+
+         # Set HF token
+         os.environ["HUGGING_FACE_HUB_TOKEN"] = hf_token
+
+         # Push using huggingface_hub
+         from huggingface_hub import HfApi
+         api = HfApi()
+
+         # Create repository if it doesn't exist
+         try:
+             api.create_repo(repo_name, private=False, exist_ok=True)
+         except Exception as e:
+             print(f"Warning: Could not create repository: {e}")
+
+         # Upload files
+         print("Uploading model files...")
+         api.upload_folder(
+             folder_path=temp_output,
+             repo_id=repo_name,
+             repo_type="model"
+         )
+
+         print("✅ GPT-OSS model pushed successfully!")
+         print(f"Model URL: https://huggingface.co/{repo_name}")
+
+         # Clean up
+         shutil.rmtree(temp_output)
+
+         return True
+
+     except Exception as e:
+         print(f"❌ Error pushing GPT-OSS model: {e}")
+
+         # Clean up on error
+         if os.path.exists(temp_output):
+             shutil.rmtree(temp_output)
+
+         return False
+
+ def main():
+     parser = argparse.ArgumentParser(description="Push GPT-OSS model to Hugging Face Hub")
+     parser.add_argument("checkpoint_path", help="Path to model checkpoint")
+     parser.add_argument("repo_name", help="Hugging Face repository name")
+     parser.add_argument("--token", required=True, help="Hugging Face token")
+     parser.add_argument("--trackio-url", help="Trackio URL for model card")
+     parser.add_argument("--experiment-name", help="Experiment name")
+     parser.add_argument("--dataset-repo", help="Dataset repository")
+     parser.add_argument("--author-name", help="Author name")
+     parser.add_argument("--model-description", help="Model description")
+
+     args = parser.parse_args()
+
+     # Set defaults
+     experiment_name = args.experiment_name or "gpt_oss_finetune"
+     dataset_repo = args.dataset_repo or "HuggingFaceH4/Multilingual-Thinking"
+     author_name = args.author_name or "GPT-OSS Fine-tuner"
+     model_description = args.model_description or "A fine-tuned version of OpenAI's GPT-OSS-20B model for multilingual reasoning tasks."
+
+     success = push_gpt_oss_model(
+         checkpoint_path=args.checkpoint_path,
+         repo_name=args.repo_name,
+         hf_token=args.token,
+         trackio_url=args.trackio_url,
+         experiment_name=experiment_name,
+         dataset_repo=dataset_repo,
+         author_name=author_name,
+         model_description=model_description
+     )
+
+     sys.exit(0 if success else 1)
+
+ if __name__ == "__main__":
+     main()
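A standalone invocation sketch, mirroring how `launch.sh` calls this script (the repository name and description below are placeholders):

```bash
# Merge LoRA weights into the base model, write a model card, and upload.
python scripts/model_tonic/push_gpt_oss_to_huggingface.py /output-checkpoint your-username/gpt-oss-20b-multilingual-reasoner \
  --token "$HF_TOKEN" \
  --trackio-url "$TRACKIO_URL" \
  --experiment-name gpt_oss_basic \
  --dataset-repo HuggingFaceH4/Multilingual-Thinking \
  --author-name "Your Name" \
  --model-description "GPT-OSS-20B fine-tuned on Multilingual-Thinking"
```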
scripts/training/train_gpt_oss.py ADDED
@@ -0,0 +1,227 @@
+ #!/usr/bin/env python3
+ """
+ GPT-OSS Training Script
+ Specialized training script for OpenAI's GPT-OSS models
+ Based on the GPT-OSS fine-tuning tutorial
+ """
+
+ import os
+ import sys
+ import argparse
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import LoraConfig, get_peft_model
+ from trl import SFTTrainer, SFTConfig
+ import trackio
+ from datasets import load_dataset
+
+ def load_gpt_oss_model_and_tokenizer(config):
+     """Load GPT-OSS model and tokenizer with proper configuration"""
+
+     print("Loading GPT-OSS tokenizer...")
+     tokenizer = AutoTokenizer.from_pretrained(config.model_name)
+
+     print("Loading GPT-OSS model with quantization...")
+
+     # Import quantization config
+     from transformers import Mxfp4Config
+
+     # Set up quantization config
+     quantization_config = Mxfp4Config(dequantize=True)
+
+     # Model kwargs as per tutorial
+     model_kwargs = {
+         "attn_implementation": "eager",
+         "torch_dtype": torch.bfloat16,
+         "quantization_config": quantization_config,
+         "use_cache": False,
+         "device_map": "auto",
+     }
+
+     model = AutoModelForCausalLM.from_pretrained(config.model_name, **model_kwargs)
+
+     return model, tokenizer
+
+ def setup_lora_for_gpt_oss(model, config):
+     """Setup LoRA for GPT-OSS model"""
+
+     print("Setting up LoRA for GPT-OSS...")
+
+     # LoRA configuration as per tutorial
+     lora_config = LoraConfig(
+         r=config.lora_config.get("r", 8),
+         lora_alpha=config.lora_config.get("lora_alpha", 16),
+         target_modules=config.lora_config.get("target_modules", "all-linear"),
+         target_parameters=config.lora_config.get("target_parameters", [
+             "7.mlp.experts.gate_up_proj",
+             "7.mlp.experts.down_proj",
+             "15.mlp.experts.gate_up_proj",
+             "15.mlp.experts.down_proj",
+             "23.mlp.experts.gate_up_proj",
+             "23.mlp.experts.down_proj",
+         ]),
+     )
+
+     peft_model = get_peft_model(model, lora_config)
+     peft_model.print_trainable_parameters()
+
+     return peft_model
+
+ def load_multilingual_thinking_dataset():
+     """Load the Multilingual-Thinking dataset"""
+
+     print("Loading Multilingual-Thinking dataset...")
+     dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
+     print(f"Dataset loaded: {len(dataset)} examples")
+
+     return dataset
+
+ def setup_trackio_tracking(config):
+     """Setup Trackio tracking if enabled"""
+
+     if not config.enable_tracking or not config.trackio_url:
+         print("Trackio tracking disabled or URL not provided")
+         return None
+
+     print(f"Setting up Trackio tracking: {config.trackio_url}")
+
+     # Initialize Trackio client
+     trackio_client = trackio.Client(
+         api_url=config.trackio_url,
+         token=config.trackio_token
+     )
+
+     return trackio_client
+
+ def create_sft_config(config, output_dir):
+     """Create SFTConfig for GPT-OSS training"""
+
+     print("Creating SFT configuration...")
+
+     sft_config = SFTConfig(
+         learning_rate=config.learning_rate,
+         gradient_checkpointing=True,
+         num_train_epochs=1,  # Single epoch as per tutorial
+         logging_steps=config.logging_steps,
+         per_device_train_batch_size=config.batch_size,
+         gradient_accumulation_steps=config.gradient_accumulation_steps,
+         max_length=config.max_seq_length,
+         warmup_ratio=0.03,
+         lr_scheduler_type="cosine_with_min_lr",
+         lr_scheduler_kwargs={"min_lr_rate": 0.1},
+         output_dir=output_dir,  # honor the --output-dir CLI argument
+         report_to="trackio" if config.enable_tracking else None,
+         push_to_hub=True,
+     )
+
+     return sft_config
+
+ def train_gpt_oss(config_path, experiment_name, output_dir, trackio_url, trainer_type="sft"):
+     """Main training function for GPT-OSS"""
+
+     print("=== GPT-OSS Training Pipeline ===")
+     print(f"Config: {config_path}")
+     print(f"Experiment: {experiment_name}")
+     print(f"Output: {output_dir}")
+     print(f"Trackio: {trackio_url}")
+     print(f"Trainer: {trainer_type}")
+
+     # Load configuration
+     if os.path.exists(config_path):
+         import importlib.util
+         spec = importlib.util.spec_from_file_location("config_module", config_path)
+         config_module = importlib.util.module_from_spec(spec)
+         spec.loader.exec_module(config_module)
+
+         if hasattr(config_module, 'config'):
+             config = config_module.config
+         else:
+             # Try to find a config object (model names use "gpt-oss", e.g. openai/gpt-oss-20b)
+             for attr_name in dir(config_module):
+                 attr = getattr(config_module, attr_name)
+                 if hasattr(attr, 'model_name') and 'gpt-oss' in attr.model_name.lower():
+                     config = attr
+                     break
+             else:
+                 raise ValueError(f"No GPT-OSS configuration found in {config_path}")
+     else:
+         raise FileNotFoundError(f"Configuration file not found: {config_path}")
+
+     # Update config with runtime parameters
+     config.experiment_name = experiment_name
+     config.trackio_url = trackio_url
+     config.trainer_type = trainer_type
+
+     # Load model and tokenizer
+     model, tokenizer = load_gpt_oss_model_and_tokenizer(config)
+
+     # Setup LoRA
+     peft_model = setup_lora_for_gpt_oss(model, config)
+
+     # Load dataset
+     dataset = load_multilingual_thinking_dataset()
+
+     # Setup Trackio tracking
+     trackio_client = setup_trackio_tracking(config)
+
+     # Create SFT configuration
+     sft_config = create_sft_config(config, output_dir)
+
+     # Create trainer
+     print("Creating SFT trainer...")
+     trainer = SFTTrainer(
+         model=peft_model,
+         args=sft_config,
+         train_dataset=dataset,
+         processing_class=tokenizer,
+     )
+
+     # Start training
+     print("Starting GPT-OSS training...")
+     trainer.train()
+
+     # Save model
+     print("Saving trained model...")
+     trainer.save_model(output_dir)
+
+     # Push to hub if enabled
+     if sft_config.push_to_hub:
+         print("Pushing model to Hugging Face Hub...")
+         trainer.push_to_hub(dataset_name="HuggingFaceH4/Multilingual-Thinking")
+
+     print("GPT-OSS training completed successfully!")
+
+     return trainer
+
+ def main():
+     parser = argparse.ArgumentParser(description="GPT-OSS Training Script")
+     parser.add_argument("--config", required=True, help="Path to configuration file")
+     parser.add_argument("--experiment-name", required=True, help="Experiment name")
+     parser.add_argument("--output-dir", required=True, help="Output directory for checkpoints")
+     parser.add_argument("--trackio-url", help="Trackio URL for monitoring")
+     parser.add_argument("--trainer-type", default="sft", choices=["sft", "dpo"], help="Trainer type")
+
+     args = parser.parse_args()
+
+     # Validate arguments
+     if not os.path.exists(args.config):
+         print(f"Error: Configuration file not found: {args.config}")
+         sys.exit(1)
+
+     # Create output directory
+     os.makedirs(args.output_dir, exist_ok=True)
+
+     try:
+         train_gpt_oss(
+             config_path=args.config,
+             experiment_name=args.experiment_name,
+             output_dir=args.output_dir,
+             trackio_url=args.trackio_url,
+             trainer_type=args.trainer_type
+         )
+     except Exception as e:
+         print(f"Error during training: {e}")
+         sys.exit(1)
+
+ if __name__ == "__main__":
+     main()
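And the matching training entry point, again mirroring the `launch.sh` invocation (the output directory shown is illustrative):

```bash
# Fine-tune GPT-OSS-20B with the basic config; Trackio URL is optional.
python scripts/training/train_gpt_oss.py \
  --config config/train_gpt_oss_basic.py \
  --experiment-name gpt_oss_basic \
  --output-dir ./output-checkpoint \
  --trackio-url "$TRACKIO_URL" \
  --trainer-type sft
```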