Tonic committed on
Commit
d0d19b2
·
verified ·
1 Parent(s): 08ed534

Fix model recovery and deployment scripts - add safetensors support and Windows compatibility

MODEL_RECOVERY_GUIDE.md ADDED
@@ -0,0 +1,228 @@
# Model Recovery and Deployment Guide

This guide will help you recover your trained model from the cloud instance and deploy it to Hugging Face Hub with quantization.

## Prerequisites

1. **Hugging Face Token**: You need a Hugging Face token with write permissions
2. **Cloud Instance Access**: SSH access to your cloud instance
3. **Model Files**: Your trained model should be in `/output-checkpoint/` on the cloud instance

## Step 1: Connect to Your Cloud Instance

```bash
ssh root@your-cloud-instance-ip
cd ~/smollm3_finetune
```

## Step 2: Set Your Hugging Face Token

```bash
export HF_TOKEN=your_huggingface_token_here
```

Replace `your_huggingface_token_here` with your actual Hugging Face token.
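Before going further, it is worth confirming that the token actually works and which account it maps to. A minimal sketch using `huggingface_hub.whoami` (assumes `huggingface_hub` is installed on the instance):

```python
# Sanity-check the HF token before deploying anything
import os
from huggingface_hub import whoami

info = whoami(token=os.environ["HF_TOKEN"])
print(f"Authenticated as: {info['name']}")
```

If this raises an authentication error, regenerate the token with write permissions before continuing.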
## Step 3: Verify Model Files

Check that your model files exist:

```bash
ls -la /output-checkpoint/
```

You should see files like:
- `config.json`
- `model.safetensors.index.json`
- `model-00001-of-00002.safetensors`
- `model-00002-of-00002.safetensors`
- `tokenizer.json`
- `tokenizer_config.json`
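Beyond `ls`, the safetensors index records every shard the model needs, so you can cross-check that nothing is missing. A small sketch (only the checkpoint path from this guide is assumed):

```python
# Cross-check shards listed in the safetensors index against files on disk
import json
from pathlib import Path

ckpt = Path("/output-checkpoint")
index = json.loads((ckpt / "model.safetensors.index.json").read_text())
shards = sorted(set(index["weight_map"].values()))  # shard filenames the model needs
missing = [s for s in shards if not (ckpt / s).exists()]
print("Missing shards:", missing if missing else "none")
```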
## Step 4: Update Configuration

Edit the deployment script to use your Hugging Face username:

```bash
nano cloud_deploy.py
```

Change this line:
```python
REPO_NAME = "your-username/smollm3-finetuned"  # Change to your HF username and desired repo name
```

To your actual username, for example:
```python
REPO_NAME = "tonic/smollm3-finetuned"
```

## Step 5: Run the Deployment

Execute the deployment script:

```bash
python3 cloud_deploy.py
```

This will:
1. ✅ Validate your model files
2. ✅ Install required dependencies (torchao, huggingface_hub)
3. ✅ Push the main model to Hugging Face Hub
4. ✅ Create quantized versions (int8 and int4)
5. ✅ Push quantized models to subdirectories

## Step 6: Verify Deployment

After successful deployment, you can verify:

1. **Main Model**: https://huggingface.co/your-username/smollm3-finetuned
2. **int8 Quantized**: https://huggingface.co/your-username/smollm3-finetuned/tree/main/int8
3. **int4 Quantized**: https://huggingface.co/your-username/smollm3-finetuned/tree/main/int4
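You can also verify from Python rather than the browser. A minimal sketch with `HfApi.list_repo_files` (assumes the quantized models were pushed to the `int8/` and `int4/` subdirectories as described above):

```python
# Confirm the main weights and quantized subdirectories exist in the repo
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ.get("HF_TOKEN"))  # token needed only for private repos
files = api.list_repo_files("your-username/smollm3-finetuned")
print("safetensors shards:", sum(f.endswith(".safetensors") for f in files))
for prefix in ("int8/", "int4/"):
    print(prefix, "found" if any(f.startswith(prefix) for f in files) else "missing")
```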
## Alternative: Manual Deployment

If you prefer to run the steps manually:

### 1. Push Main Model Only

```bash
python3 scripts/model_tonic/push_to_huggingface.py \
  /output-checkpoint/ \
  your-username/smollm3-finetuned \
  --hf-token $HF_TOKEN \
  --author-name "Your Name" \
  --model-description "A fine-tuned SmolLM3 model for improved text generation"
```

### 2. Quantize and Push (Optional)

```bash
# int8 quantization (GPU optimized)
python3 scripts/model_tonic/quantize_model.py \
  /output-checkpoint/ \
  your-username/smollm3-finetuned \
  --quant-type int8_weight_only \
  --hf-token $HF_TOKEN

# int4 quantization (CPU optimized)
python3 scripts/model_tonic/quantize_model.py \
  /output-checkpoint/ \
  your-username/smollm3-finetuned \
  --quant-type int4_weight_only \
  --hf-token $HF_TOKEN
```

## Troubleshooting

### Common Issues

1. **HF_TOKEN not set**
   ```bash
   export HF_TOKEN=your_token_here
   ```

2. **Model files not found**
   ```bash
   ls -la /output-checkpoint/
   ```
   Make sure the training completed successfully.

3. **Dependencies missing**
   ```bash
   pip install torchao huggingface_hub
   ```

4. **Permission denied**
   ```bash
   chmod +x cloud_deploy.py
   chmod +x recover_model.py
   ```

### Error Messages

- **"Missing required model files"**: Check that your model training completed successfully
- **"Repository creation failed"**: Verify your HF token has write permissions
- **"Quantization failed"**: Check GPU memory availability or try CPU quantization (see the sketch below)
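For the quantization failure case, a quick way to see whether GPU memory is the limiting factor before retrying; a minimal sketch assuming PyTorch with a CUDA build is installed on the instance:

```python
# Report free GPU memory; if it's tight, retry with int4_weight_only on CPU
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"GPU memory: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
else:
    print("No CUDA device detected; use --quant-type int4_weight_only on CPU")
```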
## Model Usage

Once deployed, you can use your model. Note that the quantized versions live in subdirectories of the repository, so they are loaded with the `subfolder` argument (a three-segment path like `your-username/smollm3-finetuned/int8` is not a valid repo id). Loading the quantized checkpoints may also require `torchao` to be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Main model
model = AutoModelForCausalLM.from_pretrained("your-username/smollm3-finetuned")
tokenizer = AutoTokenizer.from_pretrained("your-username/smollm3-finetuned")

# int8 quantized (GPU optimized), stored in the int8/ subdirectory
model = AutoModelForCausalLM.from_pretrained("your-username/smollm3-finetuned", subfolder="int8")
tokenizer = AutoTokenizer.from_pretrained("your-username/smollm3-finetuned", subfolder="int8")

# int4 quantized (CPU optimized), stored in the int4/ subdirectory
model = AutoModelForCausalLM.from_pretrained("your-username/smollm3-finetuned", subfolder="int4")
tokenizer = AutoTokenizer.from_pretrained("your-username/smollm3-finetuned", subfolder="int4")

# Generate text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## File Structure

After deployment, your repository will have:

```
your-username/smollm3-finetuned/
├── README.md (model card)
├── config.json
├── model.safetensors.index.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── tokenizer.json
├── tokenizer_config.json
├── int8/ (quantized model for GPU)
│   ├── README.md
│   ├── config.json
│   └── pytorch_model.bin
└── int4/ (quantized model for CPU)
    ├── README.md
    ├── config.json
    └── pytorch_model.bin
```

## Success Indicators

✅ **Successful deployment shows:**
- "Model recovery and deployment completed successfully!"
- "View your model at: https://huggingface.co/your-username/smollm3-finetuned"
- No error messages in the output

❌ **Failed deployment shows:**
- Error messages about missing files or permissions
- "Model recovery and deployment failed!"

## Next Steps

After successful deployment:

1. **Test your model** on Hugging Face Hub
2. **Share your model** with the community
3. **Monitor usage** through Hugging Face analytics
4. **Consider fine-tuning** further based on feedback

## Support

If you encounter issues:

1. Check the error messages carefully
2. Verify your HF token permissions
3. Ensure all model files are present
4. Try running individual steps manually
5. Check the logs for detailed error information

---

**Happy deploying! 🚀**
cloud_deploy.py ADDED
@@ -0,0 +1,96 @@
#!/usr/bin/env python3
"""
Cloud Model Deployment Script
Run this directly on your cloud instance to deploy your trained model
"""

import os
import sys
import logging
import subprocess
from pathlib import Path

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def main():
    """Main deployment function"""

    # Configuration - CHANGE THESE VALUES
    MODEL_PATH = "/output-checkpoint"
    REPO_NAME = "your-username/smollm3-finetuned"  # Change to your HF username and desired repo name
    HF_TOKEN = os.getenv('HF_TOKEN')
    PRIVATE = False  # Set to True for private repository

    # Validate configuration
    if not HF_TOKEN:
        logger.error("❌ HF_TOKEN environment variable not set")
        logger.info("Please set your Hugging Face token:")
        logger.info("export HF_TOKEN=your_token_here")
        return 1

    if not Path(MODEL_PATH).exists():
        logger.error(f"❌ Model path not found: {MODEL_PATH}")
        return 1

    # Check for required files
    required_files = ['config.json', 'model.safetensors.index.json', 'tokenizer.json']
    for file in required_files:
        if not (Path(MODEL_PATH) / file).exists():
            logger.error(f"❌ Required file not found: {file}")
            return 1

    logger.info("✅ Model files validated")

    # Install dependencies if needed
    try:
        import torchao
        logger.info("✅ torchao available")
    except ImportError:
        logger.info("📦 Installing torchao...")
        subprocess.run([sys.executable, "-m", "pip", "install", "torchao"])

    try:
        import huggingface_hub
        logger.info("✅ huggingface_hub available")
    except ImportError:
        logger.info("📦 Installing huggingface_hub...")
        subprocess.run([sys.executable, "-m", "pip", "install", "huggingface_hub"])

    # Run the recovery script
    logger.info("🚀 Starting model deployment...")

    cmd = [
        sys.executable, "recover_model.py",
        MODEL_PATH,
        REPO_NAME,
        "--hf-token", HF_TOKEN,
        "--quant-types", "int8_weight_only", "int4_weight_only",
        "--author-name", "Your Name",
        "--model-description", "A fine-tuned SmolLM3 model for improved text generation and conversation capabilities"
    ]

    if PRIVATE:
        cmd.append("--private")

    logger.info(f"Running: {' '.join(cmd)}")

    # Run the command as an argument list so multi-word values
    # (e.g. the model description) are passed as single arguments
    result = subprocess.run(cmd)

    if result.returncode == 0:
        logger.info("✅ Model deployment completed successfully!")
        logger.info(f"🌐 View your model at: https://huggingface.co/{REPO_NAME}")
        logger.info("📊 Quantized models available at:")
        logger.info(f"  - https://huggingface.co/{REPO_NAME}/tree/main/int8 (GPU optimized)")
        logger.info(f"  - https://huggingface.co/{REPO_NAME}/tree/main/int4 (CPU optimized)")
        return 0
    else:
        logger.error("❌ Model deployment failed!")
        return 1

if __name__ == "__main__":
    exit(main())
cloud_recovery.sh ADDED
#!/bin/bash
# Cloud Model Recovery and Deployment Script
# Run this on your cloud instance to recover and deploy your trained model

set -e  # Exit on any error

echo "🚀 Starting cloud model recovery and deployment..."

# Configuration
MODEL_PATH="/output-checkpoint"
REPO_NAME="your-username/smollm3-finetuned"  # Change this to your HF username and desired repo name
HF_TOKEN="${HF_TOKEN}"  # Set this environment variable
PRIVATE=false  # Set to true if you want a private repository

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'  # No Color

# Functions to print colored output
print_status() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

print_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

print_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

print_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Check if we're in the right directory
if [ ! -d "$MODEL_PATH" ]; then
    print_error "Model path not found: $MODEL_PATH"
    exit 1
fi

print_status "Found model at: $MODEL_PATH"

# Check for required files
print_status "Validating model files..."
if [ ! -f "$MODEL_PATH/config.json" ]; then
    print_error "config.json not found"
    exit 1
fi

if [ ! -f "$MODEL_PATH/model.safetensors.index.json" ]; then
    print_error "model.safetensors.index.json not found"
    exit 1
fi

if [ ! -f "$MODEL_PATH/tokenizer.json" ]; then
    print_error "tokenizer.json not found"
    exit 1
fi

print_success "Model files validated"

# Check HF token
if [ -z "$HF_TOKEN" ]; then
    print_error "HF_TOKEN environment variable not set"
    print_status "Please set your Hugging Face token:"
    print_status "export HF_TOKEN=your_token_here"
    exit 1
fi

print_success "HF Token found"

# Install required packages if not already installed
print_status "Checking dependencies..."
python3 -c "import torchao" 2>/dev/null || {
    print_status "Installing torchao..."
    pip install torchao
}

python3 -c "import huggingface_hub" 2>/dev/null || {
    print_status "Installing huggingface_hub..."
    pip install huggingface_hub
}

print_success "Dependencies checked"

# recover_model.py treats --private as a flag (store_true), so only pass it
# when a private repository is actually requested
PRIVATE_FLAG=""
if [ "$PRIVATE" = true ]; then
    PRIVATE_FLAG="--private"
fi

# Run the recovery script; testing the command directly keeps the
# failure branch reachable even with `set -e` active
print_status "Running model recovery and deployment pipeline..."

if python3 recover_model.py \
    "$MODEL_PATH" \
    "$REPO_NAME" \
    --hf-token "$HF_TOKEN" \
    $PRIVATE_FLAG \
    --quant-types int8_weight_only int4_weight_only \
    --author-name "Your Name" \
    --model-description "A fine-tuned SmolLM3 model for improved text generation and conversation capabilities"; then
    print_success "Model recovery and deployment completed successfully!"
    print_success "View your model at: https://huggingface.co/$REPO_NAME"
    print_success "Quantized models available at:"
    print_success "  - https://huggingface.co/$REPO_NAME/tree/main/int8 (GPU optimized)"
    print_success "  - https://huggingface.co/$REPO_NAME/tree/main/int4 (CPU optimized)"
else
    print_error "Model recovery and deployment failed!"
    exit 1
fi

print_success "🎉 All done! Your model has been successfully recovered and deployed to Hugging Face Hub."
process_model.py ADDED
@@ -0,0 +1,230 @@
#!/usr/bin/env python3
"""
Model Processing Script
Processes recovered model with quantization and pushing to HF Hub
"""

import os
import sys
import json
import logging
import subprocess
from pathlib import Path
from typing import Dict, Any, Optional

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class ModelProcessor:
    """Process recovered model with quantization and pushing"""

    def __init__(self, model_path: str = "recovered_model"):
        self.model_path = Path(model_path)
        self.hf_token = os.getenv('HF_TOKEN')

    def validate_model(self) -> bool:
        """Validate that the model can be loaded"""
        try:
            logger.info("🔍 Validating model loading...")

            # Try to load the model from the configured path
            # (as_posix() keeps the inline string valid on Windows)
            cmd = [
                sys.executable, "-c",
                "from transformers import AutoModelForCausalLM; "
                f"model = AutoModelForCausalLM.from_pretrained('{self.model_path.as_posix()}', "
                "torch_dtype='auto', device_map='auto'); "
                "print('✅ Model loaded successfully')"
            ]

            result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)

            if result.returncode == 0:
                logger.info("✅ Model validation successful")
                return True
            else:
                logger.error(f"❌ Model validation failed: {result.stderr}")
                return False

        except Exception as e:
            logger.error(f"❌ Model validation error: {e}")
            return False

    def get_model_info(self) -> Dict[str, Any]:
        """Get information about the model"""
        try:
            # Load config
            config_path = self.model_path / "config.json"
            if config_path.exists():
                with open(config_path, 'r') as f:
                    config = json.load(f)
            else:
                config = {}

            # Calculate model size
            total_size = 0
            for file in self.model_path.rglob("*"):
                if file.is_file():
                    total_size += file.stat().st_size

            model_info = {
                "model_type": config.get("model_type", "smollm3"),
                "architectures": config.get("architectures", ["SmolLM3ForCausalLM"]),
                "model_size_gb": total_size / (1024**3),
                "vocab_size": config.get("vocab_size", 32000),
                "hidden_size": config.get("hidden_size", 2048),
                "num_attention_heads": config.get("num_attention_heads", 16),
                "num_hidden_layers": config.get("num_hidden_layers", 24),
                "max_position_embeddings": config.get("max_position_embeddings", 8192)
            }

            logger.info(f"📊 Model info: {model_info}")
            return model_info

        except Exception as e:
            logger.error(f"❌ Failed to get model info: {e}")
            return {}

    def run_quantization(self, repo_name: str, quant_type: str = "int8_weight_only") -> bool:
        """Run quantization on the model"""
        try:
            logger.info(f"🔄 Running quantization: {quant_type}")

            # Check if quantization script exists
            quantize_script = Path("scripts/model_tonic/quantize_model.py")
            if not quantize_script.exists():
                logger.error(f"❌ Quantization script not found: {quantize_script}")
                return False

            # Run quantization
            cmd = [
                sys.executable, str(quantize_script),
                str(self.model_path),
                repo_name,
                "--quant-type", quant_type,
                "--device", "auto"
            ]

            if self.hf_token:
                cmd.extend(["--hf-token", self.hf_token])  # matches the scripts' CLI used in the guide

            logger.info(f"🚀 Running: {' '.join(cmd)}")
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=1800)  # 30 min timeout

            if result.returncode == 0:
                logger.info("✅ Quantization completed successfully")
                logger.info(result.stdout)
                return True
            else:
                logger.error("❌ Quantization failed")
                logger.error(result.stderr)
                return False

        except subprocess.TimeoutExpired:
            logger.error("❌ Quantization timed out")
            return False
        except Exception as e:
            logger.error(f"❌ Failed to run quantization: {e}")
            return False

    def run_model_push(self, repo_name: str) -> bool:
        """Push the model to HF Hub"""
        try:
            logger.info(f"🔄 Pushing model to: {repo_name}")

            # Check if push script exists
            push_script = Path("scripts/model_tonic/push_to_huggingface.py")
            if not push_script.exists():
                logger.error(f"❌ Push script not found: {push_script}")
                return False

            # Run push
            cmd = [
                sys.executable, str(push_script),
                str(self.model_path),
                repo_name
            ]

            if self.hf_token:
                cmd.extend(["--hf-token", self.hf_token])

            logger.info(f"🚀 Running: {' '.join(cmd)}")
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=1800)  # 30 min timeout

            if result.returncode == 0:
                logger.info("✅ Model push completed successfully")
                logger.info(result.stdout)
                return True
            else:
                logger.error("❌ Model push failed")
                logger.error(result.stderr)
                return False

        except subprocess.TimeoutExpired:
            logger.error("❌ Model push timed out")
            return False
        except Exception as e:
            logger.error(f"❌ Failed to push model: {e}")
            return False

    def process_model(self, repo_name: str, quantize: bool = True, push: bool = True,
                      quant_type: str = "int8_weight_only") -> bool:
        """Complete model processing workflow"""
        logger.info("🚀 Starting model processing...")

        # Step 1: Validate model
        if not self.validate_model():
            logger.error("❌ Model validation failed")
            return False

        # Step 2: Get model info
        model_info = self.get_model_info()

        # Step 3: Quantize if requested
        if quantize:
            if not self.run_quantization(repo_name, quant_type):
                logger.error("❌ Quantization failed")
                return False

        # Step 4: Push if requested
        if push:
            if not self.run_model_push(repo_name):
                logger.error("❌ Model push failed")
                return False

        logger.info("🎉 Model processing completed successfully!")
        logger.info(f"🌐 View your model at: https://huggingface.co/{repo_name}")

        return True

def main():
    """Main function"""
    import argparse

    parser = argparse.ArgumentParser(description="Process recovered model")
    parser.add_argument("repo_name", help="Hugging Face repository name (username/model-name)")
    parser.add_argument("--model-path", default="recovered_model", help="Path to recovered model")
    parser.add_argument("--no-quantize", action="store_true", help="Skip quantization")
    parser.add_argument("--no-push", action="store_true", help="Skip pushing to HF Hub")
    parser.add_argument("--quant-type", default="int8_weight_only",
                        choices=["int8_weight_only", "int4_weight_only", "int8_dynamic"],
                        help="Quantization type")

    args = parser.parse_args()

    # Initialize processor
    processor = ModelProcessor(args.model_path)

    # Process model (thread the parsed quant type through instead of ignoring it)
    success = processor.process_model(
        repo_name=args.repo_name,
        quantize=not args.no_quantize,
        push=not args.no_push,
        quant_type=args.quant_type
    )

    return 0 if success else 1

if __name__ == "__main__":
    exit(main())
recover_model.py ADDED
@@ -0,0 +1,334 @@
#!/usr/bin/env python3
"""
Model Recovery and Deployment Script
Recovers trained model from cloud instance, quantizes it, and pushes to Hugging Face Hub
"""

import os
import sys
import json
import argparse
import logging
import subprocess
from pathlib import Path
from typing import Dict, Any, Optional
from datetime import datetime

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
sys.path.append(os.path.join(os.path.dirname(__file__), 'src'))


class ModelRecoveryPipeline:
    """Complete model recovery and deployment pipeline"""

    def __init__(
        self,
        model_path: str,
        repo_name: str,
        hf_token: Optional[str] = None,
        private: bool = False,
        quantize: bool = True,
        quant_types: Optional[list] = None,
        trackio_url: Optional[str] = None,
        experiment_name: Optional[str] = None,
        dataset_repo: Optional[str] = None,
        author_name: Optional[str] = None,
        model_description: Optional[str] = None
    ):
        self.model_path = Path(model_path)
        self.repo_name = repo_name
        self.hf_token = hf_token or os.getenv('HF_TOKEN')
        self.private = private
        self.quantize = quantize
        self.quant_types = quant_types or ["int8_weight_only", "int4_weight_only"]
        self.trackio_url = trackio_url
        self.experiment_name = experiment_name
        self.dataset_repo = dataset_repo
        self.author_name = author_name
        self.model_description = model_description

        # Validate HF token
        if not self.hf_token:
            raise ValueError("HF_TOKEN environment variable or --hf-token argument is required")

        logger.info(f"Initialized ModelRecoveryPipeline for {repo_name}")
        logger.info(f"Model path: {self.model_path}")
        logger.info(f"Quantization enabled: {self.quantize}")
        if self.quantize:
            logger.info(f"Quantization types: {self.quant_types}")

    def validate_model_path(self) -> bool:
        """Validate that the model path contains required files"""
        if not self.model_path.exists():
            logger.error(f"❌ Model path does not exist: {self.model_path}")
            return False

        # Check for essential model files
        required_files = ['config.json']

        # Check for model files (either safetensors or pytorch)
        model_files = [
            "model.safetensors.index.json",  # Safetensors format
            "pytorch_model.bin"  # PyTorch format
        ]

        missing_files = []
        for file in required_files:
            if not (self.model_path / file).exists():
                missing_files.append(file)

        # Check if at least one model file exists
        model_file_exists = any((self.model_path / file).exists() for file in model_files)
        if not model_file_exists:
            missing_files.extend(model_files)

        if missing_files:
            logger.error(f"❌ Missing required model files: {missing_files}")
            return False

        logger.info("✅ Model files validated")
        return True

    def load_training_config(self) -> Dict[str, Any]:
        """Load training configuration from model directory"""
        config_files = [
            "training_config.json",
            "config_petite_llm_3_fr_1_20250727_152504.json",
            "config_petite_llm_3_fr_1_20250727_152524.json"
        ]

        for config_file in config_files:
            config_path = self.model_path / config_file
            if config_path.exists():
                with open(config_path, 'r') as f:
                    config = json.load(f)
                logger.info(f"✅ Loaded training config from: {config_file}")
                return config

        # Fallback to basic config
        logger.warning("⚠️ No training config found, using default")
        return {
            "model_name": "HuggingFaceTB/SmolLM3-3B",
            "dataset_name": "OpenHermes-FR",
            "training_config_type": "Custom Configuration",
            "trainer_type": "SFTTrainer",
            "per_device_train_batch_size": 8,
            "gradient_accumulation_steps": 16,
            "learning_rate": "5e-6",
            "num_train_epochs": 3,
            "max_seq_length": 2048,
            "dataset_size": "~80K samples",
            "dataset_format": "Chat format"
        }

    def load_training_results(self) -> Dict[str, Any]:
        """Load training results from model directory"""
        results_files = [
            "train_results.json",
            "training_summary_petite_llm_3_fr_1_20250727_152504.json",
            "training_summary_petite_llm_3_fr_1_20250727_152524.json"
        ]

        for results_file in results_files:
            results_path = self.model_path / results_file
            if results_path.exists():
                with open(results_path, 'r') as f:
                    results = json.load(f)
                logger.info(f"✅ Loaded training results from: {results_file}")
                return results

        # Fallback to basic results
        logger.warning("⚠️ No training results found, using default")
        return {
            "final_loss": "Unknown",
            "total_steps": "Unknown",
            "train_loss": "Unknown",
            "eval_loss": "Unknown"
        }

    def push_main_model(self) -> bool:
        """Push the main model to Hugging Face Hub"""
        try:
            logger.info("🚀 Pushing main model to Hugging Face Hub...")

            # Import push script
            from scripts.model_tonic.push_to_huggingface import HuggingFacePusher

            # Load training data
            training_config = self.load_training_config()
            training_results = self.load_training_results()

            # Initialize pusher
            pusher = HuggingFacePusher(
                model_path=str(self.model_path),
                repo_name=self.repo_name,
                token=self.hf_token,
                private=self.private,
                trackio_url=self.trackio_url,
                experiment_name=self.experiment_name,
                dataset_repo=self.dataset_repo,
                hf_token=self.hf_token,
                author_name=self.author_name,
                model_description=self.model_description
            )

            # Push model
            success = pusher.push_model(training_config, training_results)

            if success:
                logger.info(f"✅ Main model pushed successfully to: https://huggingface.co/{self.repo_name}")
                return True
            else:
                logger.error("❌ Failed to push main model")
                return False

        except Exception as e:
            logger.error(f"❌ Error pushing main model: {e}")
            return False

    def quantize_and_push_models(self) -> bool:
        """Quantize and push models to Hugging Face Hub"""
        if not self.quantize:
            logger.info("⏭️ Skipping quantization (disabled)")
            return True

        try:
            logger.info("🔄 Starting quantization and push process...")

            # Import quantization script
            from scripts.model_tonic.quantize_model import ModelQuantizer

            success_count = 0
            total_count = len(self.quant_types)

            for quant_type in self.quant_types:
                logger.info(f"🔄 Processing quantization type: {quant_type}")

                # Initialize quantizer
                quantizer = ModelQuantizer(
                    model_path=str(self.model_path),
                    repo_name=self.repo_name,
                    token=self.hf_token,
                    private=self.private,
                    trackio_url=self.trackio_url,
                    experiment_name=self.experiment_name,
                    dataset_repo=self.dataset_repo,
                    hf_token=self.hf_token
                )

                # Perform quantization and push
                success = quantizer.quantize_and_push(
                    quant_type=quant_type,
                    device="auto",
                    group_size=128
                )

                if success:
                    logger.info(f"✅ {quant_type} quantization and push completed")
                    success_count += 1
                else:
                    logger.error(f"❌ {quant_type} quantization and push failed")

            logger.info(f"📊 Quantization summary: {success_count}/{total_count} successful")
            return success_count > 0

        except Exception as e:
            logger.error(f"❌ Error during quantization: {e}")
            return False

    def run_complete_pipeline(self) -> bool:
        """Run the complete model recovery and deployment pipeline"""
        logger.info("🚀 Starting complete model recovery and deployment pipeline")

        # Step 1: Validate model path
        if not self.validate_model_path():
            logger.error("❌ Model validation failed")
            return False

        # Step 2: Push main model
        if not self.push_main_model():
            logger.error("❌ Main model push failed")
            return False

        # Step 3: Quantize and push models
        if not self.quantize_and_push_models():
            logger.warning("⚠️ Quantization failed, but main model was pushed successfully")

        logger.info("🎉 Model recovery and deployment pipeline completed!")
        logger.info(f"🌐 View your model at: https://huggingface.co/{self.repo_name}")

        return True

def parse_args():
    """Parse command line arguments"""
    parser = argparse.ArgumentParser(description='Recover and deploy trained model to Hugging Face Hub')

    # Required arguments
    parser.add_argument('model_path', type=str, help='Path to trained model directory')
    parser.add_argument('repo_name', type=str, help='Hugging Face repository name (username/repo-name)')

    # Optional arguments
    parser.add_argument('--hf-token', type=str, default=None, help='Hugging Face token')
    parser.add_argument('--private', action='store_true', help='Make repository private')
    parser.add_argument('--no-quantize', action='store_true', help='Skip quantization')
    parser.add_argument('--quant-types', nargs='+',
                        choices=['int8_weight_only', 'int4_weight_only', 'int8_dynamic'],
                        default=['int8_weight_only', 'int4_weight_only'],
                        help='Quantization types to apply')
    parser.add_argument('--trackio-url', type=str, default=None, help='Trackio Space URL for logging')
    parser.add_argument('--experiment-name', type=str, default=None, help='Experiment name for Trackio')
    parser.add_argument('--dataset-repo', type=str, default=None, help='HF Dataset repository for experiment storage')
    parser.add_argument('--author-name', type=str, default=None, help='Author name for model card')
    parser.add_argument('--model-description', type=str, default=None, help='Model description for model card')

    return parser.parse_args()

def main():
    """Main function"""
    args = parse_args()

    logger.info("Starting model recovery and deployment pipeline")

    # Initialize pipeline
    try:
        pipeline = ModelRecoveryPipeline(
            model_path=args.model_path,
            repo_name=args.repo_name,
            hf_token=args.hf_token,
            private=args.private,
            quantize=not args.no_quantize,
            quant_types=args.quant_types,
            trackio_url=args.trackio_url,
            experiment_name=args.experiment_name,
            dataset_repo=args.dataset_repo,
            author_name=args.author_name,
            model_description=args.model_description
        )

        # Run complete pipeline
        success = pipeline.run_complete_pipeline()

        if success:
            logger.info("✅ Model recovery and deployment completed successfully!")
            return 0
        else:
            logger.error("❌ Model recovery and deployment failed!")
            return 1

    except Exception as e:
        logger.error(f"❌ Error during model recovery: {e}")
        return 1

if __name__ == "__main__":
    exit(main())
scripts/model_tonic/push_to_huggingface.py CHANGED
@@ -8,11 +8,17 @@ import os
 import json
 import argparse
 import logging
+import time
 from pathlib import Path
 from typing import Dict, Any, Optional, List
 from datetime import datetime
 import subprocess
 import shutil
+import platform
+
+# Set timeout for HF operations to prevent hanging
+os.environ['HF_HUB_DOWNLOAD_TIMEOUT'] = '300'
+os.environ['HF_HUB_UPLOAD_TIMEOUT'] = '600'
 
 try:
     from huggingface_hub import HfApi, create_repo, upload_file
@@ -34,6 +40,14 @@ except ImportError:
 
 logger = logging.getLogger(__name__)
 
+class TimeoutError(Exception):
+    """Custom timeout exception"""
+    pass
+
+def timeout_handler(signum, frame):
+    """Signal handler for timeout"""
+    raise TimeoutError("Operation timed out")
+
 class HuggingFacePusher:
     """Push trained models and results to Hugging Face Hub with HF Datasets integration"""
 
@@ -88,16 +102,22 @@ class HuggingFacePusher:
         try:
             logger.info(f"Creating repository: {self.repo_name}")
 
-            # Create repository
-            create_repo(
-                repo_id=self.repo_name,
-                token=self.token,
-                private=self.private,
-                exist_ok=True
-            )
-
-            logger.info(f"✅ Repository created: https://huggingface.co/{self.repo_name}")
-            return True
+            # Create repository with timeout handling
+            try:
+                # Create repository
+                create_repo(
+                    repo_id=self.repo_name,
+                    token=self.token,
+                    private=self.private,
+                    exist_ok=True
+                )
+
+                logger.info(f"✅ Repository created: https://huggingface.co/{self.repo_name}")
+                return True
+
+            except Exception as e:
+                logger.error(f"❌ Repository creation failed: {e}")
+                return False
 
         except Exception as e:
             logger.error(f"❌ Failed to create repository: {e}")
@@ -105,18 +125,29 @@ class HuggingFacePusher:
 
     def validate_model_path(self) -> bool:
         """Validate that the model path contains required files"""
+        # Support both safetensors and pytorch formats
         required_files = [
             "config.json",
-            "pytorch_model.bin",
             "tokenizer.json",
             "tokenizer_config.json"
         ]
 
+        # Check for model files (either safetensors or pytorch)
+        model_files = [
+            "model.safetensors.index.json",  # Safetensors format
+            "pytorch_model.bin"  # PyTorch format
+        ]
+
         missing_files = []
         for file in required_files:
             if not (self.model_path / file).exists():
                 missing_files.append(file)
 
+        # Check if at least one model file exists
+        model_file_exists = any((self.model_path / file).exists() for file in model_files)
+        if not model_file_exists:
+            missing_files.extend(model_files)
+
         if missing_files:
             logger.error(f"❌ Missing required files: {missing_files}")
             return False
@@ -246,7 +277,6 @@ This model is fine-tuned for specific tasks and may not generalize well to all u
 
 This model is licensed under the Apache 2.0 License.
 """
-        # return model_card
 
     def _get_model_size(self) -> float:
         """Get model size in GB"""
@@ -272,7 +302,7 @@ This model is licensed under the Apache 2.0 License.
         return "Unknown"
 
     def upload_model_files(self) -> bool:
-        """Upload model files to Hugging Face Hub"""
+        """Upload model files to Hugging Face Hub with timeout protection"""
         try:
             logger.info("Uploading model files...")
 
@@ -283,12 +313,19 @@ This model is licensed under the Apache 2.0 License.
                 remote_path = str(relative_path)
 
                 logger.info(f"Uploading {relative_path}")
-                upload_file(
-                    path_or_fileobj=str(file_path),
-                    path_in_repo=remote_path,
-                    repo_id=self.repo_name,
-                    token=self.token
-                )
+
+                try:
+                    upload_file(
+                        path_or_fileobj=str(file_path),
+                        path_in_repo=remote_path,
+                        repo_id=self.repo_name,
+                        token=self.token
+                    )
+                    logger.info(f"✅ Uploaded {relative_path}")
+
+                except Exception as e:
+                    logger.error(f"❌ Failed to upload {relative_path}: {e}")
+                    return False
 
             logger.info("✅ Model files uploaded successfully")
             return True
@@ -378,7 +415,7 @@ Training metrics and configuration are stored in the HF Dataset repository: `{se
 
 ## Files
 
-- `pytorch_model.bin`: Model weights
+- `model.safetensors.index.json`: Model weights (safetensors format)
 - `config.json`: Model configuration
 - `tokenizer.json`: Tokenizer configuration
 - `training_results/`: Training logs and results
scripts/model_tonic/quantize_model.py CHANGED
@@ -13,6 +13,7 @@ from typing import Dict, Any, Optional, List, Union
 from datetime import datetime
 import subprocess
 import shutil
+import platform
 
 try:
     import torch
@@ -100,14 +101,25 @@ class ModelQuantizer:
             return False
 
         # Check for essential model files
-        required_files = ['config.json', 'pytorch_model.bin']
+        required_files = ['config.json']
         optional_files = ['tokenizer.json', 'tokenizer_config.json']
 
+        # Check for model files (either safetensors or pytorch)
+        model_files = [
+            "model.safetensors.index.json",  # Safetensors format
+            "pytorch_model.bin"  # PyTorch format
+        ]
+
         missing_files = []
         for file in required_files:
             if not (self.model_path / file).exists():
                 missing_files.append(file)
 
+        # Check if at least one model file exists
+        model_file_exists = any((self.model_path / file).exists() for file in model_files)
+        if not model_file_exists:
+            missing_files.extend(model_files)
+
         if missing_files:
             logger.error(f"❌ Missing required model files: {missing_files}")
             return False
config_test_monitoring_auto_resolve_20250727_153310.json → test_data/config_test_monitoring_auto_resolve_20250727_153310.json RENAMED
File without changes
config_test_monitoring_auto_resolve_20250727_161709.json → test_data/config_test_monitoring_auto_resolve_20250727_161709.json RENAMED
File without changes
config_test_monitoring_integration_20250727_151307.json → test_data/config_test_monitoring_integration_20250727_151307.json RENAMED
File without changes
config_test_monitoring_integration_20250727_151403.json → test_data/config_test_monitoring_integration_20250727_151403.json RENAMED
File without changes
test_update_kwargs.py → tests/test_update_kwargs_1.py RENAMED
File without changes
verify_fix.py → tests/verify_fix_1.py RENAMED
File without changes