Commit d8dd7a1 · verified · committed by Tonic · 0 parent(s)

first commit
.gitignore ADDED
@@ -0,0 +1,98 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyTorch
*.pth
*.pt
*.ckpt

# Jupyter Notebook
.ipynb_checkpoints

# Environment
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Logs
*.log
logs/
tensorboard_logs/

# Model outputs
output/
checkpoints/
models/
wandb/

# Datasets
data/
datasets/
my_dataset/
test_dataset/

# Temporary files
tmp/
temp/
*.tmp
*.temp

# Hugging Face cache
.cache/
transformers_cache/

# Accelerate
accelerate_config.yaml

# Training outputs
runs/
*.json
!config/*.json
!*.json.example

# Evaluation results
eval_results/
test_results/

# Documentation
docs/_build/
README.md ADDED
@@ -0,0 +1,291 @@
# SmolLM3 Fine-tuning for FlexAI Console

This repository provides a complete setup for fine-tuning SmolLM3 models using the FlexAI console, following the nanoGPT structure but adapted for modern transformer models.

## Overview

SmolLM3 is a 3B-parameter transformer decoder model optimized for efficiency, long-context reasoning, and multilingual support. This setup allows you to fine-tune SmolLM3 for various tasks including:

- **Supervised Fine-tuning (SFT)**: Adapt the model for instruction following
- **Direct Preference Optimization (DPO)**: Improve model alignment
- **Long-context fine-tuning**: Support for up to 128k tokens
- **Tool calling**: Fine-tune for function calling capabilities

## Quick Start

### 1. Repository Setup

The repository follows the FlexAI console structure with the following key files:

- `train.py`: Main entry point script
- `config/train_smollm3.py`: Default configuration
- `model.py`: Model wrapper and loading
- `data.py`: Dataset handling and preprocessing
- `trainer.py`: Training loop and trainer setup
- `requirements.txt`: Dependencies

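The entry point wires these pieces together roughly as follows (a condensed sketch of what `train.py` does; the config path and dataset location are illustrative):

```python
from config import get_config
from model import SmolLM3Model
from data import SmolLM3Dataset
from trainer import SmolLM3Trainer

config = get_config("config/train_smollm3.py")            # training configuration
model = SmolLM3Model(model_name=config.model_name,        # model + tokenizer wrapper
                     max_seq_length=config.max_seq_length,
                     config=config)
dataset = SmolLM3Dataset(data_path="/input/my_dataset",   # mounted dataset directory
                         tokenizer=model.tokenizer,
                         max_seq_length=config.max_seq_length)
trainer = SmolLM3Trainer(model=model, dataset=dataset,
                         config=config, output_dir="/output-checkpoint")
trainer.train()
```
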
### 2. FlexAI Console Configuration

When setting up a Fine Tuning Job in the FlexAI console, use these settings:

#### Basic Configuration
- **Name**: `smollm3-finetune`
- **Cluster**: Your organization's designated cluster
- **Checkpoint**: (Optional) Previous training job checkpoint
- **Node Count**: 1
- **Accelerator Count**: 1-8 (depending on your needs)

#### Repository Settings
- **Repository URL**: `https://github.com/your-username/flexai-finetune`
- **Repository Revision**: `main`

#### Dataset Configuration
- **Datasets**: Your dataset (mounted under `/input`)
- **Mount Directory**: `my_dataset`

#### Entry Point
```
train.py config/train_smollm3.py --dataset_dir=my_dataset --init_from=resume --out_dir=/input-checkpoint --max_iters=1500
```

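The `--dataset_dir` value is resolved relative to the `/input` mount inside the training container, so the command above reads its data from `/input/my_dataset`:

```python
# from train.py: the mounted dataset directory is resolved under /input
dataset_path = os.path.join('/input', args.dataset_dir)  # e.g. /input/my_dataset
```
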
### 3. Dataset Format

The script supports multiple dataset formats:

#### Chat Format (Recommended)
```json
[
  {
    "messages": [
      {"role": "user", "content": "What is machine learning?"},
      {"role": "assistant", "content": "Machine learning is a subset of AI..."}
    ]
  }
]
```

#### Instruction Format
```json
[
  {
    "instruction": "What is machine learning?",
    "output": "Machine learning is a subset of AI..."
  }
]
```

#### User-Assistant Format
```json
[
  {
    "user": "What is machine learning?",
    "assistant": "Machine learning is a subset of AI..."
  }
]
```

### 4. Configuration Options

The default configuration in `config/train_smollm3.py` includes:

```python
@dataclass
class SmolLM3Config:
    # Model configuration
    model_name: str = "HuggingFaceTB/SmolLM3-3B"
    max_seq_length: int = 4096
    use_flash_attention: bool = True

    # Training configuration
    batch_size: int = 4
    gradient_accumulation_steps: int = 4
    learning_rate: float = 2e-5
    max_iters: int = 1000

    # Mixed precision
    fp16: bool = True
    bf16: bool = False
```

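The same configuration can be loaded programmatically with the helper that `train.py` uses:

```python
from config import get_config

config = get_config("config/train_smollm3.py")
print(config.model_name, config.max_seq_length, config.learning_rate)
```
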
### 5. Command Line Arguments

The `train.py` script accepts various arguments:

```bash
# Basic usage
python train.py config/train_smollm3.py

# With custom parameters
python train.py config/train_smollm3.py \
    --dataset_dir=my_dataset \
    --out_dir=/output-checkpoint \
    --init_from=resume \
    --max_iters=1500 \
    --batch_size=8 \
    --learning_rate=1e-5 \
    --max_seq_length=8192
```

## Advanced Usage

### 1. Custom Configuration

Create a custom configuration file:

```python
# config/my_config.py
from config.train_smollm3 import SmolLM3Config

config = SmolLM3Config(
    model_name="HuggingFaceTB/SmolLM3-3B-Instruct",
    max_seq_length=8192,
    batch_size=2,
    learning_rate=1e-5,
    max_iters=2000,
    use_flash_attention=True,
    fp16=True
)
```

### 2. Long-Context Fine-tuning

For long-context tasks (up to 128k tokens):

```python
config = SmolLM3Config(
    max_seq_length=131072,  # 128k tokens
    model_name="HuggingFaceTB/SmolLM3-3B",
    use_flash_attention=True,
    use_gradient_checkpointing=True
)
```

### 3. DPO Training

For preference optimization, use the DPO trainer:

```python
from trainer import SmolLM3DPOTrainer

dpo_trainer = SmolLM3DPOTrainer(
    model=model,
    dataset=dataset,
    config=config,
    output_dir="./dpo-output"
)

dpo_trainer.train()
```

### 4. Tool Calling Fine-tuning

Include tool calling examples in your dataset:

```json
[
  {
    "messages": [
      {"role": "user", "content": "What's the weather in New York?"},
      {"role": "assistant", "content": "<tool_call>\n<invoke name=\"get_weather\">\n<parameter name=\"location\">New York</parameter>\n</invoke>\n</tool_call>"},
      {"role": "tool", "content": "The weather in New York is 72°F and sunny."},
      {"role": "assistant", "content": "The weather in New York is currently 72°F and sunny."}
    ]
  }
]
```

## Model Variants

SmolLM3 comes in several variants:

- **SmolLM3-3B-Base**: Base model for general fine-tuning
- **SmolLM3-3B**: Instruction-tuned model
- **SmolLM3-3B-Instruct**: Enhanced instruction model
- **Quantized versions**: Available for deployment

## Hardware Requirements

### Minimum Requirements
- **GPU**: 16GB+ VRAM (for 3B model)
- **RAM**: 32GB+ system memory
- **Storage**: 50GB+ free space

### Recommended
- **GPU**: A100/H100 or similar
- **RAM**: 64GB+ system memory
- **Storage**: 100GB+ SSD

## Troubleshooting

### Common Issues

1. **Out of Memory (OOM)**
   - Reduce `batch_size`
   - Increase `gradient_accumulation_steps`
   - Enable `gradient_checkpointing`
   - Use `fp16` or `bf16`
   - See the sketch after this list for a combined example

2. **Slow Training**
   - Enable `flash_attention`
   - Use mixed precision (`fp16`/`bf16`)
   - Increase `dataloader_num_workers`

3. **Dataset Loading Issues**
   - Check dataset format
   - Ensure proper JSON structure
   - Verify file permissions

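A memory-friendly starting point that combines the OOM suggestions above (a sketch built from the fields defined in `config/train_smollm3.py`; adjust to your hardware):

```python
from config.train_smollm3 import SmolLM3Config

config = SmolLM3Config(
    batch_size=1,                     # smallest per-device batch
    gradient_accumulation_steps=16,   # keeps a reasonable effective batch size
    use_gradient_checkpointing=True,  # trade compute for memory
    fp16=True,                        # or bf16=True on Ampere+ GPUs (not both)
    max_seq_length=2048,              # shorter sequences reduce activation memory
)
```
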
### Debug Mode

Enable debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## Evaluation

After training, evaluate your model:

```python
from transformers import pipeline

pipe = pipeline(
    task="text-generation",
    model="./output-checkpoint",
    device=0,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7
)

# Test the model
messages = [{"role": "user", "content": "Explain gravity in simple terms."}]
outputs = pipe(messages)
print(outputs[0]["generated_text"][-1]["content"])
```

## Deployment

### Using vLLM
```bash
vllm serve ./output-checkpoint --enable-auto-tool-choice
```

### Using llama.cpp
```bash
# Convert to GGUF format
python -m llama_cpp.convert_model ./output-checkpoint --outfile model.gguf
```

## Resources

- [SmolLM3 Blog Post](https://huggingface.co/blog/smollm3)
- [Model Repository](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
- [GitHub Repository](https://github.com/huggingface/smollm)
- [SmolTalk Dataset](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)

## License

This project follows the same license as the SmolLM3 model. Please refer to the Hugging Face model page for licensing information.
config.py ADDED
@@ -0,0 +1,28 @@
"""
Configuration management for SmolLM3 fine-tuning
"""

import os
import importlib.util
from typing import Any
from config.train_smollm3 import SmolLM3Config, get_config as get_default_config

def get_config(config_path: str) -> SmolLM3Config:
    """Load configuration from file or return default"""
    if os.path.exists(config_path):
        # Load from file if it exists
        spec = importlib.util.spec_from_file_location("config_module", config_path)
        config_module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(config_module)

        if hasattr(config_module, 'config'):
            return config_module.config
        else:
            # Try to find a config class
            for attr_name in dir(config_module):
                attr = getattr(config_module, attr_name)
                if isinstance(attr, SmolLM3Config):
                    return attr

    # Return default configuration
    return get_default_config(config_path)
config/train_smollm3.py ADDED
@@ -0,0 +1,107 @@
"""
SmolLM3 Training Configuration
Based on nanoGPT structure but adapted for SmolLM3
"""

import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class SmolLM3Config:
    """Configuration for SmolLM3 fine-tuning"""

    # Model configuration
    model_name: str = "HuggingFaceTB/SmolLM3-3B"
    max_seq_length: int = 4096
    use_flash_attention: bool = True
    use_gradient_checkpointing: bool = True

    # Training configuration
    batch_size: int = 4
    gradient_accumulation_steps: int = 4
    learning_rate: float = 2e-5
    weight_decay: float = 0.01
    warmup_steps: int = 100
    max_iters: int = 1000
    eval_interval: int = 100
    log_interval: int = 10
    save_interval: int = 500

    # Optimizer configuration
    # Passed to TrainingArguments.optim; "adamw_torch" is the valid name
    # (plain "adamw" is not accepted by TrainingArguments)
    optimizer: str = "adamw_torch"
    beta1: float = 0.9
    beta2: float = 0.95
    eps: float = 1e-8

    # Scheduler configuration
    scheduler: str = "cosine"
    min_lr: float = 1e-6

    # Mixed precision
    fp16: bool = True
    bf16: bool = False

    # DDP configuration
    ddp_backend: str = "nccl"
    ddp_find_unused_parameters: bool = False

    # Logging and saving
    save_steps: int = 500
    eval_steps: int = 100
    logging_steps: int = 10
    save_total_limit: Optional[int] = 3

    # Evaluation
    eval_strategy: str = "steps"
    metric_for_best_model: str = "eval_loss"
    greater_is_better: bool = False
    load_best_model_at_end: bool = True

    # Data configuration
    data_dir: str = "my_dataset"
    train_file: str = "train.json"
    validation_file: Optional[str] = None
    test_file: Optional[str] = None

    # Chat template configuration
    use_chat_template: bool = True
    chat_template_kwargs: dict = None

    def __post_init__(self):
        if self.chat_template_kwargs is None:
            self.chat_template_kwargs = {
                "enable_thinking": False,
                "add_generation_prompt": True
            }

        # Validate configuration
        if self.fp16 and self.bf16:
            raise ValueError("Cannot use both fp16 and bf16")

        if self.max_seq_length > 131072:  # 128k limit
            raise ValueError("max_seq_length cannot exceed 131072")

def get_config(config_path: str) -> SmolLM3Config:
    """Load configuration from file or return default"""
    if os.path.exists(config_path):
        # Load from file if it exists
        import importlib.util
        spec = importlib.util.spec_from_file_location("config_module", config_path)
        config_module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(config_module)

        if hasattr(config_module, 'config'):
            return config_module.config
        else:
            # Try to find a config class
            for attr_name in dir(config_module):
                attr = getattr(config_module, attr_name)
                if isinstance(attr, SmolLM3Config):
                    return attr

    # Return default configuration
    return SmolLM3Config()

# Default configuration instance
config = SmolLM3Config()
config/train_smollm3_dpo.py ADDED
@@ -0,0 +1,38 @@
"""
SmolLM3 DPO Training Configuration
Optimized for Direct Preference Optimization
"""

from config.train_smollm3 import SmolLM3Config

config = SmolLM3Config(
    # Model configuration
    model_name="HuggingFaceTB/SmolLM3-3B-Instruct",  # Start from instruction-tuned model
    max_seq_length=4096,
    use_flash_attention=True,
    use_gradient_checkpointing=True,

    # Training configuration
    batch_size=2,  # Smaller batch size for DPO
    gradient_accumulation_steps=4,
    learning_rate=5e-6,  # Very low learning rate for DPO
    weight_decay=0.01,
    warmup_steps=100,
    max_iters=1000,

    # Mixed precision
    fp16=True,
    bf16=False,

    # Logging and saving
    save_steps=200,
    eval_steps=100,
    logging_steps=20,

    # Chat template configuration
    use_chat_template=True,
    chat_template_kwargs={
        "enable_thinking": False,  # Disable reasoning for preference learning
        "add_generation_prompt": True
    }
)
config/train_smollm3_long_context.py ADDED
@@ -0,0 +1,38 @@
"""
SmolLM3 Long-Context Training Configuration
Optimized for long-context tasks (up to 128k tokens)
"""

from config.train_smollm3 import SmolLM3Config

config = SmolLM3Config(
    # Model configuration
    model_name="HuggingFaceTB/SmolLM3-3B",
    max_seq_length=131072,  # 128k tokens
    use_flash_attention=True,
    use_gradient_checkpointing=True,

    # Training configuration
    batch_size=1,  # Reduced for long sequences
    gradient_accumulation_steps=8,  # Increased to maintain effective batch size
    learning_rate=1e-5,  # Lower learning rate for stability
    weight_decay=0.01,
    warmup_steps=200,
    max_iters=500,

    # Mixed precision
    fp16=True,
    bf16=False,

    # Logging and saving
    save_steps=100,
    eval_steps=50,
    logging_steps=10,

    # Chat template configuration
    use_chat_template=True,
    chat_template_kwargs={
        "enable_thinking": True,  # Enable reasoning mode
        "add_generation_prompt": True
    }
)
create_sample_dataset.py ADDED
@@ -0,0 +1,41 @@
#!/usr/bin/env python3
"""
Sample Dataset Creation Script
Creates sample datasets for testing SmolLM3 fine-tuning
"""

import os
import json
import argparse
from data import create_sample_dataset

def main():
    parser = argparse.ArgumentParser(description='Create sample dataset for SmolLM3 fine-tuning')
    parser.add_argument('--output_dir', type=str, default='my_dataset',
                        help='Output directory for the dataset')
    parser.add_argument('--format', type=str, default='chat',
                        choices=['chat', 'instruction', 'user_assistant'],
                        help='Dataset format')
    parser.add_argument('--num_samples', type=int, default=100,
                        help='Number of samples to create')

    args = parser.parse_args()

    # Create sample dataset
    # NOTE: create_sample_dataset() currently writes a small fixed set of
    # chat-format examples; --format and --num_samples are parsed but not
    # yet applied to the generated data.
    output_path = create_sample_dataset(args.output_dir)

    print(f"Sample dataset created in: {output_path}")
    print(f"Format: {args.format}")
    print(f"Samples: {args.num_samples}")
    print("\nFiles created:")
    print(f"- {os.path.join(output_path, 'train.json')}")
    print(f"- {os.path.join(output_path, 'validation.json')}")

    # Show sample data
    with open(os.path.join(output_path, 'train.json'), 'r') as f:
        data = json.load(f)
    print(f"\nSample data:")
    print(json.dumps(data[0], indent=2))

if __name__ == '__main__':
    main()
data.py ADDED
@@ -0,0 +1,238 @@
"""
SmolLM3 Dataset Handler
Handles data loading, preprocessing, and tokenization for SmolLM3 fine-tuning
"""

import os
import json
import torch
from typing import Dict, List, Optional, Union
from datasets import Dataset, DatasetDict, load_dataset
from transformers import PreTrainedTokenizer
import logging

logger = logging.getLogger(__name__)

class SmolLM3Dataset:
    """Dataset handler for SmolLM3 fine-tuning"""

    def __init__(
        self,
        data_path: str,
        tokenizer: PreTrainedTokenizer,
        max_seq_length: int = 4096,
        use_chat_template: bool = True,
        chat_template_kwargs: Optional[Dict] = None
    ):
        self.data_path = data_path
        self.tokenizer = tokenizer
        self.max_seq_length = max_seq_length
        self.use_chat_template = use_chat_template
        self.chat_template_kwargs = chat_template_kwargs or {}

        # Load and process dataset
        self.dataset = self._load_dataset()
        self.processed_dataset = self._process_dataset()

    def _load_dataset(self) -> Dataset:
        """Load dataset from various formats"""
        logger.info(f"Loading dataset from {self.data_path}")

        # Check if it's a Hugging Face dataset
        if os.path.isdir(self.data_path):
            # Local directory: only pass splits whose files actually exist,
            # since load_dataset does not accept None entries in data_files
            try:
                data_files = {"train": os.path.join(self.data_path, "train.json")}
                for split in ("validation", "test"):
                    split_path = os.path.join(self.data_path, f"{split}.json")
                    if os.path.exists(split_path):
                        data_files[split] = split_path
                dataset = load_dataset("json", data_files=data_files)
                logger.info("Loaded dataset from local JSON files")
                return dataset
            except Exception as e:
                logger.warning(f"Failed to load as JSON dataset: {e}")

        # Try to load as a single JSON file
        if os.path.isfile(self.data_path) and self.data_path.endswith('.json'):
            try:
                with open(self.data_path, 'r', encoding='utf-8') as f:
                    data = json.load(f)

                # Convert to dataset format (wrapped in a DatasetDict so the
                # rest of the pipeline can rely on a "train" split)
                if isinstance(data, list):
                    dataset = Dataset.from_list(data)
                else:
                    dataset = Dataset.from_dict(data)

                logger.info("Loaded dataset from single JSON file")
                return DatasetDict({"train": dataset})
            except Exception as e:
                logger.error(f"Failed to load JSON file: {e}")
                raise

        # Try to load as a Hugging Face dataset name
        try:
            dataset = load_dataset(self.data_path)
            logger.info(f"Loaded Hugging Face dataset: {self.data_path}")
            return dataset
        except Exception as e:
            logger.error(f"Failed to load dataset: {e}")
            raise

    def _process_dataset(self) -> Dataset:
        """Process the dataset for training"""
        logger.info("Processing dataset for training")

        def format_chat_template(example):
            """Format example using chat template"""
            if self.use_chat_template:
                try:
                    # Handle different input formats
                    if "messages" in example:
                        messages = example["messages"]
                    elif "conversations" in example:
                        messages = example["conversations"]
                    elif "user" in example and "assistant" in example:
                        messages = [
                            {"role": "user", "content": example["user"]},
                            {"role": "assistant", "content": example["assistant"]}
                        ]
                    elif "instruction" in example and "output" in example:
                        messages = [
                            {"role": "user", "content": example["instruction"]},
                            {"role": "assistant", "content": example["output"]}
                        ]
                    elif "prompt" in example and "completion" in example:
                        messages = [
                            {"role": "user", "content": example["prompt"]},
                            {"role": "assistant", "content": example["completion"]}
                        ]
                    else:
                        # Fallback: treat as plain text
                        return {"text": str(example)}

                    # Apply chat template
                    text = self.tokenizer.apply_chat_template(
                        messages,
                        tokenize=False,
                        **self.chat_template_kwargs
                    )
                    return {"text": text}
                except Exception as e:
                    logger.warning(f"Failed to apply chat template: {e}")
                    # Fallback to plain text
                    return {"text": str(example)}
            else:
                # Use plain text
                if "text" in example:
                    return {"text": example["text"]}
                else:
                    return {"text": str(example)}

        def tokenize_function(examples):
            """Tokenize the examples"""
            # Tokenize the texts
            tokenized = self.tokenizer(
                examples["text"],
                truncation=True,
                padding=False,
                max_length=self.max_seq_length,
                return_overflowing_tokens=True,
                return_length=True,
            )

            # Calculate input length
            input_length = [len(x) for x in tokenized["input_ids"]]

            # Create labels (same as input_ids for causal LM)
            tokenized["labels"] = tokenized["input_ids"].copy()

            return {
                "input_ids": tokenized["input_ids"],
                "attention_mask": tokenized["attention_mask"],
                "labels": tokenized["labels"],
                "length": input_length,
            }

        # Process the dataset
        processed_dataset = self.dataset.map(
            format_chat_template,
            remove_columns=self.dataset["train"].column_names,
            desc="Formatting dataset"
        )

        # Tokenize the dataset
        tokenized_dataset = processed_dataset.map(
            tokenize_function,
            remove_columns=processed_dataset["train"].column_names,
            desc="Tokenizing dataset",
            batched=True,
        )

        logger.info(f"Dataset processed. Train samples: {len(tokenized_dataset['train'])}")
        if "validation" in tokenized_dataset:
            logger.info(f"Validation samples: {len(tokenized_dataset['validation'])}")

        return tokenized_dataset

    def get_train_dataset(self) -> Dataset:
        """Get training dataset"""
        return self.processed_dataset["train"]

    def get_eval_dataset(self) -> Optional[Dataset]:
        """Get evaluation dataset if available"""
        if "validation" in self.processed_dataset:
            return self.processed_dataset["validation"]
        elif "test" in self.processed_dataset:
            return self.processed_dataset["test"]
        else:
            return None

    def get_data_collator(self):
        """Get data collator for training"""
        from transformers import DataCollatorForLanguageModeling

        return DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=False,  # We're doing causal LM, not masked LM
        )

def create_sample_dataset(output_path: str = "my_dataset"):
    """Create a sample dataset for testing"""
    os.makedirs(output_path, exist_ok=True)

    # Sample conversations
    conversations = [
        {
            "messages": [
                {"role": "user", "content": "What is machine learning?"},
                {"role": "assistant", "content": "Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed."}
            ]
        },
        {
            "messages": [
                {"role": "user", "content": "Explain gravity in simple terms."},
                {"role": "assistant", "content": "Gravity is the force that pulls objects toward each other, like how the Earth pulls things down to the ground."}
            ]
        },
        {
            "messages": [
                {"role": "user", "content": "How do I make a cup of coffee?"},
                {"role": "assistant", "content": "To make a cup of coffee: 1) Boil water, 2) Add coffee grounds to a filter, 3) Pour hot water over the grounds, 4) Let it brew for a few minutes, 5) Enjoy!"}
            ]
        }
    ]

    # Split into train/validation
    train_data = conversations[:2]
    validation_data = conversations[2:]

    # Save to files
    with open(os.path.join(output_path, "train.json"), 'w', encoding='utf-8') as f:
        json.dump(train_data, f, indent=2, ensure_ascii=False)

    with open(os.path.join(output_path, "validation.json"), 'w', encoding='utf-8') as f:
        json.dump(validation_data, f, indent=2, ensure_ascii=False)

    logger.info(f"Sample dataset created in {output_path}")
    return output_path
model.py ADDED
@@ -0,0 +1,188 @@
"""
SmolLM3 Model Wrapper
Handles model loading, tokenizer, and training setup
"""

import os
import torch
import torch.nn as nn
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    AutoConfig,
    TrainingArguments,
    Trainer
)
from typing import Optional, Dict, Any
import logging

logger = logging.getLogger(__name__)

class SmolLM3Model:
    """Wrapper for SmolLM3 model and tokenizer"""

    def __init__(
        self,
        model_name: str = "HuggingFaceTB/SmolLM3-3B",
        max_seq_length: int = 4096,
        config: Optional[Any] = None,
        device_map: Optional[str] = None,
        torch_dtype: Optional[torch.dtype] = None
    ):
        self.model_name = model_name
        self.max_seq_length = max_seq_length
        self.config = config

        # Set device and dtype
        if torch_dtype is None:
            if torch.cuda.is_available():
                # bf16 on Ampere (compute capability 8.0) and newer, fp16 otherwise
                self.torch_dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16
            else:
                self.torch_dtype = torch.float32
        else:
            self.torch_dtype = torch_dtype

        if device_map is None:
            self.device_map = "auto" if torch.cuda.is_available() else "cpu"
        else:
            self.device_map = device_map

        # Load tokenizer and model
        self._load_tokenizer()
        self._load_model()

    def _load_tokenizer(self):
        """Load the tokenizer"""
        logger.info(f"Loading tokenizer from {self.model_name}")
        try:
            self.tokenizer = AutoTokenizer.from_pretrained(
                self.model_name,
                trust_remote_code=True,
                use_fast=True
            )

            # Set pad token if not present
            if self.tokenizer.pad_token is None:
                self.tokenizer.pad_token = self.tokenizer.eos_token

            logger.info(f"Tokenizer loaded successfully. Vocab size: {self.tokenizer.vocab_size}")

        except Exception as e:
            logger.error(f"Failed to load tokenizer: {e}")
            raise

    def _load_model(self):
        """Load the model"""
        logger.info(f"Loading model from {self.model_name}")
        try:
            # Load model configuration
            model_config = AutoConfig.from_pretrained(
                self.model_name,
                trust_remote_code=True
            )

            # Update configuration if needed
            if hasattr(model_config, 'max_position_embeddings'):
                model_config.max_position_embeddings = self.max_seq_length

            # Load model; recent transformers releases expect attn_implementation
            # rather than the deprecated use_flash_attention_2 flag
            use_flash_attention = self.config.use_flash_attention if self.config else True
            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name,
                config=model_config,
                torch_dtype=self.torch_dtype,
                device_map=self.device_map,
                trust_remote_code=True,
                attn_implementation="flash_attention_2" if use_flash_attention else "eager",
                use_cache=False  # Disable KV cache for training
            )

            # Enable gradient checkpointing if specified
            if self.config and self.config.use_gradient_checkpointing:
                self.model.gradient_checkpointing_enable()

            logger.info(f"Model loaded successfully. Parameters: {self.model.num_parameters():,}")

        except Exception as e:
            logger.error(f"Failed to load model: {e}")
            raise

    def get_training_arguments(self, output_dir: str, **kwargs) -> TrainingArguments:
        """Get training arguments for the Trainer"""
        if self.config is None:
            raise ValueError("Config is required to get training arguments")

        # Merge config with kwargs
        training_args = {
            "output_dir": output_dir,
            "per_device_train_batch_size": self.config.batch_size,
            "per_device_eval_batch_size": self.config.batch_size,
            "gradient_accumulation_steps": self.config.gradient_accumulation_steps,
            "learning_rate": self.config.learning_rate,
            "weight_decay": self.config.weight_decay,
            "warmup_steps": self.config.warmup_steps,
            "max_steps": self.config.max_iters,
            "save_steps": self.config.save_steps,
            "eval_steps": self.config.eval_steps,
            "logging_steps": self.config.logging_steps,
            "save_total_limit": self.config.save_total_limit,
            "evaluation_strategy": self.config.eval_strategy,
            "metric_for_best_model": self.config.metric_for_best_model,
            "greater_is_better": self.config.greater_is_better,
            "load_best_model_at_end": self.config.load_best_model_at_end,
            "fp16": self.config.fp16,
            "bf16": self.config.bf16,
            "ddp_backend": self.config.ddp_backend,
            "ddp_find_unused_parameters": self.config.ddp_find_unused_parameters,
            "report_to": "none",  # Disable external logging
            "remove_unused_columns": False,
            "dataloader_pin_memory": False,
            "group_by_length": True,
            "length_column_name": "length",
            "ignore_data_skip": False,
            "seed": 42,
            "data_seed": 42,
            "dataloader_num_workers": 4,
            "max_grad_norm": 1.0,
            "optim": self.config.optimizer,
            "lr_scheduler_type": self.config.scheduler,
            "warmup_ratio": 0.1,
            "save_strategy": "steps",
            "logging_strategy": "steps",
            "prediction_loss_only": True,
        }

        # Override with kwargs
        training_args.update(kwargs)

        return TrainingArguments(**training_args)

    def save_pretrained(self, path: str):
        """Save model and tokenizer"""
        logger.info(f"Saving model and tokenizer to {path}")
        os.makedirs(path, exist_ok=True)

        self.model.save_pretrained(path)
        self.tokenizer.save_pretrained(path)

        # Save configuration
        if self.config:
            import json
            config_dict = {k: v for k, v in self.config.__dict__.items()
                           if not k.startswith('_')}
            with open(os.path.join(path, 'training_config.json'), 'w') as f:
                json.dump(config_dict, f, indent=2, default=str)

    def load_checkpoint(self, checkpoint_path: str):
        """Load model from checkpoint"""
        logger.info(f"Loading checkpoint from {checkpoint_path}")
        try:
            self.model = AutoModelForCausalLM.from_pretrained(
                checkpoint_path,
                torch_dtype=self.torch_dtype,
                device_map=self.device_map,
                trust_remote_code=True
            )
            logger.info("Checkpoint loaded successfully")
        except Exception as e:
            logger.error(f"Failed to load checkpoint: {e}")
            raise
requirements.txt ADDED
@@ -0,0 +1,35 @@
# Core dependencies
torch>=2.0.0
transformers>=4.53.0
datasets>=2.14.0
accelerate>=0.20.0
trl>=0.7.0

# Hugging Face ecosystem
huggingface-hub>=0.16.0
tokenizers>=0.13.0

# Training and optimization
flash-attn>=2.0.0
xformers>=0.0.20
bitsandbytes>=0.41.0

# Utilities
numpy>=1.24.0
pandas>=2.0.0
scikit-learn>=1.3.0
tqdm>=4.65.0
wandb>=0.15.0

# Optional: for evaluation
lighteval>=0.1.0
evaluate>=0.4.0

# Optional: for deployment
vllm>=0.2.0
sentencepiece>=0.1.99

# Development
pytest>=7.0.0
black>=23.0.0
isort>=5.12.0
test_setup.py ADDED
@@ -0,0 +1,206 @@
#!/usr/bin/env python3
"""
Test Setup Script
Verifies that all components are working correctly
"""

import os
import sys
import torch
import logging
from pathlib import Path

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def test_imports():
    """Test that all required modules can be imported"""
    logger.info("Testing imports...")

    try:
        import transformers
        logger.info(f"✓ transformers {transformers.__version__}")
    except ImportError as e:
        logger.error(f"✗ transformers: {e}")
        return False

    try:
        import datasets
        logger.info(f"✓ datasets {datasets.__version__}")
    except ImportError as e:
        logger.error(f"✗ datasets: {e}")
        return False

    try:
        import trl
        logger.info(f"✓ trl {trl.__version__}")
    except ImportError as e:
        logger.error(f"✗ trl: {e}")
        return False

    try:
        import accelerate
        logger.info(f"✓ accelerate {accelerate.__version__}")
    except ImportError as e:
        logger.error(f"✗ accelerate: {e}")
        return False

    return True

def test_local_imports():
    """Test that local modules can be imported"""
    logger.info("Testing local imports...")

    try:
        from config import get_config
        logger.info("✓ config module")
    except ImportError as e:
        logger.error(f"✗ config module: {e}")
        return False

    try:
        from model import SmolLM3Model
        logger.info("✓ model module")
    except ImportError as e:
        logger.error(f"✗ model module: {e}")
        return False

    try:
        from data import SmolLM3Dataset
        logger.info("✓ data module")
    except ImportError as e:
        logger.error(f"✗ data module: {e}")
        return False

    try:
        from trainer import SmolLM3Trainer
        logger.info("✓ trainer module")
    except ImportError as e:
        logger.error(f"✗ trainer module: {e}")
        return False

    return True

def test_config():
    """Test configuration loading"""
    logger.info("Testing configuration...")

    try:
        from config import get_config
        config = get_config("config/train_smollm3.py")
        logger.info(f"✓ Configuration loaded: {config.model_name}")
        return True
    except Exception as e:
        logger.error(f"✗ Configuration loading failed: {e}")
        return False

def test_dataset_creation():
    """Test dataset creation"""
    logger.info("Testing dataset creation...")

    try:
        from data import create_sample_dataset
        output_path = create_sample_dataset("test_dataset")

        # Check if files were created
        train_file = os.path.join(output_path, "train.json")
        val_file = os.path.join(output_path, "validation.json")

        if os.path.exists(train_file) and os.path.exists(val_file):
            logger.info("✓ Sample dataset created successfully")

            # Clean up
            import shutil
            shutil.rmtree(output_path)
            return True
        else:
            logger.error("✗ Dataset files not created")
            return False
    except Exception as e:
        logger.error(f"✗ Dataset creation failed: {e}")
        return False

def test_gpu_availability():
    """Test GPU availability"""
    logger.info("Testing GPU availability...")

    if torch.cuda.is_available():
        logger.info(f"✓ GPU available: {torch.cuda.get_device_name(0)}")
        logger.info(f"✓ CUDA version: {torch.version.cuda}")
        logger.info(f"✓ GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
        return True
    else:
        logger.warning("⚠ No GPU available, will use CPU")
        return True

def test_model_loading():
    """Test model loading (without downloading)"""
    logger.info("Testing model loading...")

    try:
        from transformers import AutoTokenizer, AutoConfig

        # Test tokenizer loading
        tokenizer = AutoTokenizer.from_pretrained(
            "HuggingFaceTB/SmolLM3-3B",
            trust_remote_code=True,
            use_fast=True
        )
        logger.info(f"✓ Tokenizer loaded, vocab size: {tokenizer.vocab_size}")

        # Test config loading
        config = AutoConfig.from_pretrained(
            "HuggingFaceTB/SmolLM3-3B",
            trust_remote_code=True
        )
        logger.info(f"✓ Config loaded, model type: {config.model_type}")

        return True
    except Exception as e:
        logger.error(f"✗ Model loading test failed: {e}")
        return False

def main():
    """Run all tests"""
    logger.info("Starting SmolLM3 setup tests...")

    tests = [
        ("Import Tests", test_imports),
        ("Local Import Tests", test_local_imports),
        ("Configuration Tests", test_config),
        ("Dataset Creation Tests", test_dataset_creation),
        ("GPU Availability Tests", test_gpu_availability),
        ("Model Loading Tests", test_model_loading),
    ]

    passed = 0
    total = len(tests)

    for test_name, test_func in tests:
        logger.info(f"\n{'='*50}")
        logger.info(f"Running: {test_name}")
        logger.info('='*50)

        try:
            if test_func():
                passed += 1
                logger.info(f"✓ {test_name} PASSED")
            else:
                logger.error(f"✗ {test_name} FAILED")
        except Exception as e:
            logger.error(f"✗ {test_name} FAILED with exception: {e}")

    logger.info(f"\n{'='*50}")
    logger.info(f"Test Results: {passed}/{total} tests passed")
    logger.info('='*50)

    if passed == total:
        logger.info("🎉 All tests passed! Setup is ready for SmolLM3 fine-tuning.")
        return 0
    else:
        logger.error("❌ Some tests failed. Please check the errors above.")
        return 1

if __name__ == '__main__':
    sys.exit(main())
train.py ADDED
@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
SmolLM3 Fine-tuning Script for FlexAI Console
Based on the nanoGPT structure but adapted for SmolLM3 model
"""

import os
import sys
import argparse
import json
import torch
import logging
from pathlib import Path
from typing import Optional, Dict, Any

# Add the current directory to the path for imports
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from config import get_config
from model import SmolLM3Model
from data import SmolLM3Dataset
from trainer import SmolLM3Trainer

def setup_logging():
    """Setup logging configuration"""
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.StreamHandler(sys.stdout),
            logging.FileHandler('training.log')
        ]
    )
    return logging.getLogger(__name__)

def parse_args():
    """Parse command line arguments"""
    parser = argparse.ArgumentParser(description='SmolLM3 Fine-tuning Script')

    # Configuration file
    parser.add_argument('config', type=str, help='Path to configuration file')

    # Dataset arguments
    parser.add_argument('--dataset_dir', type=str, default='my_dataset',
                        help='Path to dataset directory within /input')

    # Checkpoint arguments
    parser.add_argument('--out_dir', type=str, default='/output-checkpoint',
                        help='Output directory for checkpoints')
    parser.add_argument('--init_from', type=str, default='scratch',
                        choices=['scratch', 'resume', 'pretrained'],
                        help='Initialization method')

    # Training arguments
    parser.add_argument('--max_iters', type=int, default=None,
                        help='Maximum number of training iterations')
    parser.add_argument('--batch_size', type=int, default=None,
                        help='Batch size for training')
    parser.add_argument('--learning_rate', type=float, default=None,
                        help='Learning rate')
    parser.add_argument('--gradient_accumulation_steps', type=int, default=None,
                        help='Gradient accumulation steps')

    # Model arguments
    parser.add_argument('--model_name', type=str,
                        default='HuggingFaceTB/SmolLM3-3B',
                        help='Model name or path')
    parser.add_argument('--max_seq_length', type=int, default=4096,
                        help='Maximum sequence length')

    # Logging and saving
    parser.add_argument('--save_steps', type=int, default=500,
                        help='Save checkpoint every N steps')
    parser.add_argument('--eval_steps', type=int, default=100,
                        help='Evaluate every N steps')
    parser.add_argument('--logging_steps', type=int, default=10,
                        help='Log every N steps')

    return parser.parse_args()

def main():
    """Main training function"""
    args = parse_args()
    logger = setup_logging()

    logger.info("Starting SmolLM3 fine-tuning...")
    logger.info(f"Arguments: {vars(args)}")

    # Load configuration
    config = get_config(args.config)

    # Override config with command line arguments
    if args.max_iters is not None:
        config.max_iters = args.max_iters
    if args.batch_size is not None:
        config.batch_size = args.batch_size
    if args.learning_rate is not None:
        config.learning_rate = args.learning_rate
    if args.gradient_accumulation_steps is not None:
        config.gradient_accumulation_steps = args.gradient_accumulation_steps

    # Setup paths
    dataset_path = os.path.join('/input', args.dataset_dir)
    output_path = args.out_dir

    # Ensure output directory exists
    os.makedirs(output_path, exist_ok=True)

    logger.info(f"Dataset path: {dataset_path}")
    logger.info(f"Output path: {output_path}")

    # Initialize model
    model = SmolLM3Model(
        model_name=args.model_name,
        max_seq_length=args.max_seq_length,
        config=config
    )

    # Load dataset
    dataset = SmolLM3Dataset(
        data_path=dataset_path,
        tokenizer=model.tokenizer,
        max_seq_length=args.max_seq_length
    )

    # Initialize trainer
    trainer = SmolLM3Trainer(
        model=model,
        dataset=dataset,
        config=config,
        output_dir=output_path,
        init_from=args.init_from
    )

    # Start training
    try:
        trainer.train()
        logger.info("Training completed successfully!")
    except Exception as e:
        logger.error(f"Training failed: {e}")
        raise

if __name__ == '__main__':
    main()
trainer.py ADDED
@@ -0,0 +1,242 @@
"""
SmolLM3 Trainer
Handles the training loop and integrates with Hugging Face Trainer
"""

import os
import torch
import logging
from typing import Optional, Dict, Any
from transformers import Trainer, TrainingArguments
from trl import SFTTrainer
import json

logger = logging.getLogger(__name__)

class SmolLM3Trainer:
    """Trainer for SmolLM3 fine-tuning"""

    def __init__(
        self,
        model,
        dataset,
        config,
        output_dir: str,
        init_from: str = "scratch",
        use_sft_trainer: bool = True
    ):
        self.model = model
        self.dataset = dataset
        self.config = config
        self.output_dir = output_dir
        self.init_from = init_from
        self.use_sft_trainer = use_sft_trainer

        # Setup trainer
        self.trainer = self._setup_trainer()

    def _setup_trainer(self):
        """Setup the trainer"""
        logger.info("Setting up trainer")

        # Get training arguments
        training_args = self.model.get_training_arguments(
            output_dir=self.output_dir,
            save_steps=self.config.save_steps,
            eval_steps=self.config.eval_steps,
            logging_steps=self.config.logging_steps,
            max_steps=self.config.max_iters,
        )

        # Get datasets
        train_dataset = self.dataset.get_train_dataset()
        eval_dataset = self.dataset.get_eval_dataset()

        # Get data collator
        data_collator = self.dataset.get_data_collator()

        if self.use_sft_trainer:
            # Use SFTTrainer for supervised fine-tuning
            trainer = SFTTrainer(
                model=self.model.model,
                tokenizer=self.model.tokenizer,
                train_dataset=train_dataset,
                eval_dataset=eval_dataset,
                args=training_args,
                data_collator=data_collator,
                dataset_text_field="text",
                max_seq_length=self.config.max_seq_length,
                packing=False,  # Disable packing for better control
            )
        else:
            # Use standard Trainer
            trainer = Trainer(
                model=self.model.model,
                tokenizer=self.model.tokenizer,
                args=training_args,
                train_dataset=train_dataset,
                eval_dataset=eval_dataset,
                data_collator=data_collator,
            )

        return trainer

    def load_checkpoint(self, checkpoint_path: str):
        """Load checkpoint for resuming training"""
        logger.info(f"Loading checkpoint from {checkpoint_path}")

        if self.init_from == "resume":
            # Load the model from checkpoint
            self.model.load_checkpoint(checkpoint_path)

            # Update trainer with loaded model
            self.trainer.model = self.model.model

            logger.info("Checkpoint loaded successfully")
        elif self.init_from == "pretrained":
            # Model is already loaded from pretrained
            logger.info("Using pretrained model")
        else:
            logger.info("Starting from scratch")

    def train(self):
        """Start training"""
        logger.info("Starting training")

        # Load checkpoint if resuming
        if self.init_from == "resume":
            checkpoint_path = "/input-checkpoint"
            if os.path.exists(checkpoint_path):
                self.load_checkpoint(checkpoint_path)
            else:
                logger.warning(f"Checkpoint path {checkpoint_path} not found, starting from scratch")

        # Start training
        try:
            train_result = self.trainer.train()

            # Save the final model
            self.trainer.save_model()

            # Save training results
            with open(os.path.join(self.output_dir, "train_results.json"), "w") as f:
                json.dump(train_result.metrics, f, indent=2)

            logger.info("Training completed successfully!")
            logger.info(f"Training metrics: {train_result.metrics}")

        except Exception as e:
            logger.error(f"Training failed: {e}")
            raise

    def evaluate(self):
        """Evaluate the model"""
        logger.info("Starting evaluation")

        try:
            eval_results = self.trainer.evaluate()

            # Save evaluation results
            with open(os.path.join(self.output_dir, "eval_results.json"), "w") as f:
                json.dump(eval_results, f, indent=2)

            logger.info(f"Evaluation completed: {eval_results}")
            return eval_results

        except Exception as e:
            logger.error(f"Evaluation failed: {e}")
            raise

    def save_model(self, path: Optional[str] = None):
        """Save the trained model"""
        save_path = path or self.output_dir
        logger.info(f"Saving model to {save_path}")

        try:
            self.trainer.save_model(save_path)
            self.model.tokenizer.save_pretrained(save_path)

            # Save training configuration
            if self.config:
                config_dict = {k: v for k, v in self.config.__dict__.items()
                               if not k.startswith('_')}
                with open(os.path.join(save_path, 'training_config.json'), 'w') as f:
                    json.dump(config_dict, f, indent=2, default=str)

            logger.info("Model saved successfully!")

        except Exception as e:
            logger.error(f"Failed to save model: {e}")
            raise

class SmolLM3DPOTrainer:
    """DPO Trainer for SmolLM3 preference optimization"""

    def __init__(
        self,
        model,
        dataset,
        config,
        output_dir: str,
        ref_model=None
    ):
        self.model = model
        self.dataset = dataset
        self.config = config
        self.output_dir = output_dir
        self.ref_model = ref_model

        # Setup DPO trainer
        self.trainer = self._setup_dpo_trainer()

    def _setup_dpo_trainer(self):
        """Setup DPO trainer"""
        from trl import DPOTrainer

        # Get training arguments
        training_args = self.model.get_training_arguments(
            output_dir=self.output_dir,
            save_steps=self.config.save_steps,
            eval_steps=self.config.eval_steps,
            logging_steps=self.config.logging_steps,
            max_steps=self.config.max_iters,
        )

        # Get preference dataset
        train_dataset = self.dataset.get_train_dataset()
        eval_dataset = self.dataset.get_eval_dataset()

        # Setup DPO trainer
        trainer = DPOTrainer(
            model=self.model.model,
            ref_model=self.ref_model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            tokenizer=self.model.tokenizer,
            max_prompt_length=self.config.max_seq_length // 2,
            max_length=self.config.max_seq_length,
        )

        return trainer

    def train(self):
        """Start DPO training"""
        logger.info("Starting DPO training")

        try:
            train_result = self.trainer.train()

            # Save the final model
            self.trainer.save_model()

            # Save training results
            with open(os.path.join(self.output_dir, "dpo_train_results.json"), "w") as f:
                json.dump(train_result.metrics, f, indent=2)

            logger.info("DPO training completed successfully!")
            logger.info(f"Training metrics: {train_result.metrics}")

        except Exception as e:
            logger.error(f"DPO training failed: {e}")
            raise