Week2LLMFineTune - ORPO-Trained GPT-2

This model is a fine-tuned version of openai-community/gpt2 using ORPO (Odds Ratio Preference Optimization) training on the ORPO-DPO-Mix-40k dataset.

Model Details

  • Base Model: GPT-2
  • Training Method: ORPO (Odds Ratio Preference Optimization)
  • Dataset Size: 40k examples
  • Context Length: 512 tokens
  • Training Hardware:
    • 2× NVIDIA RTX 3090 GPUs
    • 128 GB RAM
    • AMD Ryzen 9 5900X 12-core CPU

Training Parameters

Training Arguments:

  • Learning Rate: 2e-5
  • Batch Size: 4
  • Epochs: 1
  • Block Size: 512
  • Warmup Ratio: 0.1
  • Weight Decay: 0.01
  • Gradient Accumulation: 4
  • Mixed Precision: bf16
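
The card does not say which training stack produced these settings. As a rough sketch only, here is how they could be expressed with TRL's `ORPOConfig` (the output directory is a placeholder, and `max_length` is assumed to correspond to the 512-token block size above):

```python
from trl import ORPOConfig

# Hypothetical mapping of the listed hyperparameters onto TRL's ORPOConfig.
# ORPOConfig extends transformers.TrainingArguments, so the usual optimizer
# and schedule fields are available directly.
orpo_args = ORPOConfig(
    output_dir="week2-llm-finetune",   # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=1,
    max_length=512,                    # assumed equivalent of "Block Size: 512"
    warmup_ratio=0.1,
    weight_decay=0.01,
    gradient_accumulation_steps=4,
    bf16=True,
)
```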

LoRA Configuration:

  • R: 16
  • Alpha: 32
  • Dropout: 0.05
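
Likewise, a PEFT `LoraConfig` matching these values might look like the sketch below; `target_modules` is an assumption (GPT-2 exposes its attention projection as a single fused `c_attn` layer), since the card does not list the adapted modules:

```python
from peft import LoraConfig

# Sketch of the LoRA settings listed above; target_modules is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],
    task_type="CAUSAL_LM",
)
# With TRL, this config would typically be passed to ORPOTrainer via peft_config=lora_config.
```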

Intended Use

This model is designed for:

  • General text generation tasks
  • Conversational AI applications
  • Text completion with preference alignment
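
A minimal usage sketch with the transformers pipeline is shown below, assuming the LoRA adapter has been merged into the repository weights (if only adapter weights are published, load them with PEFT on top of openai-community/gpt2 instead). The prompt is purely illustrative:

```python
from transformers import pipeline

# Load the model from the Hub and generate a short completion.
generator = pipeline("text-generation", model="Decepticore/Week2LLMFineTune")

output = generator(
    "Explain preference alignment in one sentence:",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```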

Training Approach

The model was trained using ORPO, which combines:

  • Supervised Fine-Tuning (SFT)
  • Preference Optimization
  • Efficient LoRA adaptation
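
As a reference for how the preference term works, here is a rough, hypothetical sketch of the ORPO objective (the SFT loss plus a log-odds-ratio penalty), following the ORPO paper; it is an illustration of the method, not the exact code used to train this model:

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, sft_loss, beta=0.1):
    """Sketch of the ORPO objective: SFT loss plus a log-odds-ratio penalty.

    chosen_logps / rejected_logps are length-averaged log-probabilities of the
    preferred and rejected responses under the model being trained.
    """
    # log odds = log(p / (1 - p)), computed from log p
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # penalize the model when the rejected response is as likely as the chosen one
    ratio_term = F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return sft_loss - beta * ratio_term.mean()
```

In practice, libraries such as TRL compute this term internally from the per-token log-probabilities of the chosen and rejected responses.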

Limitations

  • Limited by the base model architecture (GPT-2)
  • Training dataset size constraints
  • Context length limited to 512 tokens
  • Inherits base model biases

Evaluation Results

The model has been evaluated on multiple benchmarks with the following results:

HellaSwag

  Metric     Value    Stderr
  acc        0.2906   ±0.0045
  acc_norm   0.3126   ±0.0046

TinyMMLU

  Metric     Value    Stderr
  acc_norm   0.3152   N/A

ARC Easy

  Metric     Value    Stderr
  acc        0.4116   ±0.0101
  acc_norm   0.3910   ±0.0100

All evaluations were performed with the following settings:

  • Number of few-shot examples: 0 (zero-shot)
  • Device: CUDA
  • Batch size: 1
  • Model type: GPT-2 with ORPO fine-tuning
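
These settings correspond to a zero-shot run of the EleutherAI lm-evaluation-harness. A hedged sketch of an equivalent run through its Python API follows; exact task names (especially tinyMMLU) may differ between harness versions:

```python
import lm_eval

# Zero-shot evaluation sketch matching the settings listed above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Decepticore/Week2LLMFineTune",
    tasks=["hellaswag", "arc_easy", "tinyMMLU"],
    num_fewshot=0,
    batch_size=1,
    device="cuda:0",
)
print(results["results"])
```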

These results demonstrate the model's capabilities across different tasks:

  • Common sense reasoning (HellaSwag)
  • Multi-task knowledge (TinyMMLU)
  • Grade-school level reasoning (ARC Easy)