GPT-X Model

This model was trained using the GPT-X framework.

Model Architecture

  • Layers: 12
  • Attention Heads: 12
  • Hidden Size: 768
  • Vocabulary Size: 50257
  • Maximum Sequence Length: 1024
  • Model Type: base
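The hyperparameters above match the GPT-2 "small" configuration. As a rough sanity check, here is a sketch of the parameter count under the assumption of a GPT-2-style decoder (learned positional embeddings, 4x MLP expansion, tied input/output embeddings); the card does not state these details, so treat them as assumptions.

```python
# Approximate parameter count for a GPT-2-style decoder with the
# hyperparameters listed above (tied embeddings assumed).
n_layer, d_model = 12, 768
vocab_size, max_seq_len = 50257, 1024

token_emb = vocab_size * d_model        # token embedding matrix
pos_emb = max_seq_len * d_model         # learned position embeddings
# Per block: QKV + output projection (4*d^2 + 4d weights/biases),
# MLP with 4x expansion (8*d^2 + 5d), two LayerNorms (4d).
per_block = 12 * d_model**2 + 13 * d_model
final_ln = 2 * d_model                  # final LayerNorm

total = token_emb + pos_emb + n_layer * per_block + final_ln
print(f"{total:,}")  # ~124M, consistent with GPT-2 small
```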

Training Details

  • Batch Size: 524288
  • Learning Rate: 0.0006
  • Weight Decay: 0.0
  • Mixed Precision: True
  • Optimizer: Muon
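If the batch size of 524,288 denotes tokens per optimizer step (it is 2^19, a common convention for GPT-style training), the gradient-accumulation schedule follows directly from the 1024-token sequence length. The micro-batch size and device count below are hypothetical values for illustration, not from this card.

```python
# Derive gradient-accumulation steps, assuming "Batch Size" means
# tokens per optimizer step (an assumption, not stated in the card).
tokens_per_step = 524_288   # 2**19 tokens per optimizer step
seq_len = 1024              # maximum sequence length from the card
micro_batch = 16            # hypothetical sequences per device per pass
n_devices = 4               # hypothetical GPU count

seqs_per_step = tokens_per_step // seq_len                # 512 sequences
accum_steps = seqs_per_step // (micro_batch * n_devices)  # per-device loop count
print(seqs_per_step, accum_steps)
```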