GPT-X Model

This model was trained using the GPT-X framework.

Model Architecture

  • Layers: 12
  • Attention Heads: 12
  • Hidden Size: 768
  • Vocabulary Size: 50257
  • Maximum Sequence Length: 1024
  • Model Type: base
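
These values match the GPT-2 small configuration (12 layers, 12 heads, 768-dim hidden states, 50257-token vocabulary, 1024-token context). Below is a minimal sketch of the configuration as a Python dataclass; the class and field names are illustrative assumptions, not the GPT-X framework's actual API.

```python
from dataclasses import dataclass

@dataclass
class GPTXConfig:
    # Values taken from the Model Architecture list above.
    # Field names are illustrative, not GPT-X's actual API.
    n_layers: int = 12
    n_heads: int = 12
    hidden_size: int = 768
    vocab_size: int = 50257
    max_seq_len: int = 1024
    model_type: str = "base"

config = GPTXConfig()
# Head dimension follows from hidden size / attention heads: 768 / 12 = 64.
assert config.hidden_size % config.n_heads == 0
```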

Training Details

  • Batch Size: 524288
  • Learning Rate: 0.0006
  • Weight Decay: 0.0
  • Mixed Precision: True
  • Optimizer: Muon
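
The snippet below is a minimal sketch of how these hyperparameters could be wired into a mixed-precision training step; it is not the GPT-X training loop itself. AdamW stands in for the listed Muon optimizer (whose import path depends on the implementation you use), the model is a stand-in linear layer, and treating the batch size as tokens per step is an assumption.

```python
import torch

# Hyperparameters from the Training Details list above.
BATCH_SIZE = 524288   # reported batch size; likely tokens per step (assumption)
LR = 6e-4
WEIGHT_DECAY = 0.0

model = torch.nn.Linear(768, 50257)  # stand-in for the actual GPT-X model

# The card lists Muon; AdamW is substituted here purely to keep the
# sketch runnable. Swap in a Muon implementation's optimizer class if available.
optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)

use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

def train_step(inputs, targets):
    # Mixed precision: reduced-precision forward pass, scaled backward pass.
    with torch.autocast(device_type="cuda" if use_cuda else "cpu",
                        dtype=torch.float16 if use_cuda else torch.bfloat16):
        logits = model(inputs)
        loss = torch.nn.functional.cross_entropy(logits, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()

# Toy usage with random data, just to show the call shape.
x = torch.randn(8, 768)
y = torch.randint(0, 50257, (8,))
print(train_step(x, y))
```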