# GPT-X Model
This model was trained using the GPT-X framework.
## Model Architecture
- Layers: 12
- Attention Heads: 12
- Hidden Size: 768
- Vocabulary Size: 50257
- Maximum Sequence Length: 1024
- Model Type: base
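The architecture values above match a standard GPT-2-style decoder. Assuming that layout (tied input/output embeddings, QKV + output projections in attention, a 4x-wide MLP), a rough parameter count can be estimated as follows; the exact GPT-X implementation may differ.

```python
# Rough parameter-count estimate from the hyperparameters listed above.
# Assumes a GPT-2-style decoder with tied embeddings and a 4x MLP width;
# this is an illustrative sketch, not the GPT-X framework's own accounting.

n_layer, d_model = 12, 768
vocab, seq_len = 50257, 1024

token_emb = vocab * d_model                   # token embedding (tied with LM head)
pos_emb = seq_len * d_model                   # learned positional embedding
attn = 4 * d_model**2                         # Q, K, V, and output projections
mlp = 8 * d_model**2                          # up-projection (4x) + down-projection
total = token_emb + pos_emb + n_layer * (attn + mlp)

print(f"~{total / 1e6:.0f}M parameters")      # ~124M, consistent with GPT-2 small
```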
## Training Details
- Batch Size: 524288
- Learning Rate: 0.0006
- Weight Decay: 0.0
- Mixed Precision: True
- Optimizer: Muon
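For reference, the training hyperparameters above can be collected into a single config. The field names below are illustrative, not the GPT-X framework's actual configuration schema.

```python
# Training hyperparameters transcribed verbatim from the card above.
# Keys are hypothetical; the GPT-X framework's real config format is not documented here.
train_config = {
    "batch_size": 524288,       # reported as-is in the card
    "learning_rate": 6e-4,
    "weight_decay": 0.0,
    "mixed_precision": True,
    "optimizer": "muon",
}

print(train_config["learning_rate"])
```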