---
language: en
tags:
  - pytorch
  - gpt2
  - language-model
pipeline_tag: text-generation
---

# GPT-X Model

This model was trained using the GPT-X framework.

## Model Architecture

- **Layers:** 12
- **Attention Heads:** 12
- **Hidden Size:** 768
- **Vocabulary Size:** 50257
- **Maximum Sequence Length:** 1024
- **Model Type:** base
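These hyperparameters match the GPT-2 small configuration. As a sanity check, the total parameter count can be estimated from them, assuming a standard GPT-2 layout (learned position embeddings, tied input/output embeddings, GELU MLP with 4x expansion) — these layout details are assumptions about the card above, not confirmed GPT-X internals:

```python
# Parameter-count estimate under standard GPT-2 layout assumptions.
n_layer, d_model = 12, 768
vocab_size, max_seq_len = 50257, 1024
d_ff = 4 * d_model  # 3072, the standard GPT-2 MLP expansion (assumed)

embeddings = vocab_size * d_model + max_seq_len * d_model  # token + position
per_layer = (
    2 * d_model                            # ln_1 (scale, bias)
    + d_model * 3 * d_model + 3 * d_model  # fused QKV projection
    + d_model * d_model + d_model          # attention output projection
    + 2 * d_model                          # ln_2
    + d_model * d_ff + d_ff                # MLP up-projection
    + d_ff * d_model + d_model             # MLP down-projection
)
final_ln = 2 * d_model
total = embeddings + n_layer * per_layer + final_ln
print(f"{total:,} parameters")  # -> 124,439,808 (the familiar "124M" GPT-2)
```

With tied embeddings the output head adds no extra parameters, which is why the total lands at roughly 124M rather than ~163M.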

## Training Details

- **Batch Size:** 524288
- **Learning Rate:** 0.0006
- **Weight Decay:** 0.0
- **Mixed Precision:** True
- **Optimizer:** Muon
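The Muon optimizer applies a momentum update and then approximately orthogonalizes it with a Newton–Schulz iteration before the weight step. The sketch below shows only the orthogonalization core, using the classical cubic Newton–Schulz iteration in NumPy for clarity; the actual Muon implementation uses a tuned quintic polynomial and runs on the GPU, and nothing here is taken from the GPT-X codebase:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=10, eps=1e-7):
    """Approximate the orthogonal (polar) factor of a 2-D gradient matrix G.

    Sketch of the idea behind Muon's update; the real optimizer uses a
    tuned quintic iteration, not this textbook cubic one.
    """
    # Scale by the Frobenius norm so every singular value is <= 1,
    # which the cubic iteration needs in order to converge.
    X = G / (np.linalg.norm(G) + eps)
    for _ in range(steps):
        # Cubic Newton-Schulz step: pushes each singular value toward 1
        # while leaving the singular vectors unchanged.
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X
```

Because every singular value of the result is driven toward 1, the update has a uniform "step size" in all directions of the weight matrix, which is the intuition usually given for Muon's behavior.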