---
license: apache-2.0
---
# Grok-1
This repository contains the weights of the Grok-1 open-weights model.
To get started with using the model, follow the instructions at github.com/xai-org/grok.
The cover image was generated using Midjourney based on the following prompt proposed by Grok: *A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.*
```
╔══════════════════════════╗
║                  _____   ║
║           /\    |_   _|  ║
║ __  __   /  \     | |    ║
║ \ \/ /  / /\ \    | |    ║
║  >  <  / ____ \  _| |_   ║
║ /_/\_\/_/    \_\|_____|  ║
║                          ║
║ Understand the Universe  ║
║     [https://x.ai]       ║
╚══════════════════════════╝

╔═══════════════════╗
║ xAI Grok-1 (314B) ║
╚═══════════════════╝

╔════════════════════════════════════════════╗
║ 314B parameter Mixture of Experts model    ║
║ - Base model (not finetuned)               ║
║ - 8 experts (2 active)                     ║
║ - 86B active parameters                    ║
║ - Apache 2.0 license                       ║
║ - Code: https://github.com/xai-org/grok-1  ║
║ - Happy coding!                            ║
╚════════════════════════════════════════════╝
```
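The "8 experts (2 active)" line refers to top-2 routing: for each token, a router scores all 8 experts and only the 2 highest-scoring experts actually run, which is why only 86B of the 314B parameters are active per token. Below is a minimal NumPy sketch of that routing step; it is illustrative only, not the repository's actual implementation, and all names (`top2_moe`, `router_w`, `experts`) are made up for the example.

```python
import numpy as np

def top2_moe(x, router_w, experts):
    """Route each token to its top-2 experts and mix their outputs.

    x:        (tokens, d_model) token activations
    router_w: (d_model, n_experts) router weights
    experts:  list of n_experts callables, each (tokens, d_model) -> (tokens, d_model)
    """
    logits = x @ router_w                        # (tokens, n_experts) router scores
    top2 = np.argsort(logits, axis=-1)[:, -2:]   # indices of the 2 best experts per token
    # Softmax over just the two selected logits gives the mixing weights.
    sel = np.take_along_axis(logits, top2, axis=-1)
    w = np.exp(sel - sel.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                  # naive per-token loop, for clarity
        for k in range(2):
            e = top2[t, k]
            out[t] += w[t, k] * experts[e](x[t:t+1])[0]
    return out

# Toy usage: 4 tokens, width 8, 8 random linear "experts".
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
router_w = rng.normal(size=(8, 8))
experts = [lambda h, W=rng.normal(size=(8, 8)): h @ W for _ in range(8)]
print(top2_moe(x, router_w, experts).shape)      # (4, 8)
```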
## Model Configuration Details
- Vocabulary Size: 131,072
- Special Tokens:
  - Pad Token: 0
  - End of Sequence Token: 2
- Sequence Length (context window): 8,192
- Model Architecture: Mixture of Experts (MoE)
  - Embedding Size: 6,144
  - Positional Embedding: Rotary Position Embedding (RoPE)
  - Layers: 64
  - Experts: 8
  - Selected Experts (per token): 2
  - Widening Factor: 8
  - Key Size: 128
  - Query Heads: 48
  - Key-Value Heads: 8
  - Activation Sharding: data-wise, model-wise
- Tokenizer: SentencePiece
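To make these numbers concrete, here is a small sketch that collects the hyperparameters above and checks that they are mutually consistent. The dataclass and its field names are illustrative, not the repository's actual configuration API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GrokConfig:
    # Values taken from the list above; names are illustrative.
    vocab_size: int = 131_072
    pad_token: int = 0
    eos_token: int = 2
    seq_len: int = 8_192
    emb_size: int = 6_144
    num_layers: int = 64
    num_experts: int = 8
    num_selected_experts: int = 2
    widening_factor: int = 8
    key_size: int = 128
    num_q_heads: int = 48
    num_kv_heads: int = 8

cfg = GrokConfig()

# Query heads times head (key) size recovers the embedding width: 48 * 128 = 6144.
assert cfg.num_q_heads * cfg.key_size == cfg.emb_size

# 48 query heads sharing 8 key-value heads suggests grouped-query attention,
# with 48 / 8 = 6 query heads per KV head.
assert cfg.num_q_heads % cfg.num_kv_heads == 0

# Feed-forward hidden width implied by the widening factor: 8 * 6144 = 49152.
print(f"FFN hidden width per expert: {cfg.widening_factor * cfg.emb_size}")
```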
Inference Configuration:

- Batch Size per Device: 0.125
- Tokenizer: `./tokenizer.model`
- Local Mesh: 1x8
- Between Hosts: 1x1
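A batch size of 0.125 per device simply means one sequence is sharded across the 8 devices of the local mesh (0.125 × 8 = 1). Under that reading, setting up a 1x8 mesh with named data/model axes looks roughly like the JAX sketch below; this is an assumption-laden illustration of "Activation Sharding: data-wise, model-wise", not the repository's exact code, and it requires 8 local accelerator devices to run.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1x8 local mesh: one "data" row, eight "model" columns (needs 8 local devices).
devices = mesh_utils.create_device_mesh((1, 8))
mesh = Mesh(devices, axis_names=("data", "model"))

# Activations sharded data-wise on the batch axis, model-wise on the feature axis.
activation_sharding = NamedSharding(mesh, P("data", None, "model"))

# Example: shard a (batch=1, seq=8192, emb=6144) activation tensor across the mesh.
x = jnp.zeros((1, 8192, 6144))
x = jax.device_put(x, activation_sharding)
print(x.sharding)
```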
## Inference Details
Make sure to download the int8 checkpoint to the `checkpoints` directory, then run

```shell
pip install -r requirements.txt
python transformer.py
```

to test the code. You should see output from the language model.
Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.
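As a rough sanity check on that requirement, the weights alone rule out a single device. This is back-of-the-envelope arithmetic only (the 80 GB device size is an illustrative figure, and real memory use also includes activations and runtime overhead):

```python
# Back-of-the-envelope weight-memory estimate for the int8 checkpoint.
params = 314e9            # 314B parameters
bytes_per_param = 1       # int8 quantization: 1 byte per weight
weight_bytes = params * bytes_per_param

gpu_memory = 80e9         # e.g. one 80 GB accelerator (illustrative figure)
print(f"Weights alone: ~{weight_bytes / 1e9:.0f} GB")
print(f"Devices needed just to hold the weights: {weight_bytes / gpu_memory:.1f}")
# ~314 GB of weights cannot fit on one device, hence the multi-GPU requirement.
```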
p.s. we're hiring: https://x.ai/careers