|
# helios-314b-alpha |
|
|
|
This repository contains JAX example code for loading and running the Helios-314B-Alpha open-weights model. |
|
|
|
The Helios-314B-Alpha model is derived from the Grok-V1 open-source model released by X.AI Corp. We have fine-tuned it to perform well on crypto-related queries.

It achieves the following results on the evaluation set:

- **Loss:** 0.0052
- **F1:** 0.9969
|
|
|
Make sure to download the checkpoint and place the `ckpt-0` directory in `checkpoints`.
|
|
|
Then, run |
|
|
|
```shell |
|
pip install -r requirements.txt |
|
python run.py |
|
``` |
|
|
|
to test the code. |
|
|
|
The script loads the checkpoint and samples from the model on a test input. |
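
For orientation, here is a minimal sketch of the autoregressive sampling loop that such a script performs. None of the names below are taken from this repository's actual API: `dummy_logits_fn` is a random stand-in for the real Helios forward pass, and `greedy_sample` is a hypothetical helper; consult `run.py` for the real entry points.

```python
# Minimal sketch of a greedy autoregressive sampling loop. The logits function
# here is a random stand-in for the real Helios forward pass; the actual model
# and sampling code live in run.py and its imports.
import jax
import jax.numpy as jnp

VOCAB_SIZE = 131_072  # SentencePiece vocabulary size listed in the specs below


def dummy_logits_fn(token_ids: jnp.ndarray) -> jnp.ndarray:
    """Stand-in for the model: returns arbitrary logits over the vocabulary."""
    key = jax.random.PRNGKey(int(token_ids[-1]))
    return jax.random.normal(key, (VOCAB_SIZE,))


def greedy_sample(logits_fn, prompt_ids, max_new_tokens=8):
    """Repeatedly feed the growing sequence back in and take the argmax token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = int(jnp.argmax(logits_fn(jnp.array(ids))))
        ids.append(next_id)
    return ids


print(greedy_sample(dummy_logits_fn, prompt_ids=[1, 42, 7]))
```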
|
|
|
Due to the large size of the model (314B parameters), a machine with enough GPU memory is required to test the model with the example code. |
|
The MoE layer in this repository is not implemented efficiently; this implementation was chosen so that the correctness of the model can be validated without custom kernels.
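
To illustrate what such a deliberately simple ("dense") MoE layer can look like, here is a minimal top-2 routing sketch in JAX. It is not the code used in this repository: every expert processes every token and the router weights mix the top-2 expert outputs, which needs no custom scatter/gather kernels but costs roughly `num_experts` times the compute of an efficient routed implementation.

```python
# Minimal sketch of a "dense" top-2 MoE layer (illustrative, not this repo's
# implementation): every expert runs over all tokens, then the per-token
# router weights pick and mix the top-2 expert outputs.
import jax
import jax.numpy as jnp


def moe_layer(params, x, num_experts=8, top_k=2):
    """x: [tokens, d_model]; params holds router and per-expert MLP weights."""
    gate_logits = x @ params["router"]                       # [tokens, E]
    weights, selected = jax.lax.top_k(gate_logits, top_k)    # [tokens, k]
    weights = jax.nn.softmax(weights, axis=-1)               # normalize top-k

    # Dense path: run every expert over the full set of tokens.
    def expert_fn(w_in, w_out):
        return jax.nn.gelu(x @ w_in) @ w_out                 # [tokens, d_model]

    expert_out = jax.vmap(expert_fn)(params["w_in"], params["w_out"])  # [E, T, d]

    # Scatter the top-k router weights into a [tokens, E] mixing matrix.
    mask = jax.nn.one_hot(selected, num_experts)             # [tokens, k, E]
    combine = (weights[..., None] * mask).sum(axis=1)        # [tokens, E]
    return jnp.einsum("te,etd->td", combine, expert_out)


# Tiny smoke test with random weights (vastly smaller than the real model).
key = jax.random.PRNGKey(0)
d_model, d_ff, E, T = 16, 32, 8, 4
params = {
    "router": jax.random.normal(key, (d_model, E)),
    "w_in": jax.random.normal(key, (E, d_model, d_ff)),
    "w_out": jax.random.normal(key, (E, d_ff, d_model)),
}
x = jax.random.normal(key, (T, d_model))
print(moe_layer(params, x).shape)  # (4, 16)
```

An efficient implementation would instead dispatch each token only to its selected experts, which typically requires custom gather/scatter or grouped-matmul kernels.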
|
|
|
# Model Specifications |
|
|
|
Helios-314B-Alpha is currently designed with the following specifications (collected into a config-style sketch after the list):
|
|
|
- **Parameters:** 314B |
|
- **Architecture:** Mixture of 8 Experts (MoE) |
|
- **Expert Utilization:** 2 experts used per token
|
- **Layers:** 64 |
|
- **Attention Heads:** 48 for queries, 8 for keys/values |
|
- **Embedding Size:** 6,144 |
|
- **Tokenization:** SentencePiece tokenizer with 131,072 tokens |
|
- **Additional Features:** |
|
- Rotary embeddings (RoPE) |
|
- Supports activation sharding and 8-bit quantization |
|
- **Maximum Sequence Length (context):** 8,192 tokens |
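
For reference, here is the specification above collected into a plain Python mapping. The field names are illustrative and may not match the config objects actually used by the code in this repository.

```python
# Specification above as a plain Python dict. Field names are illustrative,
# not the names used by this repository's config objects.
HELIOS_314B_ALPHA_SPEC = dict(
    total_params="314B",
    num_experts=8,             # Mixture of Experts
    experts_per_token=2,       # top-2 routing
    num_layers=64,
    num_query_heads=48,
    num_kv_heads=8,
    embedding_size=6144,
    vocab_size=131_072,        # SentencePiece tokenizer
    rotary_embeddings=True,    # RoPE
    activation_sharding=True,
    int8_quantization=True,
    max_seq_len=8192,          # context length
)
```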
|
|
|
# License |
|
|
|
The code and weights for the Helios-314B-Alpha model are licensed under the Apache 2.0 open-source license.