Updating model sizes and disclaimer
README.md (CHANGED)
```diff
@@ -32,14 +32,17 @@ This HuggingFace organization hosts our pre-trained models and datasets, while t
 ### **1. Pre-trained Model Suite**
 
 Our complete suite of models from 10M to 500M parameters trained with Pico:
-- [**pico-decoder-tiny**](https://huggingface.co/pico-lm/pico-decoder-tiny) (
-- [**pico-decoder-small**](https://huggingface.co/pico-lm/pico-decoder-small) (
-- [**pico-decoder-medium**](https://huggingface.co/pico-lm/pico-decoder-medium) (
-- [**pico-decoder-large**](https://huggingface.co/pico-lm/pico-decoder-large) (
+- [**pico-decoder-tiny**](https://huggingface.co/pico-lm/pico-decoder-tiny) (11M parameters)
+- [**pico-decoder-small**](https://huggingface.co/pico-lm/pico-decoder-small) (65M parameters)
+- [**pico-decoder-medium**](https://huggingface.co/pico-lm/pico-decoder-medium) (181M parameters)
+- [**pico-decoder-large**](https://huggingface.co/pico-lm/pico-decoder-large) (570M parameters)
 
-> 🚧 **
+> 🚧 **Disclaimer** These models are still under construction. The models released in this repository have been trained for 50,000 steps (corresponding to 100B tokens). Training will be finalized after 200,000 steps.
+>
+> 🚧 **Coming Soon!** **pico-decoder-xl** (1B+ parameters). Watch this space or star our [GitHub repository](https://github.com/pico-lm) for updates!
 
-
+
+All models are trained on the [**pretokenized-dolma**](https://huggingface.co/datasets/pico-lm/pretokenized-dolma) dataset. They all see the same training data at each training step, use the same optimization process, and share the same model architecture; the only difference between models is the size of their hidden dimension.
 
 In each model repository, we version control checkpoints every 1000 steps that contain:
 - Weights and optimizer states (HuggingFace and Lightning Fabric-compatible versions)
```
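Since each model repository version-controls HuggingFace-compatible weights at every 1000-step checkpoint, a specific checkpoint can in principle be selected via a git revision. A minimal sketch, assuming a hypothetical `step_{N}` branch naming (the helper `checkpoint_revision` is ours for illustration, not part of the repos):

```python
# Hypothetical sketch -- not the repos' documented API. Per-step checkpoints on
# the Hugging Face Hub are typically exposed as git revisions (branches or tags);
# the "step_{N}" naming below is an assumption, so check each model repo's
# branch list for the actual convention.

def checkpoint_revision(step: int) -> str:
    """Return the assumed branch name for the checkpoint saved at `step`."""
    if step <= 0 or step % 1000 != 0:
        raise ValueError("checkpoints are saved every 1000 steps")
    return f"step_{step}"

# With `transformers` installed and network access, one checkpoint could then
# be loaded through the standard `revision` argument of `from_pretrained`:
#
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(
#       "pico-lm/pico-decoder-tiny",
#       revision=checkpoint_revision(1000),
#   )
```

Loading several revisions of the same repo this way is one route to studying how the weights evolve over training, which is the point of releasing intermediate checkpoints at all.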