rdiehlmartinez committed on
Commit 066e23e · verified · 1 Parent(s): cb976f4

Updating model sizes and disclaimer

Files changed (1)
  1. README.md +9 -6
README.md CHANGED
@@ -32,14 +32,17 @@ This HuggingFace organization hosts our pre-trained models and datasets, while t
  ### **1. Pre-trained Model Suite**

  Our complete suite of models from 10M to 500M parameters trained with Pico:
- - [**pico-decoder-tiny**](https://huggingface.co/pico-lm/pico-decoder-tiny) (10M parameters)
- - [**pico-decoder-small**](https://huggingface.co/pico-lm/pico-decoder-small) (50M parameters)
- - [**pico-decoder-medium**](https://huggingface.co/pico-lm/pico-decoder-medium) (200M parameters)
- - [**pico-decoder-large**](https://huggingface.co/pico-lm/pico-decoder-large) (500M parameters)
+ - [**pico-decoder-tiny**](https://huggingface.co/pico-lm/pico-decoder-tiny) (11M parameters)
+ - [**pico-decoder-small**](https://huggingface.co/pico-lm/pico-decoder-small) (65M parameters)
+ - [**pico-decoder-medium**](https://huggingface.co/pico-lm/pico-decoder-medium) (181M parameters)
+ - [**pico-decoder-large**](https://huggingface.co/pico-lm/pico-decoder-large) (570M parameters)

- > 🚧 **Coming Soon!** **pico-decoder-xl** (1B parameters). Watch this space or star our [GitHub repository](https://github.com/pico-lm) for updates!
+ > 🚧 **Disclaimer:** These models are still under construction. The models released in this repository have been trained for 50,000 steps (corresponding to 100B tokens). Training will finalize after 200,000 steps.
+ >
+ > 🚧 **Coming Soon!** **pico-decoder-xl** (1B+ parameters). Watch this space or star our [GitHub repository](https://github.com/pico-lm) for updates!

+
- All models are trained for 50,000 steps on the [**pretokenized-dolma**](https://huggingface.co/datasets/pico-lm/pretokenized-dolma) dataset (corresponding to 100B tokens). They all see the same training data at each training step, use the same optimization process, and share the same model architecture; the only difference between models is the size of their hidden dimension.
+ All models are trained on the [**pretokenized-dolma**](https://huggingface.co/datasets/pico-lm/pretokenized-dolma) dataset. They all see the same training data at each training step, use the same optimization process, and share the same model architecture; the only difference between models is the size of their hidden dimension.

  In each model repository, we version control checkpoints every 1000 steps that contain:
  - Weights and optimizer states (HuggingFace and Lightning Fabric-compatible versions)
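Since each repository ships HuggingFace-compatible weights, a checkpoint can in principle be loaded with the standard `transformers` API. The snippet below is a minimal, untested sketch based on that assumption; the commented-out `revision` value (a per-checkpoint branch such as `step_50000`) is hypothetical and may not match the repositories' actual branch naming.

```python
# Minimal sketch: loading a pico-decoder model from the Hub with transformers.
# Assumes the repo exposes HuggingFace-compatible weights; the "step_50000"
# revision below is a hypothetical branch name for an intermediate checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "pico-lm/pico-decoder-tiny"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,   # in case the repo ships custom modeling code
    # revision="step_50000",  # hypothetical: pin an intermediate checkpoint
)

inputs = tokenizer("The pico-decoder suite", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```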