

	---
	datasets:
	- pico-lm/pretokenized-dolma
	language:
	- en
	license: apache-2.0
	metrics:
	- pico-lm/perplexity
	pipeline_tag: text-generation
	---

	# Pico Decoder Tiny

	`pico-decoder-tiny` is the smallest model (11M parameters) in the `pico-decoder` suite: a lightweight, LLaMA-style decoder-only transformer trained from scratch using [`pico-train`](https://github.com/pico-lm/pico-train). It is designed for transparent and reproducible research into the learning dynamics of language models and is fully compatible with the `pico-analyze` toolkit for detailed interpretability analysis.

	> NOTE: The `pico-decoder-tiny-1` branch contains the full commit history for the training run.
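
As a rough illustration of how the branch mentioned in the note could be selected, the sketch below wraps the standard `transformers` loading call. The repository id `pico-lm/pico-decoder-tiny`, the helper name `load_pico_decoder_tiny`, and the need for `trust_remote_code=True` are assumptions, not details confirmed by this card.

```python
def load_pico_decoder_tiny(revision="main"):
    """Load the model from the Hugging Face Hub (hypothetical helper).

    Assumptions: the repo id is "pico-lm/pico-decoder-tiny" and the
    architecture ships as custom code, hence trust_remote_code=True.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        "pico-lm/pico-decoder-tiny",
        revision=revision,       # a branch name, tag, or commit hash
        trust_remote_code=True,  # assumption: the model is defined by custom code
    )
```

Calling `load_pico_decoder_tiny(revision="pico-decoder-tiny-1")` would then request the training-run branch instead of the default one.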

	## 🔧 Model Details

	| Field | Value |
	|---------------------|------------------------------------|
	| Architecture | Decoder-only transformer (LLaMA-style) |
	| Parameters | 11M |
	| Layers | 12 |
	| Hidden Size | 96 |
	| Feed Forward Size | 384 |
	| Attention Heads | 12 |
	| Key/Value Heads | 4 |
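
The table implies a few derived quantities that are easy to check by hand. The short sketch below works them out, assuming the hidden size is split evenly across attention heads and that having fewer key/value heads than attention heads indicates grouped-query attention; neither assumption is stated explicitly in this card.

```python
# Values copied from the Model Details table.
hidden_size = 96
num_attention_heads = 12
num_kv_heads = 4
feed_forward_size = 384

# Assumption: the hidden size is divided evenly among the attention heads.
head_dim = hidden_size // num_attention_heads

# Assumption: grouped-query attention, where several query heads
# share one key/value head.
query_heads_per_kv_head = num_attention_heads // num_kv_heads

# Width of the key (or value) projection output under that assumption.
kv_projection_width = num_kv_heads * head_dim

# Ratio of the feed-forward width to the hidden size.
ffn_expansion = feed_forward_size / hidden_size

print(f"head_dim = {head_dim}")
print(f"query heads per KV head = {query_heads_per_kv_head}")
print(f"KV projection width = {kv_projection_width}")
print(f"FFN expansion = {ffn_expansion}")
```

Under these assumptions each head works in 8 dimensions, three query heads share each key/value head, and the feed-forward layer is four times as wide as the hidden state.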

	## 📚 Training

	- Dataset: [`pretokenized-dolma`](https://huggingface.co/datasets/pico-lm/pretokenized-dolma), English-only
	- Training steps: 200,000
	- Batch size: 1024
	- Sequence length: 2048
	- Optimizer: AdamW
	- Learning rate schedule: Linear decay with warmup
	- Compute: 16 A100-SXM4-80GB GPUs
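
For a sense of scale, the training settings above can be combined into an approximate token budget. This is a back-of-the-envelope sketch that assumes the batch size counts sequences and that every sequence is filled to the full sequence length.

```python
# Values copied from the Training list.
training_steps = 200_000
batch_size = 1024        # assumption: measured in sequences per step
sequence_length = 2048   # assumption: every sequence is full length

tokens_per_step = batch_size * sequence_length
total_tokens = tokens_per_step * training_steps

print(f"tokens per step: {tokens_per_step:,}")  # 2,097,152
print(f"total tokens:    {total_tokens:,}")     # 419,430,400,000
```

Under these assumptions a full run processes roughly 2.1M tokens per step and about 419B tokens in total.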

	## 📈 Evaluation and Analysis

	This model supports fine-grained analysis using [`pico-analyze`](https://github.com/pico-lm/pico-analyze). This tool enables researchers to understand how learning unfolds over training, even at very small scales.

	We also evaluate the model's perplexity on the [`pretokenized-paloma-tinsy`](https://huggingface.co/datasets/pico-lm/pretokenized-paloma-tinsy) dataset.
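
For readers unfamiliar with the metric, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below shows that textbook definition on made-up log-probabilities; the function name is hypothetical, and it is not necessarily how the `pico-lm/perplexity` metric is implemented.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Illustrative data only: a model that assigns probability 1/4 to every token
# is as uncertain as a uniform choice among 4 options.
uniform_guess = [math.log(0.25)] * 10
print(perplexity(uniform_guess))
```

Lower values mean the model assigns higher probability to the evaluation text; a perplexity of 4 corresponds to guessing uniformly among four tokens at each position.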

	## 📄 Citation

	If you use `pico-decoder-tiny` or any other `pico-decoder` model in your research, please cite:

	```bibtex
	@software{pico2025,
	author = {Diehl Martinez, Richard},
	title = {Pico: A Lightweight Framework for Studying Language Model Learning Dynamics},
	year = {2025},
	url = {https://github.com/pico-lm}
	}
	```