Multi-Layer SAEs with Transformers
Collection
Single SAEs, each trained on the residual stream activation vectors from every transformer layer simultaneously. The checkpoints include the underlying transformers.
This is a Multi-Layer Sparse Autoencoder (MLSAE) trained on 1 billion tokens from monology/pile-uncopyrighted, using the residual stream activation vectors from EleutherAI/pythia-1b-deduped, with an expansion factor of R = 64 and sparsity k = 32.
This model is a PyTorch Lightning MLSAETransformer module, which includes the underlying transformer.
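To make the two hyperparameters above concrete, here is a minimal pure-Python sketch, not the library's implementation. It assumes that the expansion factor R multiplies the residual stream width to give the number of latents, that k is the number of latents kept active per activation vector (as in a TopK autoencoder), and that the residual stream width of pythia-1b-deduped is 2048. The helper names `latent_width` and `top_k` are hypothetical.

```python
import heapq


def latent_width(d_model: int, expansion_factor: int) -> int:
    """Number of SAE latents for a residual stream of width d_model."""
    return d_model * expansion_factor


def top_k(pre_activations: list[float], k: int) -> list[float]:
    """Keep the k largest pre-activations and set the rest to zero."""
    keep = set(
        heapq.nlargest(
            k, range(len(pre_activations)), key=pre_activations.__getitem__
        )
    )
    return [v if i in keep else 0.0 for i, v in enumerate(pre_activations)]


# Assumption: 2048 is the residual stream width of pythia-1b-deduped.
n_latents = latent_width(d_model=2048, expansion_factor=64)
print(n_latents)  # 131072

# Toy example with k = 2; the model card's setting is k = 32.
sparse = top_k([0.1, 2.0, -0.5, 1.5, 0.3], k=2)
print(sparse)  # [0.0, 2.0, 0.0, 1.5, 0.0]
```

Under these assumptions, each residual stream vector, whichever layer it comes from, is encoded into 131072 latents, of which at most 32 are nonzero.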
BibTeX:
@misc{lawson_residual_2024,
title = {Residual {{Stream Analysis}} with {{Multi-Layer SAEs}}},
author = {Lawson, Tim and Farnik, Lucy and Houghton, Conor and Aitchison, Laurence},
year = {2024},
month = oct,
number = {arXiv:2409.04185},
eprint = {2409.04185},
primaryclass = {cs},
publisher = {arXiv},
doi = {10.48550/arXiv.2409.04185},
urldate = {2024-10-08},
archiveprefix = {arXiv}
}
Base model: EleutherAI/pythia-1b-deduped