Model Card for tim-lawson/mlsae-pythia-160m-deduped-x64-k128-tfm

A Multi-Layer Sparse Autoencoder (MLSAE) trained on residual-stream activation vectors from EleutherAI/pythia-160m-deduped, with expansion factor R = 64 and sparsity k = 128, over 1 billion tokens from monology/pile-uncopyrighted.
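To make the hyperparameters concrete: an expansion factor R = 64 over Pythia-160m's residual width d_model = 768 gives 49,152 latents, of which at most k = 128 are active per input vector. The following is a minimal NumPy sketch of a top-k sparse autoencoder in this style; the weight initialization and tiny dimensions are illustrative only, not the trained model's.

```python
import numpy as np

# Illustrative top-k sparse autoencoder sketch.
# The real model uses d_model = 768, R = 64 (49,152 latents), k = 128;
# small dimensions are used here so the example runs instantly.
d_model, R, k = 16, 4, 8
n_latents = R * d_model

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((d_model, n_latents)) / np.sqrt(d_model)
W_dec = rng.standard_normal((n_latents, d_model)) / np.sqrt(n_latents)
b_enc = np.zeros(n_latents)
b_dec = np.zeros(d_model)

def encode(x):
    # Compute pre-activations, then keep only the k largest per vector
    # and zero out the rest (the top-k activation function).
    pre = (x - b_dec) @ W_enc + b_enc
    z = np.zeros_like(pre)
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return z

def decode(z):
    # Reconstruct the residual-stream vector from the sparse latents.
    return z @ W_dec + b_dec

x = rng.standard_normal((2, d_model))  # a batch of residual-stream vectors
z = encode(x)
x_hat = decode(z)
print((z != 0).sum(axis=-1))  # at most k active latents per vector
```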

This checkpoint is a PyTorch Lightning `MLSAETransformer` module, which includes the underlying transformer alongside the autoencoder.
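The checkpoint can presumably be loaded with the `mlsae` package from the paper's code release; the import path and method name below are assumptions based on that repository and are not verified here.

```python
# Hypothetical usage, assuming the mlsae package from the paper's code release.
from mlsae.model import MLSAETransformer

# Loads the autoencoder together with the underlying Pythia transformer.
model = MLSAETransformer.from_pretrained(
    "tim-lawson/mlsae-pythia-160m-deduped-x64-k128-tfm"
)
```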

Citation

BibTeX:

@misc{lawson_residual_2024,
  title         = {Residual {{Stream Analysis}} with {{Multi-Layer SAEs}}},
  author        = {Lawson, Tim and Farnik, Lucy and Houghton, Conor and Aitchison, Laurence},
  year          = {2024},
  month         = oct,
  number        = {arXiv:2409.04185},
  eprint        = {2409.04185},
  primaryclass  = {cs},
  publisher     = {arXiv},
  doi           = {10.48550/arXiv.2409.04185},
  urldate       = {2024-10-08},
  archiveprefix = {arXiv}
}
Model size: 238M parameters (F32 safetensors).
