Hzfinfdu committed (verified) · commit 85f235a · 1 parent: 69a4e61

Update README.md

Files changed (1): README.md (+65 -3)
README.md CHANGED
@@ -1,3 +1,65 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - meta-llama/Llama-3.1-8B
+ ---
+
+ # Llama Scope
+
+ [**Technical Report Link**](https://arxiv.org/abs/2410.20526)
+
+ [**Use with OpenMOSS lm_sae Github Repo**](https://github.com/OpenMOSS/Language-Model-SAEs)
+
+ [**Use with SAELens**]
+
+ [**Explore in Neuronpedia**]
+
+ Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for extracting sparse representations from language models, yet scalable training remains a significant challenge. We introduce a suite of 256 improved TopK SAEs, trained on each layer and sublayer of the Llama-3.1-8B-Base model, with 32K and 128K features.
+
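+ As a reading aid, the snippet below sketches what a TopK-ReLU SAE forward pass looks like. It is a minimal illustration, not the released training or inference code: the hidden size, the value of k, and all names here are assumptions made for the example.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+
+ class TopKSAE(nn.Module):
+     """Minimal TopK-ReLU sparse autoencoder sketch (illustrative only)."""
+
+     def __init__(self, d_model: int = 4096, expansion: int = 8, k: int = 50):
+         super().__init__()
+         d_sae = d_model * expansion              # 8x of 4096 -> 32,768 (32K) features
+         self.k = k                               # number of features kept per token
+         self.encoder = nn.Linear(d_model, d_sae)
+         self.decoder = nn.Linear(d_sae, d_model)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         pre_acts = self.encoder(x)               # dense feature pre-activations
+         top = torch.topk(pre_acts, self.k, dim=-1)
+         acts = torch.zeros_like(pre_acts).scatter(
+             -1, top.indices, torch.relu(top.values)
+         )                                        # keep only the k largest, ReLU-ed
+         return self.decoder(acts)                # reconstruction of the input activation
+ ```
+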
+ This is the front page for all Llama Scope SAEs. Please see the links below for checkpoints.
+
+ ## Naming Convention
+
+ L[Layer][Position]-[Expansion]x
+
+ For instance, an SAE with 8x the hidden size of Llama-3.1-8B (i.e., 32K features), trained on the post-MLP residual stream of layer 15, is called L15R-8x.
+
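+ When scripting across many checkpoints, a small helper following this convention can be handy. The function below is purely illustrative (its name and defaults are not part of any release); the site code is one of R, A, M, or TC, as used in the checkpoint names below.
+
+ ```python
+ def sae_name(layer: int, site: str, expansion: int) -> str:
+     """Build a Llama Scope SAE name, e.g. sae_name(15, "R", 8) -> "L15R-8x"."""
+     assert site in {"R", "A", "M", "TC"}, "site code as used in the checkpoint names"
+     return f"L{layer}{site}-{expansion}x"
+
+
+ # 8x expansion of the 4096-dim residual stream gives 4096 * 8 = 32,768 (32K) features.
+ print(sae_name(15, "R", 8))   # -> L15R-8x
+ ```
+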
+ ## Checkpoints
+
+ [**LXR-8x**](https://huggingface.co/fnlp/Llama3_1-8B-Base-LXR-8x/tree/main)
+
+ [**LXA-8x**](https://huggingface.co/fnlp/Llama3_1-8B-Base-LXA-8x/tree/main)
+
+ [**LXM-8x**](https://huggingface.co/fnlp/Llama3_1-8B-Base-LXM-8x/tree/main)
+
+ [**LXTC-8x**](https://huggingface.co/fnlp/Llama3_1-8B-Base-LXTC-8x/tree/main)
+
+ [**LXR-32x**](https://huggingface.co/fnlp/Llama3_1-8B-Base-LXR-32x/tree/main)
+
+ [**LXA-32x**](https://huggingface.co/fnlp/Llama3_1-8B-Base-LXA-32x/tree/main)
+
+ [**LXM-32x**](https://huggingface.co/fnlp/Llama3_1-8B-Base-LXM-32x/tree/main)
+
+ [**LXTC-32x**](https://huggingface.co/fnlp/Llama3_1-8B-Base-LXTC-32x/tree/main)
+
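+ To fetch any of the repositories above programmatically, the standard `huggingface_hub` download call can be used; the repository and local directory below are just examples.
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download one SAE collection (here: the 8x residual-stream SAEs) to a local folder.
+ local_dir = snapshot_download(
+     repo_id="fnlp/Llama3_1-8B-Base-LXR-8x",
+     local_dir="llama_scope_lxr_8x",   # illustrative output path
+ )
+ print(local_dir)
+ ```
+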
+ ## Llama Scope SAE Overview
+
+ <center>
+
+ | | **Llama Scope** | **Scaling Monosemanticity** | **GPT-4 SAE** | **Gemma Scope** |
+ |-----------------------|:-----------------------------:|:------------------------------:|:--------------------------------:|:---------------------------------:|
+ | **Models** | Llama-3.1 8B (Open Source) | Claude-3.0 Sonnet (Proprietary) | GPT-4 (Proprietary) | Gemma-2 2B & 9B (Open Source) |
+ | **SAE Training Data** | SlimPajama | Proprietary | Proprietary | Proprietary, Sampled from Mesnard et al. (2024) |
+ | **SAE Position (Layer)** | Every Layer | The Middle Layer | 5/6 Late Layer | Every Layer |
+ | **SAE Position (Site)** | R, A, M, TC | R | R | R, A, M, TC |
+ | **SAE Width (# Features)** | 32K, 128K | 1M, 4M, 34M | 128K, 1M, 16M | 16K, 64K, 128K, 256K - 1M (Partial) |
+ | **SAE Width (Expansion Factor)** | 8x, 32x | Proprietary | Proprietary | 4.6x, 7.1x, 28.5x, 36.6x |
+ | **Activation Function** | TopK-ReLU | ReLU | TopK-ReLU | JumpReLU |
+
+ </center>
+
+
+ ## Citation
+