jbloom committed
Commit 5c639de
1 Parent(s): 46dc3f8

Update README.md

Files changed (1): README.md (+53, -3)

---
license: mit
---

# Gemma 2b Residual Stream SAEs

This is a "quick and dirty" SAE release intended to unblock researchers. These SAEs have not been extensively studied or characterized. However, I will try to update this README as I add SAEs, to reflect what I know about them.

These SAEs were trained with [SAE Lens](https://github.com/jbloomAus/SAELens); the library version used is recorded in cfg.json.

All training hyperparameters are specified in cfg.json.
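
As a minimal sketch, you can inspect cfg.json directly; the exact key names vary across SAE Lens versions, so the dump below avoids assuming any particular field:

```python
import json

# Inspect the training config and library version stored with the weights.
with open("path/to/folder_containing_cfgjson_and_safetensors_file/cfg.json") as f:
    cfg = json.load(f)

# Dump everything; exact key names differ across SAE Lens versions.
print(json.dumps(cfg, indent=2))
```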

They are loadable using SAE Lens via a few methods. A method that currently works (but may shortly be replaced by a more convenient one) is the following:

```python
import torch
from sae_lens.training.session_loader import LMSparseAutoencoderSessionloader

# Inference only: disable gradient tracking.
torch.set_grad_enabled(False)

# Folder containing the cfg.json and .safetensors files for one SAE.
path = "path/to/folder_containing_cfgjson_and_safetensors_file"

# Returns the base model, the SAE, and an activation store.
model, sae, activation_store = LMSparseAutoencoderSessionloader.load_pretrained_sae(
    path, device="cuda",
)
```
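
As a rough sketch of what you might do next, assuming the returned model is a TransformerLens HookedTransformer and that calling the SAE returns the reconstruction and feature activations (the exact forward signature varies across SAE Lens versions):

```python
# A minimal sketch, not official documentation.
tokens = model.to_tokens("The quick brown fox jumps over the lazy dog.")
_, cache = model.run_with_cache(tokens)

# Hook point for the layer-0 SAE; use blocks.6.hook_resid_post for layer 6.
resid_post = cache["blocks.0.hook_resid_post"]

# Assumed return order: (sae_out, feature_acts, ...); check your version.
sae_out, feature_acts, *_ = sae(resid_post)

# L0 per token: number of active features (compare with the stats below).
l0 = (feature_acts > 0).sum(dim=-1).float()
print(feature_acts.shape, l0.mean().item())
```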

## Resid Post 0

Stats:
- 16384 features (expansion factor 8)
- CE Loss score of 99.1% (CE loss 2.647 without the SAE, 2.732 with the SAE; see the note after this list)
- Mean L0 of 54 (in practice, L0 is log-normally distributed and heavily right-tailed)
- Dead features: we think this SAE may have ~2.5k dead features
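
The release does not say exactly how the CE Loss score is computed. A common definition is the fraction of CE loss recovered relative to an ablation baseline; the sketch below assumes that metric. The baseline value is not given here, so the implied figure is illustrative only:

```python
def ce_loss_score(ce_clean: float, ce_with_sae: float, ce_ablated: float) -> float:
    """Fraction of the CE-loss gap (clean vs. ablated) recovered by the SAE."""
    return (ce_ablated - ce_with_sae) / (ce_ablated - ce_clean)

# A score of 99.1% with clean CE 2.647 and SAE CE 2.732 would imply an
# ablation baseline of roughly 12.2: (12.2 - 2.732) / (12.2 - 2.647) ≈ 0.991.
```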

Notes:
- This SAE was trained with methods from the Anthropic [April Update](https://transformer-circuits.pub/2024/april-update/index.html#training-saes), excepting activation normalization.
- It is likely under-trained.

## Resid Post 6

Stats:
- 16384 features (expansion factor 8)
- CE Loss score of 95.33% (CE loss 2.647 without the SAE, 3.103 with the SAE)
- Mean L0 of 53 (in practice, L0 is log-normally distributed and heavily right-tailed)
- Dead features: we think this SAE may have up to 7k dead features (see the sketch after this list)
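
A hypothetical way to estimate dead-feature counts (not necessarily how the numbers above were produced) is to count features that never fire over a large sample of activations:

```python
import torch

n_features = 16384
fired = torch.zeros(n_features, dtype=torch.bool, device="cuda")

# Sample batches from the activation store returned by the loader above.
# next_batch() and the SAE forward signature may differ across versions.
for _ in range(100):
    acts = activation_store.next_batch()
    _, feature_acts, *_ = sae(acts)
    fired |= (feature_acts.reshape(-1, n_features) > 0).any(dim=0)

print(f"~{(~fired).sum().item()} features never fired in this sample")
```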

Notes:
- This SAE was trained with methods from the Anthropic [April Update](https://transformer-circuits.pub/2024/april-update/index.html#training-saes), excepting activation normalization.
- We increased the learning rate here by one order of magnitude to explore whether this resulted in faster training (in particular, a lower L0 more quickly).
- We find in practice that the drop in L0 is accelerated, but this results in significantly more dead features (likely causing worse reconstruction).
- As above, it is likely under-trained.