Update README.md

Notes:
- Excepting activation normalization.
- We increased the learning rate here by one order of magnitude to explore whether this resulted in faster training (in particular, a lower L0 more quickly).
- We find in practice that the drop in L0 is accelerated, but this results in significantly more dead features (likely causing worse reconstruction); a sketch of how dead features can be counted follows this list.
- As above, it is likely under-trained.
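
A feature is commonly counted as dead when it never activates over a large window of training tokens. A minimal sketch of such a count, assuming post-ReLU feature activations collected as an `(n_tokens, n_features)` tensor (the window size and threshold here are illustrative, not the values used for these SAEs):

```python
import torch

def count_dead_features(feature_acts: torch.Tensor, threshold: float = 0.0) -> int:
    """Count SAE features that never fire over a window of tokens.

    feature_acts: (n_tokens, n_features) post-ReLU feature activations.
    In practice the per-feature "fired" mask is accumulated over millions
    of tokens rather than materialized as one giant tensor.
    """
    fired = (feature_acts > threshold).any(dim=0)  # per feature: fired at least once?
    return int((~fired).sum().item())

# Illustrative usage with random stand-in activations (16384 features, as above):
acts = torch.relu(torch.randn(1024, 16384))
print(count_dead_features(acts))
```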

## Resid Post 12

Stats:
- 16384 features (expansion factor 8).
- CE Loss score of 95.99% (2.563 without SAE, 2.96 with the SAE); see the sketch after this list for how this score is typically computed.
- Mean L0 of 52 (in practice, L0 is log-normally distributed and heavily right-tailed).
- Dead features: fewer than 200.
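
The source does not define the CE Loss score; a common definition in SAE evaluations compares the with-SAE loss against both the clean loss and a zero-ablation baseline. A minimal sketch under that assumption (the zero-ablation value of ~12.45 below is back-solved from the numbers above, not reported in the source):

```python
def ce_loss_score(ce_clean: float, ce_sae: float, ce_zero_abl: float) -> float:
    """Fraction of CE loss recovered when the SAE reconstruction replaces
    the original activations: 1.0 matches the clean model, 0.0 is no better
    than zero-ablating the hook point."""
    return (ce_zero_abl - ce_sae) / (ce_zero_abl - ce_clean)

print(ce_loss_score(ce_clean=2.563, ce_sae=2.96, ce_zero_abl=12.45))  # ~0.9599
```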

Notes:
- This SAE was trained with methods from the Anthropic [April Update](https://transformer-circuits.pub/2024/april-update/index.html#training-saes)
- **With activation normalization**. This means that activations should be multiplied by a constant such that E(|X|) = sqrt(2048); a sketch of this scaling follows this list.
- As above, it is likely under-trained.
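
A minimal sketch of that normalization, assuming the 2048 under the square root is the model's residual-stream width and that the scaling constant is estimated empirically from sampled activations (the `sae.encode`/`sae.decode` calls are hypothetical placeholders, not a specific library API):

```python
import torch

def norm_scaling_factor(acts: torch.Tensor, d_model: int = 2048) -> float:
    """Constant c such that E[|c * x|] = sqrt(d_model), estimated from a
    sample of residual-stream activations with shape (n_tokens, d_model)."""
    return (d_model ** 0.5) / acts.norm(dim=-1).mean().item()

# Scale activations into the SAE's expected range, then undo it on the way out:
# c = norm_scaling_factor(sampled_acts)        # sampled_acts: (n_tokens, 2048)
# feature_acts = sae.encode(acts * c)          # hypothetical encode call
# reconstruction = sae.decode(feature_acts) / c
```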