stefanhex-apollo committed
Commit 5d176c5
1 Parent(s): 10cba8c
Update README.md
README.md CHANGED
@@ -19,6 +19,11 @@ The final LayerNorm also has 1e12 as epsilon, but non-unity weights and biases.
 thus the LN parameters cannot be folded into that matrix. You can completely remove all LNs by simply replacing `ln_1` and `ln_2` modules with identities, and replacing
 `ln_f` with modifications to the unembed matrix and unembed bias.
 
+You can load the model with `transformers`, or one of the interpretability libraries listed below.
+```python
+model = GPT2LMHeadModel.from_pretrained("apollo-research/gpt2_noLN").to("cpu")
+```
+
 ## TransformerLens loading code
 ```python
 import torch
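The context lines above describe the README's LN-removal recipe in prose. As a rough illustration (not part of this commit), here is a minimal sketch of that recipe for the Hugging Face `GPT2LMHeadModel`. It assumes, as the README implies, that with epsilon at 1e12 the variance term is negligible, so `ln_f` acts as the affine map x -> ((x - mean(x)) / sqrt(eps)) * w + b; attribute names follow the standard `transformers` GPT-2 implementation.

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("apollo-research/gpt2_noLN")

# ln_1 / ln_2: per the README, these can be replaced with identities outright.
for block in model.transformer.h:
    block.ln_1 = torch.nn.Identity()
    block.ln_2 = torch.nn.Identity()

# ln_f: fold its (approximately affine) action into the unembedding.
ln_f = model.transformer.ln_f
w, b, eps = ln_f.weight.data, ln_f.bias.data, ln_f.eps    # eps = 1e12 here
W = model.lm_head.weight.data                  # (vocab, d_model), tied to wte
W_scaled = W * (w / eps**0.5)                  # absorb the elementwise scale
W_folded = W_scaled - W_scaled.mean(dim=1, keepdim=True)  # absorb mean-centering
vocab_size, d_model = W.shape
new_head = torch.nn.Linear(d_model, vocab_size, bias=True)
new_head.weight.data = W_folded
new_head.bias.data = W @ b                     # ln_f's bias becomes a fixed logit bias
model.lm_head = new_head                       # unties the head from wte; embedding unchanged
model.transformer.ln_f = torch.nn.Identity()
```

Note that replacing `lm_head` unties it from the input embedding, which is necessary because the folded unembed matrix and the added logit bias no longer match `wte`.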
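The snippet added in this commit assumes `GPT2LMHeadModel` is already in scope; a self-contained version of the same load would be:

```python
from transformers import GPT2LMHeadModel

# Load the LayerNorm-free GPT-2 checkpoint onto CPU.
model = GPT2LMHeadModel.from_pretrained("apollo-research/gpt2_noLN").to("cpu")
```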
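The diff hunk ends after the first line of the TransformerLens section, so the commit's actual loading code is not visible here. For orientation only, a hypothetical sketch of loading a Hugging Face checkpoint into TransformerLens; the flag choices are assumptions, not the README's:

```python
from transformers import GPT2LMHeadModel
from transformer_lens import HookedTransformer

# Hypothetical sketch; the commit's actual code lies outside this diff hunk.
hf_model = GPT2LMHeadModel.from_pretrained("apollo-research/gpt2_noLN")
model = HookedTransformer.from_pretrained(
    "gpt2",                # base architecture to map the weights onto
    hf_model=hf_model,     # supply the noLN weights instead of vanilla GPT-2
    fold_ln=True,          # fold affine LN parameters into adjacent matrices
    center_unembed=False,  # assumption: keep logits as the checkpoint produces them
)
```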