namgoodfire committed
Commit e8414ba · verified · 1 Parent(s): c20b444

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -8,14 +8,18 @@ base_model:
 
  ## Model Information
 
- The Goodfire SAE (Sparse Autoencoder) for Llama 3.3 70B is an interpreter model designed to analyze and understand
- the internal representations of Llama-3.3-70B-Instruct. This SAE model is trained specifically on layer 50 of
+ The Goodfire SAE (Sparse Autoencoder) for [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
+ is an interpreter model designed to analyze and understand
+ the model's internal representations. This SAE model is trained specifically on layer 50 of
  Llama 3.3 70B and achieves an L0 count of 121, enabling the decomposition of complex neural activations
  into interpretable features. The model is optimized for interpretability tasks and model steering applications,
  allowing researchers and developers to gain insights into the model's internal processing and behavior patterns.
  As an open-source tool, it serves as a foundation for advancing interpretability research and enhancing control
  over large language model operations.
 
+ __Model Creator__: [meta-llama](https://huggingface.co/meta-llama)
+
+
  ## Intended Use
 
  By open-sourcing SAEs for leading open models, especially large-scale
@@ -272,8 +276,6 @@ logits, kv_cache, features = llama_3_1_8b.forward(
  print(llama_3_1_8b.tokenizer.decode(logits[-1].argmax(-1)))
  ```
 
- ## Training
-
  ## Responsibility & Safety
 
  Safety is at the core of everything we do at Goodfire. As a public benefit