Goodfire/Llama-3.3-70B-Instruct-SAE-l50

goodfire-llama-3.3-70b-instruct-sae-l50

mechanistic interpretability

sparse autoencoder


namgoodfire committed on Jan 9

Commit e8414ba · verified · 1 Parent(s): c20b444

Update README.md

Files changed (1)

README.md +6 -4


@@ -8,14 +8,18 @@ base_model:
 ## Model Information
-The Goodfire SAE (Sparse Autoencoder) for Llama 3.3 70B is an interpreter model designed to analyze and understand
-the internal representations of Llama-3.3-70B-Instruct. This SAE model is trained specifically on layer 50 of
+The Goodfire SAE (Sparse Autoencoder) for [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
+is an interpreter model designed to analyze and understand
+the model's internal representations. This SAE model is trained specifically on layer 50 of
 Llama 3.3 70B and achieves an L0 count of 121, enabling the decomposition of complex neural activations
 into interpretable features. The model is optimized for interpretability tasks and model steering applications,
 allowing researchers and developers to gain insights into the model's internal processing and behavior patterns.
 As an open-source tool, it serves as a foundation for advancing interpretability research and enhancing control
 over large language model operations.
+__Model Creator__: [meta-llama](https://huggingface.co/meta-llama)
 ## Intended Use
 By open-sourcing SAEs for leading open models, especially large-scale
@@ -272,8 +276,6 @@ logits, kv_cache, features = llama_3_1_8b.forward(
 print(llama_3_1_8b.tokenizer.decode(logits[-1].argmax(-1)))
 ```
-## Training
 ## Responsibility & Safety
 Safety is at the core of everything we do at Goodfire. As a public benefit