AxBench Release Collection Open supervised dictionary learning models and datasets for Gemma 2 2B and 9B instruction-tuned models. • 13 items • Updated 30 days ago • 5
Rigorously Assessing Natural Language Explanations of Neurons Paper • 2309.10312 • Published Sep 19, 2023
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions Paper • 2305.14795 • Published May 24, 2023
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments Paper • 2401.12631 • Published Jan 23, 2024
ReFT: Representation Finetuning for Language Models Paper • 2404.03592 • Published Apr 4, 2024 • 94
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions Paper • 2403.07809 • Published Mar 12, 2024 • 1