Rigorously Assessing Natural Language Explanations of Neurons Paper • 2309.10312 • Published Sep 19, 2023
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions Paper • 2305.14795 • Published May 24, 2023
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments Paper • 2401.12631 • Published Jan 23, 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions Paper • 2403.07809 • Published Mar 12, 2024
CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior Paper • 2205.14140 • Published May 27, 2022
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines Paper • 2310.03714 • Published Oct 5, 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca Paper • 2305.08809 • Published May 15, 2023
CEBaB/bert-base-uncased.CEBaB.causalm.service__food.2-class.exclusive.seed_42 Updated May 24, 2022
CEBaB/bert-base-uncased.CEBaB.causalm.food__service.2-class.exclusive.seed_42 Updated May 24, 2022
CEBaB/bert-base-uncased.CEBaB.causalm.ambiance__food.2-class.exclusive.seed_42 Updated May 24, 2022