metadata
license: mit
library_name: transformers
pipeline_tag: feature-extraction

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

This repository contains models described in the paper SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability. SAEBench is an evaluation suite that measures sparse autoencoder (SAE) performance across seven diverse metrics, spanning interpretability, feature disentanglement, and practical applications such as unlearning.