Add model card for SAEBench (#1)
- Add model card for SAEBench (69058972fe9df66859a292edf4b8c525134672ed)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,3 +1,12 @@
----
-license: mit
-
+---
+license: mit
+library_name: transformers
+pipeline_tag: feature-extraction
+---
+
+# SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
+
+This repository contains models described in the paper [SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability](https://huggingface.co/papers/2503.09532). SAEBench is a comprehensive evaluation suite that measures SAE performance across seven diverse metrics, spanning interpretability, feature disentanglement and practical applications like unlearning.
+
+* Project Page: [https://saebench.xyz](https://saebench.xyz)
+* Code: [https://github.com/adamkarvonen/SAEBench](https://github.com/adamkarvonen/SAEBench)