
Aryaman Arora

aryaman

AI & ML interests

AI, mechanistic interpretability, South Asian languages

Recent Activity

liked a Space 10 days ago
pyvene/reft_golden_gate_bridge_llama3
liked a Space 10 days ago
pyvene/AxBench-ReFT-r1-16K
upvoted a collection 19 days ago
AxBench Release

Organizations

Spaces-explorers, pyvene

aryaman's activity

reacted to gsarti's post with ❤️ 10 months ago
šŸ” Today's pick in Interpretability & Analysis of LMs: ReFT: Representation Finetuning for Language Models by @zhengxuanzenwu @aryaman Z. Wang @atticusg D. Jurafsky @manning @cgpotts

This work introduces Representation Finetuning (ReFT), a framework that uses learned inference-time interventions on hidden representations as efficient yet effective alternatives to PEFT weight adaptation. LoReFT, a ReFT variant that intervenes linearly on a low-rank representation subspace, is evaluated against several PEFT approaches, showing SOTA performance across popular benchmarks while being 10-50x more parameter-efficient. The 🤗-compatible pyreft library is introduced to simplify ReFT usage.
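To make "intervening linearly on a representation subspace" concrete: the paper defines the LoReFT edit of a hidden representation h as Φ(h) = h + Rᵀ(Wh + b − Rh), where R is a low-rank projection with orthonormal rows and W, b are learned. Below is a minimal NumPy sketch of that formula alone, assuming small illustrative dimensions and random (untrained) parameters; the helper names `make_orthonormal_rows` and `loreft` are hypothetical and this is not the pyreft API.

```python
import numpy as np

def make_orthonormal_rows(r: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """Return an (r, d) matrix whose rows are orthonormal (requires r <= d)."""
    # Reduced QR of a random (d, r) matrix gives Q with orthonormal columns; transpose it.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q.T

def loreft(h: np.ndarray, R: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical helper computing Phi(h) = h + R^T (W h + b - R h).

    h: (d,) hidden representation
    R: (r, d) low-rank projection with orthonormal rows
    W: (r, d) learned linear map, b: (r,) learned bias (random here)
    """
    return h + R.T @ (W @ h + b - R @ h)

rng = np.random.default_rng(0)
d, r = 16, 4  # hidden size and intervention rank (illustrative values)
R = make_orthonormal_rows(r, d, rng)
W = rng.standard_normal((r, d))
b = rng.standard_normal(r)
h = rng.standard_normal(d)

h_new = loreft(h, R, W, b)
# Inside the r-dim subspace, the coordinates R h are replaced by W h + b ...
print(np.allclose(R @ h_new, W @ h + b))  # True
# ... while the component orthogonal to the subspace is left untouched.
P_perp = np.eye(d) - R.T @ R
print(np.allclose(P_perp @ h_new, P_perp @ h))  # True
```

The two checks follow from R Rᵀ = I: the edit only rewrites the r coordinates that R reads, which is why the intervention stays cheap (only R, W and b are trained while the base model is frozen).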

This is one of the most convincing practical applications of interpretability methods and insights I've seen in recent years, and I'm looking forward to seeing it combined with feature-disentangling methods such as SAEs and Backpack LMs to make interventions more interpretable!

📄 Paper: ReFT: Representation Finetuning for Language Models (2404.03592)

šŸ” All daily picks: https://huggingface.co/collections/gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9