This work introduces Representation Finetuning (ReFT), a framework that uses learned inference-time interventions as an efficient yet effective alternative to PEFT weight adaptation. LoReFT, a ReFT variant that intervenes linearly on a low-rank subspace of representations, is evaluated against several PEFT approaches, showing SOTA performance across popular benchmarks while being 10-50x more parameter-efficient. The 🤗-compatible pyreft library is introduced to simplify ReFT usage.
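For reference, LoReFT edits a frozen model's hidden representation h only within a learned low-rank subspace: Φ(h) = h + Rᵀ(Wh + b − Rh), where R has orthonormal rows and only R, W, b are trained. Below is a minimal PyTorch sketch of that edit rule under my reading of the paper; class and parameter names are illustrative, not the pyreft API.

```python
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    """Sketch of the LoReFT edit rule: phi(h) = h + R^T (W h + b - R h).

    Illustrative only; see pyreft for the actual implementation.
    """
    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # R: low-rank projection constrained to have orthonormal rows.
        self.R = nn.utils.parametrizations.orthogonal(
            nn.Linear(hidden_dim, rank, bias=False)
        )
        # W, b: learned linear map producing target values in the subspace.
        self.proj = nn.Linear(hidden_dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Replace the component of h lying in the subspace spanned by R's
        # rows with the learned target W h + b; the rest of h is untouched.
        return h + (self.proj(h) - self.R(h)) @ self.R.weight

if __name__ == "__main__":
    torch.manual_seed(0)
    intervention = LoReFTIntervention(hidden_dim=768, rank=4)
    h = torch.randn(2, 10, 768)   # (batch, seq, hidden) hidden states
    print(intervention(h).shape)  # torch.Size([2, 10, 768])
```

In practice such an intervention would be attached at chosen layers and token positions while the base model stays frozen, which is where the parameter-efficiency gains come from.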
This is one of the most convincing practical applications of interpretability methods/insights I've seen in recent years, and I'm looking forward to people combining this with feature-disentanglement methods like SAEs and Backpack LMs to make interventions more interpretable!