This work introduces Representation Finetuning (ReFT), a framework that uses learned inference-time interventions as an efficient yet effective alternative to PEFT weight adaptation. LoReFT, a ReFT variant that intervenes linearly on a low-rank subspace of representations, is evaluated against several PEFT approaches, showing SOTA performance across popular benchmarks while being 10-50x more parameter-efficient. The 🤗-compatible pyreft library is introduced to simplify ReFT usage.
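For reference, LoReFT edits a frozen model's hidden representation h only within a learned low-rank subspace: Φ(h) = h + Rᵀ(Wh + b − Rh), where R has orthonormal rows and only R, W, b are trained. Below is a minimal PyTorch sketch of that edit rule under my reading of the paper; class and parameter names are illustrative, not the pyreft API.

```python
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    """Sketch of the LoReFT edit rule: phi(h) = h + R^T (W h + b - R h).

    Illustrative only; see pyreft for the actual implementation.
    """
    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # R: low-rank projection constrained to have orthonormal rows.
        self.R = nn.utils.parametrizations.orthogonal(
            nn.Linear(hidden_dim, rank, bias=False)
        )
        # W, b: learned linear map producing target values in the subspace.
        self.proj = nn.Linear(hidden_dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Replace the component of h lying in the subspace spanned by R's
        # rows with the learned target W h + b; the rest of h is untouched.
        return h + (self.proj(h) - self.R(h)) @ self.R.weight

if __name__ == "__main__":
    torch.manual_seed(0)
    intervention = LoReFTIntervention(hidden_dim=768, rank=4)
    h = torch.randn(2, 10, 768)   # (batch, seq, hidden) hidden states
    print(intervention(h).shape)  # torch.Size([2, 10, 768])
```

In practice such an intervention would be attached at chosen layers and token positions while the base model stays frozen, which is where the parameter-efficiency gains come from.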
This is one of the most convincing practical applications of interpretability methods/insights I've seen in recent years, and I'm looking forward to people combining this with feature-disentanglement methods like SAEs and Backpack LMs to make interventions more interpretable!