Papers
arxiv:2402.12354

LoRA+: Efficient Low Rank Adaptation of Large Models

Published on Feb 19
Authors:
,

Abstract

In this paper, we show that Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021) leads to suboptimal finetuning of models with large width (embedding dimension). This is due to the fact that adapter matrices A and B in LoRA are updated with the same learning rate. Using scaling arguments for large width networks, we demonstrate that using the same learning rate for A and B does not allow efficient feature learning. We then show that this suboptimality of LoRA can be corrected simply by setting different learning rates for the LoRA adapter matrices A and B with a well-chosen ratio. We call this proposed algorithm LoRA+. In our extensive experiments, LoRA+ improves performance (1-2 % improvements) and finetuning speed (up to sim 2X SpeedUp), at the same computational cost as LoRA.

Community

Cool idea!

Sign up or log in to comment

Models citing this paper 15

Browse 15 models citing this paper

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2402.12354 in a dataset README.md to link it from this page.

Spaces citing this paper 6

Collections including this paper 2