arxiv:2205.10835

Multilingual Machine Translation with Hyper-Adapters

Published on May 22, 2022

Abstract

Multilingual machine translation suffers from negative interference across languages. A common solution is to relax parameter sharing with language-specific modules such as adapters. However, adapters of related languages are unable to transfer information, and their total parameter count becomes prohibitively expensive as the number of languages grows. In this work, we overcome these drawbacks using hyper-adapters -- hyper-networks that generate adapters from language and layer embeddings. While past work had poor results when scaling hyper-networks, we propose a rescaling fix that significantly improves convergence and enables training larger hyper-networks. We find that hyper-adapters are more parameter efficient than regular adapters, reaching the same performance with up to 12 times fewer parameters. When using the same number of parameters and FLOPs, our approach consistently outperforms regular adapters. Also, hyper-adapters converge faster than alternative approaches and scale better than regular dense networks. Our analysis shows that hyper-adapters learn to encode language relatedness, enabling positive transfer across languages.
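
To make the core idea concrete, here is a minimal PyTorch sketch of a hyper-adapter: a single hyper-network, shared across all languages and layers, that maps language and layer embeddings to the weights of a bottleneck adapter. All module and variable names are illustrative assumptions, and the fan-in rescaling of the generated weights is only a stand-in in the spirit of the paper's rescaling fix, not the authors' exact formulation.

```python
# Hypothetical sketch of a hyper-adapter; names and the rescaling rule are assumptions.
import torch
import torch.nn as nn


class HyperAdapter(nn.Module):
    """Hyper-network that generates bottleneck-adapter weights from language and layer embeddings."""

    def __init__(self, d_model: int, bottleneck: int, num_langs: int, num_layers: int, d_embed: int = 64):
        super().__init__()
        self.d_model, self.bottleneck = d_model, bottleneck
        self.lang_emb = nn.Embedding(num_langs, d_embed)
        self.layer_emb = nn.Embedding(num_layers, d_embed)
        # Hyper-network body: maps the concatenated [language; layer] embedding to a shared hidden state.
        self.body = nn.Sequential(nn.Linear(2 * d_embed, d_embed), nn.ReLU())
        # Heads that emit the flattened adapter parameters (down/up projections and biases).
        self.w_down = nn.Linear(d_embed, d_model * bottleneck)
        self.w_up = nn.Linear(d_embed, bottleneck * d_model)
        self.b_down = nn.Linear(d_embed, bottleneck)
        self.b_up = nn.Linear(d_embed, d_model)

    def forward(self, x, lang_id, layer_id):
        h = self.body(torch.cat([self.lang_emb(lang_id), self.layer_emb(layer_id)], dim=-1))
        # Rescale the generated weights by fan-in (assumed) so activations stay well-behaved
        # as the hyper-network grows; the paper's actual rescaling fix may differ.
        w_d = self.w_down(h).view(self.bottleneck, self.d_model) / self.d_model ** 0.5
        w_u = self.w_up(h).view(self.d_model, self.bottleneck) / self.bottleneck ** 0.5
        b_d, b_u = self.b_down(h), self.b_up(h)
        # Standard bottleneck adapter applied with a residual connection.
        z = torch.relu(x @ w_d.T + b_d)
        return x + z @ w_u.T + b_u


# Usage: one hyper-network serves every (language, layer) pair instead of one adapter each.
hyper = HyperAdapter(d_model=512, bottleneck=64, num_langs=100, num_layers=6)
x = torch.randn(2, 10, 512)                       # (batch, seq, d_model)
out = hyper(x, torch.tensor(3), torch.tensor(0))  # adapter generated for language 3, layer 0
```

Because the generated parameters are functions of the language embeddings, related languages can end up with similar adapters, which is the mechanism behind the positive transfer the abstract describes.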
