arXiv:1911.05507

Compressive Transformers for Long-Range Sequence Modelling

Published on Nov 13, 2019

Abstract

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results on the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
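
The core idea of the abstract, compressing old memories rather than discarding them, can be illustrated with a toy sketch. The snippet below is not the authors' implementation: it assumes a simple mean-pooling compression function (one of the simpler choices discussed in the paper, which also studies learned compressions) and illustrative names such as CompressiveMemory, mem_len, cmem_len, and compression_rate.

```python
import torch

class CompressiveMemory:
    """Toy memory manager: keeps a FIFO buffer of recent activations and,
    instead of discarding the oldest ones, compresses them (here by mean
    pooling over `compression_rate` steps) into a second, coarser buffer."""

    def __init__(self, mem_len=4, cmem_len=4, compression_rate=2, d_model=8):
        self.mem_len = mem_len            # slots for uncompressed memories
        self.cmem_len = cmem_len          # slots for compressed memories
        self.c = compression_rate         # how many old slots fuse into one
        self.mem = torch.zeros(0, d_model)
        self.cmem = torch.zeros(0, d_model)

    def update(self, hidden):
        """Append new hidden states; overflowed old memories are compressed."""
        self.mem = torch.cat([self.mem, hidden], dim=0)
        overflow = self.mem.shape[0] - self.mem_len
        if overflow > 0:
            old, self.mem = self.mem[:overflow], self.mem[overflow:]
            # pad so the overflow length divides the compression rate
            pad = (-old.shape[0]) % self.c
            if pad:
                old = torch.cat([old, old.new_zeros(pad, old.shape[1])], dim=0)
            compressed = old.view(-1, self.c, old.shape[1]).mean(dim=1)
            self.cmem = torch.cat([self.cmem, compressed], dim=0)[-self.cmem_len:]
        # the model would attend over [cmem; mem; current segment]
        return self.cmem, self.mem


# usage: feed two segments of 4 timesteps each through the memory
memory = CompressiveMemory(mem_len=4, cmem_len=4, compression_rate=2, d_model=8)
for _ in range(2):
    segment = torch.randn(4, 8)          # (timesteps, d_model)
    cmem, mem = memory.update(segment)
    print(cmem.shape, mem.shape)
```

With a compression rate of 2, every two evicted memory slots collapse into one compressed slot, so older context is retained at reduced granularity rather than dropped outright.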
