arxiv:2012.14210

The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes

Published on Dec 28, 2020

Abstract

Information retrieval using dense low-dimensional representations has recently become popular and has been shown to outperform traditional sparse representations such as BM25. However, no previous work has investigated how dense representations perform with large index sizes. We show theoretically and empirically that the performance of dense representations decreases faster than that of sparse representations as the index size increases. In extreme cases, this can even lead to a tipping point where, at a certain index size, sparse representations outperform dense representations. We show that this behavior is tightly connected to the number of dimensions of the representations: the lower the dimension, the higher the chance of false positives, i.e., of returning irrelevant documents.
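The link between dimension, index size, and false positives can be illustrated with a toy Monte Carlo simulation. This is a minimal sketch, not the paper's experimental setup: it assumes the query and all irrelevant documents are independent random unit vectors, and that the single relevant document has a fixed cosine similarity with the query (the hypothetical parameter `relevant_sim`). A trial counts as a failure when any irrelevant document scores higher than the relevant one, i.e. a false positive reaches rank 1. The function name `false_positive_rate` is illustrative only.

```python
import numpy as np


def false_positive_rate(dim, index_size, relevant_sim=0.5, trials=100, seed=0):
    """Estimate how often an irrelevant document outranks the relevant one.

    Toy model (an assumption, not the paper's setup): the query and every
    irrelevant document are independent random unit vectors in R^dim, and the
    single relevant document has cosine similarity `relevant_sim` with the query.
    """
    rng = np.random.default_rng(seed)
    failures = 0
    for _ in range(trials):
        # Random unit-length query vector.
        query = rng.standard_normal(dim)
        query /= np.linalg.norm(query)
        # Index of random unit-length irrelevant documents, one per row.
        docs = rng.standard_normal((index_size, dim))
        docs /= np.linalg.norm(docs, axis=1, keepdims=True)
        # For unit vectors, cosine similarity equals the dot product.
        scores = docs @ query
        # A false positive reaches rank 1 if any irrelevant score beats the relevant one.
        if scores.max() > relevant_sim:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    # Failure rate for each dimension across growing (hypothetical) index sizes.
    for dim in (16, 64, 256):
        rates = [false_positive_rate(dim, n) for n in (100, 1_000, 10_000)]
        print(f"dim={dim:>3}: " + ", ".join(f"{r:.2f}" for r in rates))
```

Under these assumptions, each irrelevant document independently beats the relevant one with some probability p, so the chance of at least one false positive in an index of N documents is 1 - (1 - p)^N, which rises toward 1 as N grows. The probability p shrinks as the dimension grows because cosine similarities between random unit vectors concentrate around 0 (a single coordinate of a uniform random unit vector in R^dim has variance 1/dim), so low-dimensional indexes cross into frequent failures at much smaller index sizes.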
