The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes
Abstract
Information retrieval using dense low-dimensional representations has recently become popular and has been shown to outperform traditional sparse representations like BM25. However, no previous work has investigated how dense representations perform with large index sizes. We show theoretically and empirically that the performance of dense representations decreases more quickly than that of sparse representations as the index size increases. In extreme cases, this can even lead to a tipping point where, beyond a certain index size, sparse representations outperform dense representations. We show that this behavior is tightly connected to the number of dimensions of the representations: the lower the dimensionality, the higher the chance of false positives, i.e., returning irrelevant documents.
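The following is a minimal simulation sketch, not taken from the paper, that illustrates the claimed effect under simplifying assumptions: irrelevant documents are modeled as random unit vectors, the relevant document is assumed to have a fixed cosine similarity to the query (here 0.6), and the dimensions and index sizes are chosen only for illustration. It estimates how often at least one random distractor outscores the relevant document, which grows with index size and grows faster for lower-dimensional representations.

```python
import numpy as np

rng = np.random.default_rng(0)

def false_positive_rate(dim, index_size, target_sim=0.6, trials=100):
    """Fraction of trials in which at least one random distractor vector
    scores higher (cosine similarity) against the query than a hypothetical
    relevant document whose similarity to the query is `target_sim`."""
    hits = 0
    for _ in range(trials):
        # Random unit-length query vector.
        q = rng.normal(size=dim)
        q /= np.linalg.norm(q)
        # Distractors: random unit vectors standing in for irrelevant documents.
        D = rng.normal(size=(index_size, dim))
        D /= np.linalg.norm(D, axis=1, keepdims=True)
        # A false positive occurs if any distractor beats the relevant document.
        if (D @ q).max() > target_sim:
            hits += 1
    return hits / trials

for dim in (32, 128, 512):
    for n in (1_000, 10_000):
        print(f"dim={dim:4d}  index={n:6d}  "
              f"false-positive rate={false_positive_rate(dim, n):.2f}")
```

Under this toy model, the false-positive rate rises noticeably with index size at low dimensionality (e.g. 32) while staying near zero at higher dimensionality, mirroring the paper's observation that low-dimensional dense representations are more vulnerable to large indices.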