Colbert Mode Usage
#41, opened by pulkitchahar
I want to store the ColBERT embeddings so that I can rerank dense-vector retrieval results faster. But if a document has 1024 tokens on average (truncated if longer), and each token gets a 1024-dimensional vector, I end up with a 1024 × 1024 matrix per document. In fp16 that is 2 MB per document, which sounds huge, especially when I think about scaling up. Am I doing this right, or am I missing something? Is there any way to reduce the size while keeping performance close to the original?
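To make the estimate above concrete, here is a minimal back-of-the-envelope sketch. It assumes one vector per token and a 1024-dimensional vector, as in the question; the helper name `colbert_storage_bytes` and the one-million-document corpus size are hypothetical and only there for illustration.

```python
def colbert_storage_bytes(num_tokens: int, dim: int, bytes_per_value: int) -> int:
    """Raw size of one document's token-level embedding matrix, in bytes."""
    return num_tokens * dim * bytes_per_value


# Figures from the question: 1024 tokens, 1024 dimensions, fp16 (2 bytes per value).
per_doc_fp16 = colbert_storage_bytes(num_tokens=1024, dim=1024, bytes_per_value=2)
per_doc_mib = per_doc_fp16 / 2**20

# Hypothetical corpus size, chosen only to show how the cost scales.
num_docs = 1_000_000
corpus_gib = per_doc_fp16 * num_docs / 2**30

# Same matrix at one byte per value (assumption: some 8-bit encoding is used).
per_doc_8bit = colbert_storage_bytes(num_tokens=1024, dim=1024, bytes_per_value=1)

print(f"per document (fp16): {per_doc_mib:.1f} MiB")
print(f"{num_docs:,} documents (fp16): {corpus_gib:.1f} GiB")
print(f"per document (8-bit): {per_doc_8bit / 2**20:.1f} MiB")
```

The size is just the product of token count, vector dimension, and bytes per value, so those three factors are the only levers on raw storage. This sketch says nothing about how changing any of them affects reranking quality.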
I'm also interested in this.