Using bge-m3 for clustering and search
#33 by talavivi
Hi,
I have an idea to use the bge-m3 model for clustering and search. The main idea is that each cluster will be represented by three centroids, one for each vector type (dense, sparse, and colbert). When searching for a cluster, I would calculate the score between each centroid and the corresponding one of the query's 3 vectors to identify the most suitable cluster.
For the dense vector I can simply calculate the mean of all the dense vectors. But I'm not sure what's the right approach for the sparse and colbert vectors.
Would love to hear your thoughts about it and whether it's something possible.
The model will output a sparse weight and a colbert vector for each token. A possible method is to maintain a token set for each cluster and compute the mean value for each token.
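To make the token-set idea concrete, here is a minimal sketch for the sparse case. It assumes each sparse vector is available as a dict mapping a token id to its weight, and that a token missing from a document counts as weight 0 when averaging; both are assumptions for illustration. The names `sparse_centroid` and `sparse_score` are hypothetical helpers, not part of any library.

```python
from collections import defaultdict


def sparse_centroid(sparse_vectors):
    """Mean weight per token over all vectors in the cluster.

    Each vector is a dict mapping token id -> weight. A token that is
    missing from a vector contributes 0 for that vector (an assumption).
    """
    if not sparse_vectors:
        raise ValueError("cluster is empty")
    totals = defaultdict(float)  # the cluster's token set, with summed weights
    for vec in sparse_vectors:
        for token, weight in vec.items():
            totals[token] += weight
    n = len(sparse_vectors)
    return {token: total / n for token, total in totals.items()}


def sparse_score(query, centroid):
    """Dot product over the tokens shared by the query and the centroid."""
    return sum(w * centroid[t] for t, w in query.items() if t in centroid)


# Toy cluster with two documents (token ids and weights are made up).
cluster = [
    {"101": 0.5, "2023": 0.25},
    {"101": 0.25, "999": 0.75},
]
centroid = sparse_centroid(cluster)
print(centroid)
print(sparse_score({"101": 1.0, "999": 0.5}, centroid))
```

Dividing by the number of documents gives a true mean of the sparse vectors. Dividing by the number of documents that actually contain the token is an alternative that keeps rare tokens from being diluted; which one works better for cluster search would need to be checked on real data. The same token-keyed bookkeeping could in principle be tried for the colbert vectors by averaging the vectors element-wise per token, though that is untested here.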