How does Chinese dense retrieval performance compare with BGE V1.5?

by TianyuLLM - opened Jan 31

Discussion

TianyuLLM

Jan 31

Could the authors share a performance comparison with BGE V1.5?

Shitao

Beijing Academy of Artificial Intelligence org Jan 31

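Hello, we do not have complete comparison results yet.
That said, since BGE-M3 is a multilingual model, it may not be noticeably stronger than BGE v1.5 if you only do dense retrieval in a single language. BGE-M3's advantage is that it is more general (multilingual, long text). Hybrid retrieval also gives better accuracy and generalization, so hybrid retrieval results should be stronger than BGE V1.5.
In short, we suggest choosing a model based on your actual needs: use whichever works better on your specific task.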

TianyuLLM

Feb 1

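Thank you very much for the answer. One more question: if I use BGE-M3 to retrieve long documents, how should I rerank the retrieved results with BGE-Reranker (max_length=512)?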

Shitao

Beijing Academy of Artificial Intelligence org Feb 1

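BGE-Reranker indeed does not support very long text. We will release an updated version of bge-reranker later.
For now, you can try reranking directly with BGE-M3 by taking a weighted combination of the scores from its different retrieval modes. See the compute_score function: https://huggingface.co/BAAI/bge-m3#compute-score-for-text-pairs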
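
To make the weighted-score idea concrete, here is a minimal sketch of the fusion and reranking step only. It does not call FlagEmbedding: it assumes you have already obtained per-mode scores (dense, sparse, multi-vector) for each query-document pair, for example from compute_score. The function names `fuse_scores` and `rerank`, the example score values, the default weights, and the normalization by the weight sum are all illustrative assumptions, not the library's implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical default weights for (dense, sparse, colbert); tune them on your own data.
DEFAULT_WEIGHTS = (0.4, 0.2, 0.4)


def fuse_scores(dense: float, sparse: float, colbert: float,
                weights: Sequence[float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the three per-mode scores (assumed fusion rule)."""
    w_dense, w_sparse, w_colbert = weights
    total = w_dense + w_sparse + w_colbert
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_dense * dense + w_sparse * sparse + w_colbert * colbert) / total


def rerank(candidates: Dict[str, Dict[str, float]],
           weights: Sequence[float] = DEFAULT_WEIGHTS) -> List[Tuple[str, float]]:
    """Sort candidate documents by fused score, highest first."""
    scored = [
        (doc_id, fuse_scores(s["dense"], s["sparse"], s["colbert"], weights))
        for doc_id, s in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for two retrieved documents, for illustration only.
    candidates = {
        "doc_a": {"dense": 0.6, "sparse": 0.1, "colbert": 0.7},
        "doc_b": {"dense": 0.5, "sparse": 0.9, "colbert": 0.5},
    }
    for doc_id, score in rerank(candidates):
        print(doc_id, round(score, 4))
```

Setting a weight to zero drops that mode, so `weights=(1.0, 0.0, 0.0)` reduces this to dense-only reranking, which is a simple way to check how much the other modes contribute on your task.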
