splade / wrapup.md
Sean MacAvaney
update
68730a3
|
raw
history blame
1.88 kB

Putting it all together

When you use the document encoder in an indexing pipeline, the rewritting document contents are indexed:

D
SPLADE
D
Indexer
IDX
import pyterrer as pt
pt.init(version='snapshot')
import pyt_splade

dataset = pt.get_dataset('irds:msmarco-passage')
factory = pyt_splade.SpladeFactory()

indexer = pt.IterDictIndexer('./msmarco_psg', pretokenized=True)

indxer_pipe = factory.indexing() >> indexer
indxer_pipe.index(dataset.get_corpus_iter())

Once you built an index, you can build a retrieval pipeline that first encodes the query, and then performs retrieval:

Q
SPLADE
Q
TF Retriever
IDX
R
splade_retr = factory.query() >> pt.BatchRetrieve('./msmarco_psg', wmodel='Tf')

References & Credits

This package uses Naver's SPLADE repository.