Update wrapup.md
Hi!
I just edited the wrap-up with two typo corrections. The "pretokenized" typo is subtle: the IterDictIndexer constructor does no argument validation, so the misspelled argument raises no error, and "tokenized" is the standard spelling, so it does not look wrong, whereas the argument is actually spelled "pretokenised". Even though it is a small typo, I think it should be corrected.
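To illustrate why this kind of typo goes unnoticed, here is a minimal sketch. `ToyIndexer` is a hypothetical stand-in, not PyTerrier's actual implementation; it only assumes a constructor that accepts arbitrary keyword arguments without checking their names:

```python
class ToyIndexer:
    """Hypothetical stand-in for illustration; not PyTerrier's IterDictIndexer."""

    def __init__(self, index_path, **kwargs):
        self.index_path = index_path
        # Keys in kwargs are never validated, so a misspelled name is
        # silently ignored and the default value is used instead.
        self.pretokenised = kwargs.get("pretokenised", False)


# Misspelled with a 'z': no error is raised, but the option is not applied.
wrong = ToyIndexer("./msmarco_psg", pretokenized=True)
# Spelled the way the constructor looks it up: the option takes effect.
right = ToyIndexer("./msmarco_psg", pretokenised=True)

print(wrong.pretokenised)  # False
print(right.pretokenised)  # True
```

Under that assumption, the misspelled call runs without complaint and simply falls back to the default, which is why the mistake is easy to miss.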
Thanks!
wrapup.md
CHANGED
@@ -1,6 +1,6 @@
 ### Putting it all together
 
-When you use the document encoder in an indexing pipeline, the
+When you use the document encoder in an indexing pipeline, the rewritten document contents are indexed:
 
 <div class="pipeline">
 <div class="df" title="Document Frame">D</div>
@@ -18,7 +18,7 @@ import pyt_splade
 dataset = pt.get_dataset('irds:msmarco-passage')
 splade = pyt_splade.SpladeFactory()
 
-indexer = pt.IterDictIndexer('./msmarco_psg',
+indexer = pt.IterDictIndexer('./msmarco_psg', pretokenised=True)
 
 indxer_pipe = splade.indexing() >> indexer
 indxer_pipe.index(dataset.get_corpus_iter())