MS MARCO Mined Triplets Collection These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets. • 14 items • Updated May 21 • 11
Parallel Sentences Datasets Collection These datasets all have "english" and "non_english" columns for numerous datasets. They can be used to make embedding models multilingual. • 14 items • Updated Oct 9 • 13