Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence • Paper • 2503.05037 • Published 9 days ago • 4
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis • Paper • 2502.11824 • Published 27 days ago
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu • Paper • 2502.11862 • Published 27 days ago
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages • Paper • 2410.23825 • Published Oct 31, 2024 • 4
LangSAMP: Language-Script Aware Multilingual Pretraining • Paper • 2409.18199 • Published Sep 26, 2024 • 1