Multilingual Large Language Models Are Not (Yet) Code-Switchers Paper • 2305.14235 • Published May 23, 2023
LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis Paper • 2103.15348 • Published Mar 29, 2021
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark Paper • 2406.05967 • Published Jun 10, 2024 • 6
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Paper • 2406.10118 • Published Jun 14, 2024 • 32
MINERS: Multilingual Language Models as Semantic Retrievers Paper • 2406.07424 • Published Jun 11, 2024
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 4 days ago • 89
Establishing Baselines for Text Classification in Low-Resource Languages Paper • 2005.02068 • Published May 5, 2020
Improving Large-scale Language Models and Resources for Filipino Paper • 2111.06053 • Published Nov 11, 2021
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published Oct 16, 2024 • 32
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling Paper • 2410.09223 • Published Oct 11, 2024 • 5
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection Paper • 2408.04284 • Published Aug 8, 2024 • 26
CrossNER: Evaluating Cross-Domain Named Entity Recognition Paper • 2012.04373 • Published Dec 8, 2020
ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation Paper • 2112.06223 • Published Dec 12, 2021
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages Paper • 2205.15960 • Published May 31, 2022