MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels Paper • 2405.07526 • Published May 13, 2024
PromptBench: A Unified Library for Evaluation of Large Language Models Paper • 2312.07910 • Published Dec 13, 2023
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation Paper • 2312.07424 • Published Dec 12, 2023
When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities Paper • 2307.16376 • Published Jul 31, 2023
PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts Paper • 2306.04528 • Published Jun 7, 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization Paper • 2306.05087 • Published Jun 8, 2023