SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published Dec 2024 • 15
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation Paper • 2304.05977 • Published Apr 12, 2023 • 1
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training Paper • 2311.04155 • Published Nov 7, 2023 • 1
CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation Paper • 2311.18702 • Published Nov 30, 2023
AlignBench: Benchmarking Chinese Alignment of Large Language Models Paper • 2311.18743 • Published Nov 30, 2023 • 1
GLM: General Language Model Pretraining with Autoregressive Blank Infilling Paper • 2103.10360 • Published Mar 18, 2021 • 2
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks Paper • 2110.07602 • Published Oct 14, 2021
SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions Paper • 2309.07045 • Published Sep 13, 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding Paper • 2308.14508 • Published Aug 28, 2023 • 2
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments Paper • 2402.14672 • Published Feb 22, 2024