Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18 • 54
How Far Can We Go with Practical Function-Level Program Repair? Paper • 2404.12833 • Published Apr 19 • 6
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 68
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 119
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries Paper • 2406.12824 • Published Jun 18 • 20
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content Paper • 2406.11811 • Published Jun 17 • 16
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks Paper • 2406.12925 • Published Jun 14 • 23
HARE: HumAn pRiors, a key to small language model Efficiency Paper • 2406.11410 • Published Jun 17 • 38
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published Jun 18 • 36
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published Jun 21 • 62
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 26 • 47
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs Paper • 2406.18495 • Published Jun 26 • 12
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 95
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12 • 129
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 54
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore Paper • 2407.12854 • Published Jul 9 • 29
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 124
Text2SQL is Not Enough: Unifying AI and Databases with TAG Paper • 2408.14717 • Published Aug 27 • 24
Generative Verifiers: Reward Modeling as Next-Token Prediction Paper • 2408.15240 • Published Aug 27 • 13
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4 • 71
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published Sep 10 • 55
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Paper • 2409.06595 • Published Sep 10 • 37