B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 2 days ago • 31
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing Paper • 2305.11738 • Published May 19, 2023 • 8
UI Agent Collection: a collection of algorithmic agents for user interfaces/interactions and program synthesis • 231 items • Updated 4 days ago • 35
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 12 days ago • 74
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published 8 days ago • 40
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models Paper • 2412.12606 • Published 8 days ago • 41
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published 10 days ago • 24
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published 9 days ago • 33
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 13 days ago • 90
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published Nov 5 • 64
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation Paper • 2410.23090 • Published Oct 30 • 54
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published Oct 23 • 200
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13 • 54
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Paper • 2410.10139 • Published Oct 14 • 51