Perception Tokens Enhance Visual Reasoning in Multimodal Language Models Paper • 2412.03548 • Published 21 days ago • 16
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1 • 21
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization Paper • 2406.16008 • Published Jun 23 • 6
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions Paper • 2407.06723 • Published Jul 9 • 10
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps Paper • 2407.07071 • Published Jul 9 • 11
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality Paper • 2306.14610 • Published Jun 26, 2023
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes Paper • 2305.02301 • Published May 3, 2023 • 2
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models Paper • 2308.00675 • Published Aug 1, 2023 • 35
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17 • 50