Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Paper • 2507.16746 • Published 27 days ago • 33
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models Paper • 2505.14071 • Published May 20, 2025 • 1
DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis Paper • 2312.13016 • Published Dec 20, 2023 • 6
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis Paper • 2503.15667 • Published Mar 19, 2025 • 8
FoNE: Precise Single-Token Number Embeddings via Fourier Features Paper • 2502.09741 • Published Feb 13, 2025 • 15
TLDR: Token-Level Detective Reward Model for Large Vision Language Models Paper • 2410.04734 • Published Oct 7, 2024 • 17
Pre-trained Large Language Models Use Fourier Features to Compute Addition Paper • 2406.03445 • Published Jun 5, 2024
Improve Mathematical Reasoning in Language Models by Automated Process Supervision Paper • 2406.06592 • Published Jun 5, 2024 • 30
Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models Paper • 2310.17086 • Published Oct 26, 2023 • 2
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations Paper • 2404.01266 • Published Apr 1, 2024 • 4
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback Paper • 2311.17946 • Published Nov 29, 2023 • 2
Simplicity Bias of Transformers to Learn Low Sensitivity Functions Paper • 2403.06925 • Published Mar 11, 2024 • 1
DeLLMa: A Framework for Decision Making Under Uncertainty with Large Language Models Paper • 2402.02392 • Published Feb 4, 2024 • 6
FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs Paper • 2402.05904 • Published Feb 8, 2024