Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published about 21 hours ago • 8
SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL Paper • 2502.11438 • Published 4 days ago • 7
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model Paper • 2502.11775 • Published 4 days ago • 8
System Message Generation for User Preferences using Open-Source Models Paper • 2502.11330 • Published 5 days ago • 15
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors Paper • 2502.11167 • Published 5 days ago • 11
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published 9 days ago • 27
Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options Paper • 2502.12929 • Published 3 days ago • 6
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation Paper • 2502.13092 • Published 3 days ago • 12
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published 5 days ago • 48
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published 7 days ago • 49
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion Paper • 2502.08590 • Published 9 days ago • 38
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published 10 days ago • 42
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published 10 days ago • 32
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer Paper • 2502.01105 • Published 18 days ago • 18
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published 11 days ago • 19
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation Paper • 2501.16764 • Published 24 days ago • 22
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt Paper • 2501.13554 • Published 29 days ago • 9