Submitted by ai-alanov 88 T-LoRA: Single Image Diffusion Model Customization Without Overfitting · 4 authors 49 1
Submitted by HaochenWang 40 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology · 12 authors 28 2
Submitted by ChaimZhu 32 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding · 7 authors 41 1
Submitted by js-hyun 29 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs · 9 authors 9 3
Submitted by Diankun 26 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling · 7 authors 2
Submitted by EthanTaylor 21 LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS · 7 authors 1
Submitted by Franck-Dernoncourt 20 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality · 29 authors 1
Submitted by zhoutianyi 19 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs · 3 authors 5
Submitted by Xuandong 7 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models · 6 authors 2
Submitted by xianbao 3 SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? · 11 authors 1
Submitted by dbralios 2 Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders · 3 authors 1
Submitted by Bochkov 2 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate · 1 author 2
Submitted by Bochkov 1 Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations · 1 author 1 1