Deliberation in Latent Space via Differentiable Cache Augmentation • Paper • 2412.17747 • Published 1 day ago • 18
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response • Paper • 2412.14922 • Published 6 days ago • 66
LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation • Paper • 2412.15188 • Published 5 days ago • 1
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval • Paper • 2412.14475 • Published 6 days ago • 51
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN • Paper • 2412.13795 • Published 7 days ago • 18
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations • Paper • 2412.13171 • Published 7 days ago • 30
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers • Paper • 2412.12276 • Published 8 days ago • 14
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published 12 days ago • 74
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 11 days ago • 131
The Pitfalls of Memorization: When Memorization Hurts Generalization • Paper • 2412.07684 • Published 15 days ago • 1
Large Concept Models: Language Modeling in a Sentence Representation Space • Paper • 2412.08821 • Published 13 days ago • 7
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions • Paper • 2412.09596 • Published 12 days ago • 90
Maya: An Instruction Finetuned Multilingual Multimodal Model • Paper • 2412.07112 • Published 15 days ago • 25
Elucidating the Design Space of Diffusion-Based Generative Models • Paper • 2206.00364 • Published Jun 1, 2022 • 14