Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published Dec 23, 2024 • 30
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published Oct 23, 2024 • 49
Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement Paper • 2410.15633 • Published Oct 21, 2024 • 7
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation Paper • 2410.01912 • Published Oct 2, 2024 • 14
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Paper • 2407.05282 • Published Jul 7, 2024 • 13
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents Paper • 2407.00114 • Published Jun 27, 2024 • 12