ROICtrl: Boosting Instance Control for Visual Generation • Paper • 2411.17949 • Published 29 days ago • 82
VisionZip: Longer is Better but Not Necessary in Vision Language Models • Paper • 2412.04467 • Published 21 days ago • 104
Training Large Language Models to Reason in a Continuous Latent Space • Paper • 2412.06769 • Published 17 days ago • 62
Hidden in the Noise: Two-Stage Robust Watermarking for Images • Paper • 2412.04653 • Published 20 days ago • 28
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions • Paper • 2412.08737 • Published 15 days ago • 51
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 13 days ago • 131
Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published 13 days ago • 75