Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published 20 days ago • 55
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 83
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published Aug 29 • 30